By: Marwa & Erlik Widiyani Styati
Definition of Equating
The process of equating is used to
obtain comparable scores when more than one test form is used in a test
administration. As Petersen et al. (1989) point out, the process of equating "is
used to ensure that scores resulting from the administration of the multiple
forms can be used interchangeably." They further argue that equating can be
defined as an empirical procedure for establishing a relationship between raw
scores on two test forms, which can then be used to express the scores on one
form in terms of the scores on the other. Angoff (1971) defined the
equating of tests as a process "to convert the system of units of one form to
the system of units of the other," so that the scores obtained from one form can
be compared directly with the scores obtained from the other form.
Several techniques and
methodologies can be used to equate test forms. Generally speaking,
they fall into two major families: classical test theory methods
(including linear equating and equipercentile
equating) and item response theory methods (including the Rasch/one-parameter
logistic model, the two-parameter logistic model,
and the three-parameter logistic model).
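Of the classical methods mentioned above, linear equating is the simplest: a Form X score is placed on the Form Y scale by matching standardized deviation scores, y = mu_Y + (sd_Y / sd_X)(x - mu_X). The sketch below illustrates the idea; the score values are hypothetical, invented for this example.

```python
import statistics

def linear_equate(scores_x, scores_y):
    """Return a linear equating function that places Form X scores on the
    Form Y scale: y = mu_y + (sd_y / sd_x) * (x - mu_x)."""
    mu_x, sd_x = statistics.mean(scores_x), statistics.pstdev(scores_x)
    mu_y, sd_y = statistics.mean(scores_y), statistics.pstdev(scores_y)
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical raw scores from two equivalent groups of examinees.
form_x = [52, 60, 65, 70, 78, 85, 90]
form_y = [48, 55, 61, 66, 73, 80, 86]

to_y_scale = linear_equate(form_x, form_y)
print(f"A Form X score of 70 on the Form Y scale: {to_y_scale(70):.2f}")
```

By construction, the mean of Form X maps exactly onto the mean of Form Y, and a score one standard deviation above the Form X mean maps to one standard deviation above the Form Y mean.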
Conditions for Equating Test Forms
As Petersen et al. (1989) point out,
equating is necessary because it is almost impossible to construct multiple forms
of a test that are completely parallel. Even when test developers use the
same test specifications and make every effort to write the
items in one form to be as similar as possible to the items in another, there is
no guarantee that the difficulty levels of the items will be the same.
Experts in testing hold different
views on the conditions that must be met in equating. Petersen et al. (1989)
suggest that, according to Lord (1980), for the scores on test X to be equated
with the scores on test Y, the following four conditions must be met.
1. Same underlying trait: The two tests must both be measures of the same
characteristic (latent trait, ability, or skill).
2. Equity: For every group of examinees at an identical performance level on the
underlying trait, the conditional frequency distribution of scores on test Y,
after transformation, must be the same as the conditional frequency
distribution of scores on test X.
3. Population invariance: The transformation must be the same regardless of the
group of examinees from which it was derived.
4. Symmetry: The transformation must be invertible; that is, the mapping of scores
from Form X to Form Y must be the inverse of the mapping of scores from Form
Y to Form X. (Petersen et al., 1989)
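The symmetry condition can be checked directly for linear equating, since a linear transformation is invertible: mapping a score to the other form's scale and back must recover the original score. A minimal sketch, using hypothetical score values:

```python
import statistics

def linear_map(from_scores, to_scores):
    """Linear equating transformation from one form's score scale to another's."""
    mu_f, sd_f = statistics.mean(from_scores), statistics.pstdev(from_scores)
    mu_t, sd_t = statistics.mean(to_scores), statistics.pstdev(to_scores)
    return lambda s: mu_t + (sd_t / sd_f) * (s - mu_f)

form_x = [52, 60, 65, 70, 78, 85, 90]
form_y = [48, 55, 61, 66, 73, 80, 86]

x_to_y = linear_map(form_x, form_y)
y_to_x = linear_map(form_y, form_x)

# Symmetry: X -> Y followed by Y -> X recovers the original score.
for score in form_x:
    assert abs(y_to_x(x_to_y(score)) - score) < 1e-9
print("symmetry holds for the linear transformation")
```

Note that ordinary linear regression of Y on X would fail this test, because the regression of Y on X is not the inverse of the regression of X on Y; this is why equating uses the symmetric standardized-score relationship instead.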
Parallel Forms and
Equating Methods
When an exam program has multiple
test forms it is critical that they be assembled to be parallel to one another.
Two or more forms of an exam are considered parallel when they have been
developed to be as similar to one another as possible in terms of the test
specifications and statistical criteria. The primary reason for having multiple
exam forms is to improve test security. In the simplest case, an exam program
may have two test forms. One form may be currently used in regular
administrations and the second form may be available for examinees who are
retesting. Alternatively, the second form may be held in reserve, to be
available if the security of the first form is breached. High-stakes exam
programs with greater security concerns may have multiple forms in use at every
test administration. To obtain even greater test security, some exam programs
use new test forms at every administration.
Even when every effort is made to
develop parallel forms, some differences in the statistical characteristics
between the test forms can still be expected. The statistical method used to
resolve these test form differences is called equating. A test form is
statistically equated to another test form to make the resulting test scores
directly comparable. In order to conduct an equating, data must be collected
about how the test forms differ statistically. That is, information is needed
to determine whether differences in the two groups of test scores are caused by
a difference in the proficiency of the two examinee groups or by a difference
in the average difficulty of the two tests. Two of the most common data
collection designs used for equating are the random groups design and
the common-item nonequivalent groups design.
In the random groups design, two (or
more) test forms are given at a single test administration: the test forms are
distributed across examinees through a spiraled process. For example, Form A
may be given to the first examinee, Form B to the second examinee, Form A to
the third examinee, and so on. When the random assignment of test forms to
examinees is used, the two examinee groups can be considered equivalent in
proficiency. Any statistical differences across the two groups on the two test
forms can be interpreted as a difference in the test forms. For example, if the
group of examinees who took Form A performed better overall than the group
who took Form B, it is reasonable to conclude that Form A is easier than
Form B.
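The spiraling logic described above can be simulated. In the sketch below, forms are alternated across the examinee stream, the two groups are generated with equal proficiency, and Form A is assumed (purely for illustration) to be about 2 points easier; the difference in group means then estimates the form difference, as the random groups design intends. All numbers are hypothetical.

```python
import random
import statistics

random.seed(42)

# Spiral Forms A and B across examinees in order of arrival.
n_examinees = 1000
assignments = ["A" if i % 2 == 0 else "B" for i in range(n_examinees)]

def simulate_score(form):
    """Both groups share the same proficiency distribution; Form A is
    assumed to be 2 raw-score points easier (an invented illustration)."""
    proficiency = random.gauss(70, 10)
    return proficiency + (2 if form == "A" else 0)

scores = {"A": [], "B": []}
for form in assignments:
    scores[form].append(simulate_score(form))

diff = statistics.mean(scores["A"]) - statistics.mean(scores["B"])
print(f"Form A mean - Form B mean = {diff:.2f}")
```

Because random assignment makes the two groups equivalent in proficiency, the observed mean difference reflects the difficulty difference between the forms, not a difference between the examinee groups.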
References
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). Washington, DC: American Council on Education.
http://www.proftesting.com/test_topics/steps_7.php