By: Marwa & Erlik Widiyani Styati
Definition of Equating
The process of equating is used to
obtain comparable scores when more than one test form is used in a test
administration. As Petersen et al. (1989) point out, the process of equating "is
used to ensure that scores resulting from the administration of the multiple
forms can be used interchangeably." They further argue that equating can be
defined as an empirical procedure for establishing a relationship between raw
scores on two test forms, which can then be used to express the scores on one
form in terms of the scores on the other. Angoff (1971) defined the
equating of tests as a process "to convert the system of units of one form to
the system of units of the other," so that the scores obtained from one form can
be compared directly with the scores obtained from the other form.
Several techniques and
methodologies can be used to equate test forms. Generally speaking,
they fall into two major families: classical test theory methods
(including linear equating and equipercentile
equating) and item response theory methods (including the Rasch/one-parameter
logistic model, the two-parameter logistic model,
and the three-parameter logistic model).
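Of the classical methods mentioned above, linear equating is the simplest: a Form X score is placed on the Form Y scale by matching standardized deviation scores, y = mu_Y + (sd_Y / sd_X)(x - mu_X). The sketch below illustrates the idea; the score values are hypothetical, invented for this example.

```python
import statistics

def linear_equate(scores_x, scores_y):
    """Return a linear equating function that places Form X scores on the
    Form Y scale: y = mu_y + (sd_y / sd_x) * (x - mu_x)."""
    mu_x, sd_x = statistics.mean(scores_x), statistics.pstdev(scores_x)
    mu_y, sd_y = statistics.mean(scores_y), statistics.pstdev(scores_y)
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical raw scores from two equivalent groups of examinees.
form_x = [52, 60, 65, 70, 78, 85, 90]
form_y = [48, 55, 61, 66, 73, 80, 86]

to_y_scale = linear_equate(form_x, form_y)
print(f"A Form X score of 70 on the Form Y scale: {to_y_scale(70):.2f}")
```

By construction, the mean of Form X maps exactly onto the mean of Form Y, and a score one standard deviation above the Form X mean maps to one standard deviation above the Form Y mean.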
Conditions for Equating Test Forms
As Petersen et al. (1989) point out,
equating is necessary because it is almost impossible to construct multiple forms
of a test that are completely parallel. Even when test developers use the
same test specifications and make every effort to write the
items in one form to be as similar as possible to the items in another, there is
no guarantee that the difficulty levels of the items will be the same.
Experts in testing hold different
views on the conditions that must be met in equating. Petersen et al. (1989)
suggest that, according to Lord (1980), for the scores on test X to be equated
with the scores on test Y, the following four conditions must be met.
1. Same underlying trait: The two tests must both be measures of the same
characteristic (latent trait, ability, or skill).
2. Equity: For every group of examinees at an identical performance level on the
underlying trait, the conditional frequency distribution of scores on test Y,
after transformation, must be the same as the conditional frequency
distribution of scores on test X.
3. Population invariance: The transformation must be the same regardless of the
group of examinees from which it was derived.
4. Symmetry: The transformation must be invertible; that is, the mapping of scores
from Form X to Form Y must be the inverse of the mapping of scores from Form
Y to Form X. (Petersen et al., 1989)
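The symmetry condition can be checked directly for linear equating, since a linear transformation is invertible: mapping a score to the other form's scale and back must recover the original score. A minimal sketch, using hypothetical score values:

```python
import statistics

def linear_map(from_scores, to_scores):
    """Linear equating transformation from one form's score scale to another's."""
    mu_f, sd_f = statistics.mean(from_scores), statistics.pstdev(from_scores)
    mu_t, sd_t = statistics.mean(to_scores), statistics.pstdev(to_scores)
    return lambda s: mu_t + (sd_t / sd_f) * (s - mu_f)

form_x = [52, 60, 65, 70, 78, 85, 90]
form_y = [48, 55, 61, 66, 73, 80, 86]

x_to_y = linear_map(form_x, form_y)
y_to_x = linear_map(form_y, form_x)

# Symmetry: X -> Y followed by Y -> X recovers the original score.
for score in form_x:
    assert abs(y_to_x(x_to_y(score)) - score) < 1e-9
print("symmetry holds for the linear transformation")
```

Note that ordinary linear regression of Y on X would fail this test, because the regression of Y on X is not the inverse of the regression of X on Y; this is why equating uses the symmetric standardized-score relationship instead.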
Parallel Forms and
Equating Methods
When an exam program has multiple
test forms it is critical that they be assembled to be parallel to one another.
Two or more forms of an exam are considered parallel when they have been
developed to be as similar to one another as possible in terms of the test
specifications and statistical criteria. The primary reason for having multiple
exam forms is to improve test security. In the simplest case, an exam program
may have two test forms. One form may be currently used in regular
administrations and the second form may be available for examinees who are
retesting. Alternatively, the second form may be held in reserve, to be
available if the security of the first form is breached. High-stakes exam
programs with greater security concerns may have multiple forms in use at every
test administration. To obtain even greater test security, some exam programs
use new test forms at every administration.
Even when every effort is made to
develop parallel forms, some differences in the statistical characteristics
between the test forms can still be expected. The statistical method used to
resolve these test form differences is called equating. A test form is
statistically equated to another test form to make the resulting test scores
directly comparable. In order to conduct an equating, data must be collected
about how the test forms differ statistically. That is, information is needed
to determine whether differences in the two groups of test scores are caused by
a difference in the proficiency of the two examinee groups or by a difference
in the average difficulty of the two tests. Two of the most common data
collection designs used for equating are the random groups design and
the common-item nonequivalent groups design.
In the random groups design, two (or
more) test forms are given at a single test administration: the test forms are
distributed across examinees through a spiraled process. For example, Form A
may be given to the first examinee, Form B to the second examinee, Form A to
the third examinee, and so on. When the random assignment of test forms to
examinees is used, the two examinee groups can be considered equivalent in
proficiency. Any statistical differences across the two groups on the two test
forms can be interpreted as a difference in the test forms. For example, if the
group of examinees who took Form A performed better overall than the group
who took Form B, it is reasonable to conclude that Form A is easier than
Form B.
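The spiraling logic described above can be simulated. In the sketch below, forms are alternated across the examinee stream, the two groups are generated with equal proficiency, and Form A is assumed (purely for illustration) to be about 2 points easier; the difference in group means then estimates the form difference, as the random groups design intends. All numbers are hypothetical.

```python
import random
import statistics

random.seed(42)

# Spiral Forms A and B across examinees in order of arrival.
n_examinees = 1000
assignments = ["A" if i % 2 == 0 else "B" for i in range(n_examinees)]

def simulate_score(form):
    """Both groups share the same proficiency distribution; Form A is
    assumed to be 2 raw-score points easier (an invented illustration)."""
    proficiency = random.gauss(70, 10)
    return proficiency + (2 if form == "A" else 0)

scores = {"A": [], "B": []}
for form in assignments:
    scores[form].append(simulate_score(form))

diff = statistics.mean(scores["A"]) - statistics.mean(scores["B"])
print(f"Form A mean - Form B mean = {diff:.2f}")
```

Because random assignment makes the two groups equivalent in proficiency, the observed mean difference reflects the difficulty difference between the forms, not a difference between the examinee groups.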
References
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). Washington, DC: American Council on Education.
http://www.proftesting.com/test_topics/steps_7.php