Monday, 4 May 2015

Introduction to Item Response Theory (Comparison between CTT and IRT)

By:
Marwa & Erlik Widiyani Styati
Classical test theory (CTT) and item response theory (IRT) are widely perceived as representing two very different measurement frameworks. What follows is a brief review of the related theories; additional detail is provided elsewhere (Crocker & Algina, 1986; McKinley & Mills, 1989).

            Although CTT has served the measurement community for most of this century, IRT has seen exponential growth in recent decades. The major advantage of CTT is its relatively weak theoretical assumptions, which make CTT easy to apply in many testing situations (Hambleton & Jones, 1993). Relatively weak theoretical assumptions characterize not only CTT but also its extensions (e.g., generalizability theory). Although CTT's major focus is on test-level information, item statistics (i.e., item difficulty and item discrimination) are also an important part of the CTT model.
            At the item level, the CTT model is relatively simple. CTT does not invoke a complex theoretical model to relate an examinee's ability to success on a particular item. Instead, CTT collectively considers a pool of examinees and empirically examines their success rate on an item (assuming it is dichotomously scored). This success rate of a particular pool of examinees on an item, known as the p value of the item, is used as the index of item difficulty (actually, it is an inverse indicator of item difficulty, with higher values indicating an easier item). The ability of an item to discriminate between higher ability examinees and lower ability examinees is known as item discrimination, which is often expressed statistically as the Pearson product-moment correlation coefficient between the scores on the item (e.g., 0 and 1 on an item scored right-wrong) and the scores on the total test. When an item is dichotomously scored, this estimate is often computed as a point-biserial correlation coefficient.
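The two CTT item statistics described above can be computed directly from a response matrix. The sketch below (with made-up responses for six hypothetical examinees) illustrates the p value and the point-biserial discrimination index; for a dichotomous item, the point-biserial is simply the Pearson correlation between the 0/1 item scores and the total scores.

```python
import numpy as np

def item_difficulty(item_scores):
    """CTT p value: the proportion of examinees answering the item
    correctly. Higher values indicate an EASIER item."""
    return float(np.mean(item_scores))

def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between a dichotomous item (0/1)
    and the total test score: the CTT item discrimination index."""
    return float(np.corrcoef(item_scores, total_scores)[0, 1])

# Hypothetical data: one item's 0/1 scores for six examinees,
# alongside their total test scores.
item = np.array([1, 0, 1, 1, 0, 1])
total = np.array([28, 12, 25, 30, 15, 22])

p = item_difficulty(item)              # 4 of 6 correct -> 0.667
r_pb = item_discrimination(item, total)
```

A positive r_pb indicates that examinees with higher total scores tended to answer the item correctly, which is what a well-functioning item should show.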
            The major limitation of CTT can be summarized as circular dependency: (a) The person statistic (i.e., observed score) is (item) sample dependent, and (b) the item statistics (i.e., item difficulty and item discrimination) are (examinee) sample dependent. This circular dependency poses some theoretical difficulties in CTT’s application in some measurement situations (e.g., test equating, computerized adaptive testing). Despite the theoretical weakness of CTT in terms of its circular dependency of item and person statistics, measurement experts have worked out practical solutions within the framework of CTT for some otherwise difficult measurement problems. For example, test equating can be accomplished empirically within the CTT framework (e.g., equipercentile equating). Similarly, empirical approaches have been proposed to accomplish item-invariant measurement (e.g., Thurstone absolute scaling) (Englehard, 1990). It is fair to say that, to a great extent, although there are some issues that may not have been addressed theoretically within the CTT framework, many have been addressed through ad hoc empirical procedures.
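As an illustration of the empirical flavor of such CTT solutions, a minimal (unsmoothed) sketch of equipercentile equating is shown below: a score on form X is mapped to the form-Y score that occupies the same percentile rank. Operational equating uses smoothed score distributions and careful handling of the score-scale endpoints; this toy version only conveys the core idea, and the data are invented.

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, x):
    """Map a raw score x on form X to the form-Y score with the
    same percentile rank. Unsmoothed sketch for illustration only."""
    scores_x = np.asarray(scores_x)
    scores_y = np.asarray(scores_y)
    # Percentile rank of x within form X's observed distribution.
    pr = np.mean(scores_x <= x)
    # Form-Y score at the same percentile (linear interpolation).
    return float(np.percentile(scores_y, 100 * pr))

# Hypothetical score distributions: form Y is about 3 points harder,
# so a 25 on form X should equate to a somewhat lower form-Y score.
form_x = np.array([10, 14, 18, 22, 25, 27, 30, 33, 36, 38])
form_y = form_x - 3
equated = equipercentile_equate(form_x, form_y, 25)
```

Because the two hypothetical distributions differ by a constant shift, the equated score comes out close to 25 - 3, matching the intuition that equating removes form-difficulty differences.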
            IRT, on the other hand, is more theory grounded and models the probabilistic distribution of examinees' success at the item level. As its name indicates, IRT primarily focuses on item-level information, in contrast to CTT's primary focus on test-level information. The IRT framework encompasses a group of models, and the applicability of each model in a particular situation depends on the nature of the test items and the viability of different theoretical assumptions about them. For test items that are dichotomously scored, there are three IRT models, known as the three-, two-, and one-parameter IRT models. Although the one-parameter model is the simplest of the three, it may be better to start from the most complex, the three-parameter model; the reason for this sequence of discussion will soon become obvious.
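The three-parameter logistic (3PL) model referred to above can be written compactly in code. The sketch below gives the standard 3PL item characteristic curve, where a is the discrimination parameter, b the difficulty, and c the pseudo-guessing (lower-asymptote) parameter; setting c = 0 yields the two-parameter model, and additionally fixing a yields the one-parameter model. (Some presentations multiply the exponent by the scaling constant 1.7 to approximate the normal ogive; that constant is omitted here.)

```python
import math

def p_correct_3pl(theta, a, b, c):
    """3PL item characteristic curve: the probability that an examinee
    of ability theta answers the item correctly.
    a: discrimination, b: difficulty, c: pseudo-guessing asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b, the curve sits halfway between c and 1: (1 + c) / 2.
p_at_b = p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)   # 0.6

# With c = 0 the 3PL reduces to the 2PL model.
p_2pl = p_correct_3pl(theta=1.0, a=1.0, b=0.0, c=0.0)
```

Note how the probability never falls below c, reflecting the chance that a very low-ability examinee still answers a multiple-choice item correctly by guessing.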

REFERENCES
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.
Englehard, G., Jr. (1990). Thorndike, Thurstone and Rasch: A comparison of their approaches to item-invariant measurement. Paper presented at the annual meeting of the American Educational Research Association, Boston. (ERIC Document Reproduction Services No. ED 320 921)
McKinley, R., & Mills, C. (1989). Item response theory: Advances in achievement and attitude measurement. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 71-135). Greenwich, CT: JAI.
