By: Marwa & Erlik Widiyani Styati
Classical test theory (CTT) and item response theory (IRT) are widely perceived as representing two very different measurement frameworks. What follows is a brief review of the related theories; additional detail is provided elsewhere (Crocker & Algina, 1986; McKinley & Mills, 1989).
Although CTT has served the measurement community for most of this century, IRT has seen exponential growth in recent decades. The major advantage of CTT is its relatively weak theoretical assumptions, which make it easy to apply in many testing situations (Hambleton & Jones, 1993). These relatively weak assumptions characterize not only CTT but also its extensions (e.g., generalizability theory). Although CTT's major focus is on test-level information,
item statistics (i.e., item difficulty and item discrimination) are also an
important part of the CTT model.
At the item level, the CTT model is
relatively simple. CTT does not invoke a complex theoretical model to relate an
examinee’s ability to success on a particular item. Instead, CTT collectively
considers a pool of examinees and empirically examines their success rate on an
item (assuming it is dichotomously scored). This success rate of a particular pool of examinees on an item, known as the p value of the item, is used as the index of item difficulty (it is actually an inverse indicator of difficulty, with a higher value indicating an easier item). The ability of an
item to discriminate between higher ability examinees and lower ability
examinees is known as item discrimination, which is often expressed statistically
as the Pearson product-moment correlation coefficient between the scores on the
item (e.g., 0 and 1 on an item scored right-wrong) and the scores on the total
test. When an item is dichotomously scored, this estimate is often computed as
a point-biserial correlation coefficient.
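As a concrete illustration, the short Python sketch below (using NumPy and a small, hypothetical 0/1 response matrix) computes both CTT item statistics just described: the p value as the proportion of examinees answering the item correctly, and item discrimination as the point-biserial correlation between the item scores and the total test scores.

```python
import numpy as np

# Hypothetical responses: rows are examinees, columns are dichotomously scored items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])
totals = responses.sum(axis=1)  # total test score for each examinee

for j in range(responses.shape[1]):
    item = responses[:, j]
    p_value = item.mean()                    # item difficulty (higher = easier item)
    r_pb = np.corrcoef(item, totals)[0, 1]   # point-biserial discrimination
    print(f"Item {j + 1}: p = {p_value:.2f}, point-biserial = {r_pb:.2f}")
```

Note that the total score here includes the item itself; a corrected point-biserial would exclude the item from the total before computing the correlation.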
The major limitation of CTT can be
summarized as circular dependency: (a) The person statistic (i.e., observed
score) is (item) sample dependent, and (b) the item statistics (i.e., item
difficulty and item discrimination) are (examinee) sample dependent. This
circular dependency poses some theoretical difficulties in CTT’s application in
some measurement situations (e.g., test equating, computerized adaptive
testing). Despite the theoretical weakness of CTT in terms of its circular
dependency of item and person statistics, measurement experts have worked out
practical solutions within the framework of CTT for some otherwise difficult
measurement problems. For example, test equating can be accomplished
empirically within the CTT framework (e.g., equipercentile equating).
Similarly, empirical approaches have been proposed to accomplish item-invariant
measurement (e.g., Thurstone absolute scaling) (Englehard, 1990). It is fair to say that, although some issues may not have been addressed theoretically within the CTT framework, many have, to a great extent, been addressed through ad hoc empirical procedures.
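To make the equipercentile idea concrete, here is a minimal, unsmoothed sketch in Python: a score on form X is mapped to the form-Y score that has the same empirical percentile rank. The data, function name, and score scale are hypothetical; an operational equating would use score smoothing and an appropriate data-collection design.

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, x_points):
    """Map form-X scores to form-Y scores by matching empirical percentile ranks.

    Simplified equipercentile equating: no smoothing, plain percentiles with
    linear interpolation (hypothetical example only).
    """
    # Percentile rank of each x point within the form-X score distribution.
    ranks = np.array([np.mean(scores_x <= x) for x in x_points]) * 100
    # Form-Y score that sits at the same percentile rank.
    return np.percentile(scores_y, ranks)

# Hypothetical total scores from two groups taking two 40-item test forms.
rng = np.random.default_rng(0)
form_x = rng.binomial(40, 0.55, size=500)  # form X is slightly harder
form_y = rng.binomial(40, 0.65, size=500)

for x in (15, 20, 25, 30):
    y = equipercentile_equate(form_x, form_y, [x])[0]
    print(f"Form X score {x} ~ Form Y score {y:.1f}")
```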
IRT, on the other hand, is more theory-grounded and models the probabilistic distribution of examinees' success at the item level. As its name indicates, IRT primarily focuses on item-level information, in contrast to CTT's primary focus on test-level
information. The IRT framework encompasses a group of models, and the
applicability of each model in a particular situation depends on the nature of
the test items and the viability of different theoretical assumptions about the
test items. For test items that are dichotomously scored, there are three IRT
models, known as three-, two-, and one-parameter IRT models. Although the
one-parameter model is the simplest of the three models, it may be better to
start from the most complex, the three-parameter IRT model; the reason for this
sequence of discussion will soon become obvious.
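As a preview of the form these models take, the sketch below codes the three-parameter logistic (3PL) item response function, in which the probability of a correct response at ability θ is c + (1 − c) / (1 + e^(−a(θ − b))), with a the item discrimination, b the item difficulty, and c the pseudo-guessing (lower-asymptote) parameter. Fixing c = 0 gives the two-parameter model, and additionally holding a constant across items gives the one-parameter model; the parameter values used here are hypothetical.

```python
import math

def prob_correct_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response at ability theta.

    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    Setting c = 0 gives the 2PL; also fixing a across items gives the 1PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderately discriminating, average difficulty, some guessing.
a, b, c = 1.2, 0.0, 0.2
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: P(correct) = {prob_correct_3pl(theta, a, b, c):.2f}")
```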
REFERENCES
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.
Englehard, G., Jr. (1990). Thorndike, Thurstone and Rasch: A comparison of their approaches to item-invariant measurement. Paper presented at the annual meeting of the American Educational Research Association, Boston. (ERIC Document Reproduction Services No. ED 320 921)
McKinley, R., & Mills, C. (1989). Item response theory: Advances in achievement and attitude measurement. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 71-135). Greenwich, CT: JAI.