A Summary on the Introduction to Item Response Theory
By: Agus Eko Cahyono and Jumariati
In the practice of equating test forms,
several methods are used, such as the Item Response Theory (IRT) and the Classical
Test Theory (CTT) methods. CTT is a theory about test
scores that introduces three concepts: test score (often called the observed score),
true score, and error score. Within that theoretical framework, models of various
forms have been formulated. For example, in what is often referred to as the
"classical test model," a simple linear model is postulated linking
the observable test score (X) to the sum of two unobservable (or often called
latent) variables, true score (T) and error score (E), that
is, X = T + E. Because for each examinee there are two unknowns
in the equation, the equation is not solvable unless some simplifying
assumptions are made. The assumptions in the classical test model are that (a)
true scores and error scores are uncorrelated, (b) the average error score in
the population of examinees is zero, and (c) error scores on parallel tests are
uncorrelated. In this formulation, where error scores are defined, true score
is the difference between test score and error score.
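The classical model and its assumptions can be illustrated with a small simulation. This is only a sketch: the group size, the true-score distribution, and the error spread below are made-up values, not figures from the text. True scores and errors are drawn independently (so they are uncorrelated by construction) and the errors have mean zero, matching assumptions (a) and (b).

```python
import random

random.seed(42)

# Simulate the classical test model X = T + E for a group of examinees.
# T and E are generated independently, and E is centered at zero, so the
# model's assumptions hold by construction. All numbers are illustrative.
n = 10_000
true_scores = [random.gauss(50, 10) for _ in range(n)]
errors = [random.gauss(0, 4) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]

mean_error = sum(errors) / n
print(f"mean error score: {mean_error:.2f}")  # close to 0 in a large group
```

In a large group the average error score is close to zero, so the average observed score is close to the average true score, which is exactly what the model's assumptions imply.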
The IRT is a general statistical theory
about examinee item and test performance and how performance relates to the
abilities that are measured by the items in the test. Item responses can be
discrete or continuous and can be dichotomously or polytomously scored; item
score categories can be ordered or unordered; there can be one ability or many
abilities underlying test performance; and there are many ways (i.e., models)
in which the relationship between item responses and the underlying ability or
abilities can be specified. Within the general IRT framework, many models have
been formulated and applied to real test data.
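One widely used member of that family of models is the two-parameter logistic (2PL) model, shown below as a minimal sketch; the specific parameter values are invented for illustration. It specifies the probability of a correct response as a logistic function of the examinee's proficiency and the item's difficulty and discrimination parameters (introduced as b and a later in this summary).

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability of a correct
    response given proficiency theta, item discrimination a, and item
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When proficiency equals the item difficulty, the probability is 0.5.
print(p_correct(theta=0.5, a=1.2, b=0.5))  # 0.5
```

Raising theta above b pushes the probability toward 1; lowering it pushes the probability toward 0, which is the sense in which the model links item responses to the underlying ability.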
Difficulty is defined in both CTT and
IRT in terms of the likelihood of correct response, not in terms of the
perceived difficulty or amount of effort required. In CTT, the difficulty
index, P, is the proportion of examinees who answer the item correctly
(sometimes P is called the P-value, but this terminology will be avoided here
because it is easily confused with the p-value, or probability value, used in
statistical hypothesis testing, which has an entirely different meaning). For
polytomous items, the item difficulty is the mean score. So, a more difficult
item has a lower difficulty index in CTT. In IRT, the difficulty index, b, is
on the same metric as the proficiencies or traits. This metric is arbitrary,
but often it is anchored such that the proficiency distribution in a designated
group has a mean of 0 and standard deviation of 1.
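The CTT difficulty index is simple to compute directly; the short sketch below uses made-up response vectors to show both the dichotomous and the polytomous case described above.

```python
# CTT difficulty index: proportion of correct answers (P) for a
# dichotomous item, mean score for a polytomous item. The response
# vectors below are invented for illustration.
dichotomous = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 1 = correct, 0 = incorrect
polytomous = [3, 2, 4, 1, 3, 2]               # partial-credit scores, 0-4

P = sum(dichotomous) / len(dichotomous)
mean_score = sum(polytomous) / len(polytomous)
print(P)           # 0.7; a harder item would have a lower P
print(mean_score)  # 2.5
```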
In IRT, in contrast to CTT, more difficult items have
higher difficulty indices. A higher discrimination means that the item
differentiates (discriminates) between examinees with different levels of the
construct. Thus, high discrimination is desirable. The purpose of using the
instrument is to differentiate between examinees who know the material tested
and those who do not, or on an attitude scale, between those who have positive
attitudes and those who have negative attitudes. In CTT, the corrected item-total
point-biserial correlation is the typical index of discrimination; when this is
positive, examinees who answer the item correctly (or endorse the item) score
higher on the sum of the remaining items than do those who answer the item
incorrectly (or disagree with the item). In IRT, an index symbolized as a is a measure
of the item discrimination. This index is sometimes called the slope, because
it indicates how steeply the probability of correct response changes as the
proficiency or trait increases. In both CTT and IRT, higher values indicate
greater discrimination.
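The corrected item-total correlation can be computed as the Pearson correlation between an item's 0/1 scores and the sum of the remaining items; for a dichotomous item this Pearson coefficient is the point-biserial correlation. The response matrix below is invented for illustration.

```python
# Corrected item-total point-biserial correlation for item 0: the
# Pearson correlation between the item's 0/1 scores and the total of
# the remaining items. The responses are made-up data.
responses = [  # each row: one examinee's 0/1 scores on five items
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

item = [row[0] for row in responses]
rest = [sum(row) - row[0] for row in responses]  # total of remaining items
r = pearson(item, rest)
print(round(r, 2))  # positive: correct answerers score higher on the rest
```

The positive value reflects the desirable pattern described above: examinees who answer the item correctly tend to score higher on the sum of the remaining items.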
References:
Baker, F.B. 2001. The Basics of Item Response Theory. ERIC Clearinghouse on Assessment and Evaluation.
DeMars, C. 2010. Item Response Theory. Understanding Statistics: Measurement. New York: Oxford University Press.
Hambleton, R.K. & Jones, R.W. 1993. Comparison of Classical Test Theory and Item Response Theory and Their Applications to Test Development. Educational Measurement: Issues and Practice.