Tuesday, 28 April 2015



A Summary on the Introduction to Item Response Theory
By: Agus Eko Cahyono and Jumariati

In the practice of equating test forms, several methods are used, such as Item Response Theory (IRT) and Classical Test Theory (CTT). CTT is a theory about test scores that introduces three concepts: test score (often called the observed score), true score, and error score. Within that theoretical framework, models of various forms have been formulated. For example, in what is often referred to as the "classical test model," a simple linear model is postulated linking the observable test score (X) to the sum of two unobservable (or latent) variables, true score (T) and error score (E), that is, X = T + E. Because for each examinee there are two unknowns in the equation, the equation is not solvable unless some simplifying assumptions are made. The assumptions of the classical test model are that (a) true scores and error scores are uncorrelated, (b) the average error score in the population of examinees is zero, and (c) error scores on parallel tests are uncorrelated. In this formulation, the error score is defined as the difference between the test score and the true score.
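The classical test model and its assumptions can be illustrated with a small simulation. This is a sketch under hypothetical assumptions (normally distributed true and error scores with illustrative means and standard deviations), not part of any of the cited sources; it shows that when T and E are drawn independently, the variance of the observed score is approximately the sum of the true-score and error-score variances.

```python
import random
import statistics

random.seed(42)

# Hypothetical simulation of the classical test model X = T + E.
# True scores T and error scores E are drawn independently, so they
# are uncorrelated and the errors average approximately zero,
# matching assumptions (a) and (b) of the classical test model.
n = 10_000
true_scores = [random.gauss(50, 10) for _ in range(n)]         # T
error_scores = [random.gauss(0, 5) for _ in range(n)]          # E
observed = [t + e for t, e in zip(true_scores, error_scores)]  # X = T + E

# Under the CTT assumptions, Var(X) is approximately Var(T) + Var(E).
var_x = statistics.pvariance(observed)
var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(error_scores)
print(round(var_x), round(var_t + var_e))  # the two values are close
```

In a real testing program the true and error scores are not observable, of course; the simulation only makes the consequences of the model's assumptions concrete.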

IRT is a general statistical theory about examinee item and test performance and how that performance relates to the abilities measured by the items in the test. Item responses can be discrete or continuous and can be dichotomously or polytomously scored; item score categories can be ordered or unordered; there can be one ability or many abilities underlying test performance; and there are many ways (i.e., models) in which the relationship between item responses and the underlying ability or abilities can be specified. Within the general IRT framework, many models have been formulated and applied to real test data.
Difficulty is defined in both CTT and IRT in terms of the likelihood of correct response, not in terms of the perceived difficulty or amount of effort required. In CTT, the difficulty index, P, is the proportion of examinees who answer the item correctly (sometimes P is called the P-value, but this terminology will be avoided here because it is easily confused with the p-value, or probability value, used in statistical hypothesis testing, which has an entirely different meaning). For polytomous items, the item difficulty is the mean score. So, a more difficult item has a lower difficulty index in CTT. In IRT, the difficulty index, b, is on the same metric as the proficiencies or traits. This metric is arbitrary, but often it is anchored such that the proficiency distribution in a designated group has a mean of 0 and standard deviation of 1.
In IRT, in contrast to CTT, more difficult items have higher difficulty indices. Discrimination is the second key item property: a higher discrimination means that the item differentiates (discriminates) between examinees with different levels of the construct, so high discrimination is desirable. The purpose of using the instrument is to differentiate between examinees who know the material tested and those who do not, or, on an attitude scale, between those who hold positive attitudes and those who hold negative ones. In CTT, the corrected item-total point-biserial correlation is the typical index of discrimination; when this is positive, examinees who answer the item correctly (or endorse the item) score higher on the sum of the remaining items than do those who answer the item incorrectly (or disagree with the item). In IRT, an index symbolized as a measures item discrimination. This index is sometimes called the slope, because it indicates how steeply the probability of a correct response changes as the proficiency or trait increases. In both CTT and IRT, higher values indicate greater discrimination.
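The corrected item-total correlation can be sketched directly from a matrix of 0/1 item scores. The data and the helper function below are hypothetical, for illustration only; "corrected" means the focal item is removed from the total before correlating, so the item is not correlated with itself.

```python
import math

def corrected_item_total(scores, item_index):
    """Point-biserial (Pearson) correlation between one 0/1 item and
    the sum of the REMAINING items: the corrected item-total
    correlation. `scores` is a list of per-examinee item-score lists."""
    item = [row[item_index] for row in scores]
    rest = [sum(row) - row[item_index] for row in scores]  # drop focal item
    n = len(item)
    mi, mr = sum(item) / n, sum(rest) / n
    cov = sum((x - mi) * (y - mr) for x, y in zip(item, rest)) / n
    sd_i = math.sqrt(sum((x - mi) ** 2 for x in item) / n)
    sd_r = math.sqrt(sum((y - mr) ** 2 for y in rest) / n)
    return cov / (sd_i * sd_r)

# Hypothetical data: 6 examinees x 4 items. Item 0 tends to be
# answered correctly by the higher-scoring examinees, so its
# corrected item-total correlation should come out positive.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
print(round(corrected_item_total(data, 0), 2))  # positive: a discriminating item
```

A value near zero (or negative) would flag an item that fails to separate high and low scorers, which in IRT terms corresponds to a low (or negative) slope a.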

References:
Baker, F. B. 2001. The Basics of Item Response Theory. ERIC Clearinghouse on Assessment and Evaluation.

DeMars, C. 2010. Item Response Theory (Understanding Statistics: Measurement). New York: Oxford University Press.

Hambleton, R. K. & Jones, R. W. 1993. Comparison of Classical Test Theory and Item Response Theory and Their Applications to Test Development. Educational Measurement: Issues and Practice.


