I.G.A. Lokita Purnamika Utami & Rina Sari
Item Response Theory (IRT) relates a student's ability and item characteristics to the probability of obtaining a particular score on an item. IRT models depend on item and person parameters, both of which must be estimated; the end products are the best available estimates of the item parameters and of each person's ability. The item parameters are discrimination (a), difficulty (b), and the guessing or chance-level parameter (c). The person parameter is the ability estimate ϴ, which may represent, for example, a person's intelligence or the strength of an attitude.
Data Sample
Design
In Classical Test Theory (CTT), the item parameters depend on the population, so the data should be randomly sampled from that population. Unlike in CTT, because of the invariance property, the sample in IRT theoretically does not need to be a random sample from the population, so a non-random sample can be used. One limitation on sampling is that the examinee sample does need to span the range of item difficulties for accurate estimation (calibration) of the item parameters.
Data Requirement
Large samples of examinees are required to estimate the item parameters accurately, and a longer test provides a more accurate ϴ (ability) estimate. IRT models do not require any normality assumption (neither normally distributed examinee ability nor normally distributed item parameters). Better-quality items are more discriminating and therefore more useful in estimating ϴ. Increasing the examinee sample size and the accuracy of the item parameter estimates increases the precision of the ϴ estimate.
Assumption #1: Unidimensionality
A unidimensional test consists of items that measure only one dimension, trait, or ability; that is, the items share a common primary construct. The test items should demonstrate empirical evidence of sound construct validity, in that they assess a single trait or ability only. Violating this assumption may lead to misestimation of the parameters or their standard errors. However, even though unidimensionality is conceptually desirable, it is never absolutely satisfied, because numerous interacting factors contribute to test performance. It therefore suffices to say that a unidimensional trait is one dominant factor that accounts for most of the performance.
Methods of testing unidimensionality
Three common methods of testing unidimensionality are:
• analysis of the eigenvalues of the inter-item correlation matrix,
• Stout's test of essential unidimensionality, and
• indices based on the residuals from a unidimensional solution (DeMars, 2010).
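The first of these methods can be sketched in a few lines. This is a minimal illustration, not a full Stout or residual-based test: it simulates a hypothetical 0/1 response matrix driven by a single latent trait (all names and parameter values here are assumptions for the demonstration), then checks that the first eigenvalue of the inter-item correlation matrix dominates the second.

```python
# Minimal sketch of the eigenvalue check for unidimensionality.
# Rows of `responses` are examinees, columns are items (0/1 scores).
import numpy as np

rng = np.random.default_rng(0)

# Simulate 500 examinees x 10 items from a single latent trait,
# so one dominant factor should emerge (Rasch-style generation).
theta = rng.normal(size=(500, 1))          # latent abilities
b = np.linspace(-2, 2, 10)                 # assumed item difficulties
p = 1.0 / (1.0 + np.exp(-(theta - b)))     # probability of a correct response
responses = (rng.random((500, 10)) < p).astype(int)

# Eigenvalues of the inter-item correlation matrix, largest first.
corr = np.corrcoef(responses, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

# A first eigenvalue much larger than the second suggests one dominant factor.
print(eigvals[0], eigvals[1])
```

With multidimensional data, by contrast, the second eigenvalue would be comparable in size to the first.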
Assumption #2: Local Independence
Local independence assumes that item responses are independent given a subject's ability. The response to one item does not influence the probability of responding correctly to another item, after controlling for ability. With the test taker's ability held constant, the score on an item derives from answering that item only and does not depend on the scores on any other items.
Examples of items that violate local independence:
• one item builds on the answer to a previous item;
• items are grouped around a reading passage or a common scenario that provides context for all of the items.
Assumption #3: Correct Model Specification or Parameter Invariance
Parameter invariance relates to the item parameters, such as discrimination (a), difficulty (b), and the guessing/chance-level parameter (c), as well as to the ability parameter ϴ (theta). The assumption states that the item parameters do not change even though the test takers attempting the items differ in ability; likewise, the characteristics of the test takers remain the same even though they answer items with different item-parameter values.
Assumption #4: Monotonicity
A more able person has a higher probability of responding correctly to an item than a person with lower ability.
Estimation of Parameters
Probability is an important measure that indicates the chance of a correct response for a test taker with a given ability, expressed as P(ϴ). This measure plays a central role in the different IRT models, while its usefulness in the interpretation of scores depends on the model employed. Therefore, model selection is an important aspect of applying IRT.
Parameter Logistic Models
The data should follow the model used in the analysis, such as the 1PL, 2PL, or 3PL model:
One-parameter logistic model (1PL; the Rasch model) → assumes that guessing is part of the ability and that all items fitting the model have equivalent discrimination, so each item is described by a single parameter: the item difficulty (b).
Two-parameter logistic model (2PL; the Birnbaum model) → assumes that the data involve no guessing, but that items can vary in location/difficulty and discrimination. Item difficulty (b) and item discrimination (a) are considered.
Three-parameter logistic model (3PL) → difficulty, discrimination, and guessing. Item difficulty (b), item discrimination (a), and the lower-asymptote chance-level parameter (c) are considered essential factors in the probability function. The c parameter accommodates the chance that test takers with low ability may respond correctly to a difficult item by guessing.
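The three models above can be written as short probability functions. This is a minimal sketch of the standard logistic forms (the function names are my own); P(ϴ) is the probability of a correct response at ability ϴ.

```python
# Sketch of the three dichotomous logistic models.
import math

def p_1pl(theta, b):
    """1PL / Rasch model: only the difficulty b varies across items."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_2pl(theta, a, b):
    """2PL / Birnbaum model: adds the item discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL: adds the lower asymptote c (chance-level guessing)."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

# At theta == b, the 1PL and 2PL give exactly 0.5; the 3PL gives
# c + (1 - c)/2, which is slightly above 0.5 whenever c > 0.
print(p_1pl(0.0, 0.0))            # 0.5
print(p_3pl(0.0, 1.0, 0.0, 0.2))  # 0.6
```

Note how the code also reflects the monotonicity assumption: each function is strictly increasing in ϴ, so a more able examinee always has a higher probability of a correct response.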
Instrument Development Based on IRT
Four questions should be considered together:
• What is the spread of item difficulties (and category difficulties, for polytomous items)? The item difficulties can be used to judge whether the test or survey items are targeted to the level of ϴ where measurement precision is desired. To assess item difficulty, the b-parameters can be inspected. For the 1PL and 2PL models, the b-parameter is the location at which an examinee has a 50% probability of answering the item correctly, or of endorsing the item; for the 3PL model, the probability is slightly higher than 50%. For all three dichotomous models, the b-parameter is the location at which the probability is changing most rapidly.
• How discriminating is each item? The a-parameter tells how steep the item slope is, or how rapidly the probability is changing at the item difficulty level. For the 1PL and partial credit (PC) models, the discrimination is assumed to be the same for all items within a test; for the 2PL and 3PL models, the items within a test have varying a-parameters.
• What is the distribution of abilities/traits in this group of examinees/respondents? How does the ability distribution compare to the item difficulty distribution? The metric of the measurements must be defined by choosing a center point and a unit size. If the items have not been calibrated (i.e., the item parameters have not been estimated) before, then typically the metric is set in the calibration sample by defining the ϴ distribution to have a mean of 0 and a standard deviation of 1. If the calibration sample is a meaningful norming sample, this metric will probably be used later to estimate the ϴs of new examinees on a constant metric. Otherwise, it would be possible to redefine the metric later, but all of the item parameter and ϴ estimates for the original sample would need to be transformed to the new metric.
• How does each item contribute to the test information? Item information depends on the item parameters. For dichotomous items, the information reaches its highest value at or near ϴ = b. The item information function is more peaked when the a-parameter is high and flatter when it is low. A good test item is one that provides maximum information, which is equivalent to minimizing the standard error it yields: the standard error of the ϴ estimate is inversely related to the square root of the information.
References
DeMars, C. (2010). Item Response Theory. Oxford: Oxford University Press.
Sulistyo, G. H. (2015). Assessment at Schools: An Introduction to Its Basic Concepts and Principles. Malang: CV. Bintang Sejahtera.