Friday, 01 May 2015

Requirements, Assumptions, Estimation of Parameters as well as Instrument Development Based on IRT (Item Response Theory)


I.G.A. Lokita Purnamika Utami & Rina Sari

Item Response Theory (IRT) relates a student's ability and item characteristics to the probability of obtaining a particular score on an item. IRT models depend on item and person parameters, both of which have to be estimated; the end products are best estimates of the item parameters and of each person's ability.
Item parameters are discrimination (a), difficulty (b), and the guessing or chance-level parameter (c). Person parameters are the ability estimates, which may represent, for example, a person's intelligence or the strength of an attitude.


Data Sample Design
In CTT (Classical Test Theory), the item parameters depend on the population, so the data should be randomly sampled from that population. Unlike CTT, because of the invariance property, the sample in IRT theoretically does not need to be a random sample from the population; a non-random sample can be used. One limitation on sampling is that the examinee sample does need to span the range of item difficulties for accurate estimation (calibration) of the item parameters.
Data Requirement
Large samples of examinees are required to estimate the item parameters accurately. A longer test provides a more accurate ϴ (ability) estimate. IRT models do not require any normality assumption (neither a normally distributed examinee ability nor normally distributed item parameters). Better-quality items are more discriminating and therefore more useful in estimating ϴ. Increasing the examinee sample size and the accuracy of the item parameters increases the precision of the ϴ estimate.
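As a rough illustration of the test-length claim, here is a minimal simulation sketch in Python (assuming numpy; all item parameter values and the true ability are made up for the demo). It estimates one examinee's ϴ by a simple grid-search maximum likelihood under a 2PL model and shows that the spread of the estimates shrinks as the test gets longer:

    import numpy as np

    rng = np.random.default_rng(0)
    theta_true = 0.5                   # hypothetical true ability for the demo
    grid = np.linspace(-4, 4, 401)     # candidate theta values

    def p_2pl(theta, a, b):
        # 2PL probability of a correct response
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    for n_items in (10, 40, 160):              # increasing test lengths
        a = rng.uniform(0.8, 2.0, n_items)     # hypothetical discriminations
        b = rng.normal(0.0, 1.0, n_items)      # hypothetical difficulties
        estimates = []
        for _ in range(200):                   # 200 simulated test sessions
            u = rng.random(n_items) < p_2pl(theta_true, a, b)  # 0/1 responses
            p = p_2pl(grid[:, None], a, b)                     # grid x items
            loglik = (u * np.log(p) + (~u) * np.log(1 - p)).sum(axis=1)
            estimates.append(grid[np.argmax(loglik)])          # grid-search MLE
        print(n_items, "items -> SD of theta estimate:",
              round(float(np.std(estimates)), 3))

Longer tests yield a visibly smaller standard deviation of the ϴ estimates, matching the claim above.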
Assumption #1: Unidimensionality
A unidimensional test consists of items that measure only one dimension, trait, or ability; the items share a common primary construct. Test items should show empirical evidence of sound construct validity, in that they assess a single trait or ability. Violating this assumption may lead to misestimation of the parameters or of their standard errors. However, even though unidimensionality is highly desirable in principle, it is never absolutely satisfied, because numerous interacting factors contribute to test performance. In practice, then, a trait is treated as unidimensional when one dominant factor accounts for most of the performance.
Methods of testing unidimensionality
Three common methods of testing unidimensionality (DeMars, 2010) are:
• analysis of the eigenvalues of the inter-item correlation matrix (a minimal sketch follows this list),
• Stout's test of essential unidimensionality, and
• indices based on the residuals from a unidimensional solution.
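A minimal sketch of the first method, assuming numpy and made-up 0/1 response data (Pearson correlations are used here for simplicity; tetrachoric correlations are more usual for dichotomous items):

    import numpy as np

    rng = np.random.default_rng(1)
    # Hypothetical responses: 500 examinees x 20 items generated from a
    # single underlying trait, so one eigenvalue should dominate.
    theta = rng.normal(size=(500, 1))
    b = rng.normal(size=(1, 20))
    responses = (rng.random((500, 20)) < 1 / (1 + np.exp(-(theta - b)))).astype(int)

    corr = np.corrcoef(responses, rowvar=False)   # 20 x 20 inter-item correlations
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
    print(eigenvalues[:5].round(2))   # one dominant eigenvalue suggests one factor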
Assumption #2: Local Independence
Local independence assumes that item responses are independent given a subject's ability. The response to one item is independent of, and does not influence, the probability of responding correctly to another item after controlling for ability. With the test taker's ability held constant, a test taker's score on an item derives from answering that item only and does not depend on the scores on any other items.
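Formally, with u_i denoting the scored response to item i on an n-item test, local independence says the joint probability of a whole response pattern factors into a product over items; in LaTeX:

P(u_1, u_2, \ldots, u_n \mid \theta) = \prod_{i=1}^{n} P(u_i \mid \theta)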
Examples of items that violate local independence:
• one item builds on the answer to a previous item;
• items are grouped around a reading passage or a common scenario that provides context for all of the items.
Assumption #3: Correct Model Specification or Parameter Invariance
Parameter invariance concerns the item parameters, namely discrimination (a), difficulty (b), and the chance-level or guessing parameter (c), as well as the ability parameter, symbolized by ϴ (theta). The assumption states that the item parameters do not change even when the test takers responding to the items differ in ability. Likewise, the ability of a test taker remains the same even though they answer test items with different values of the item parameters.
Assumption #4: Monotonicity
A more able person has a higher probability of responding correctly to an item than a person with lower ability.
Estimation of Parameters
Probability is the key measure in IRT: P(ϴ) denotes the probability of a correct response for a test taker with a given ability ϴ. This measure plays a central role in the different IRT models, while its usefulness in interpreting scores depends on the model employed. Model selection is therefore an important aspect of applying IRT.
Logistic Models (1PL, 2PL, 3PL)
The data are assumed to follow the model used in the analysis: the 1PL, 2PL, or 3PL model.
One-parameter logistic model (1PL): assumes that guessing is a part of the ability and that all items that fit the model have equivalent discrimination, so each item is described by a single item parameter, the item difficulty (b). This is also known as the Rasch model.
Two-parameter logistic model (2PL): assumes that the data involve no guessing, but that items can vary in location (difficulty) and discrimination; also known as the Birnbaum model. Item difficulty (b) and item discrimination (a) are considered.
Three-parameter logistic model (3PL): item difficulty (b), item discrimination (a), and the lower-asymptote chance-level parameter (c) are all considered essential factors in the probability function. The c parameter accommodates the fact that test takers with low ability may still respond correctly to a difficult item by guessing.
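For reference, a standard LaTeX form of the 3PL response function (the scaling constant D ≈ 1.7 is sometimes inserted before a, but is omitted here) is

P(\theta) = c + (1 - c)\,\frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}

Setting c = 0 recovers the 2PL, and additionally fixing a at a common value for all items recovers the 1PL/Rasch model.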
Instrument Development Based on IRT
Four questions should be considered together:
• What is the spread of item difficulties (and category difficulties, for polytomous items)? The item difficulties can be used to judge whether the test or survey items are targeted to the level of ϴ where measurement precision is desired. To assess item difficulty, the b-parameters can be inspected. For the 1PL and 2PL models, the b-parameter is the location at which an examinee has a 50% probability of answering the item correctly, or of endorsing the item. For the 3PL model, the probability at that location is slightly higher than 50%. For all three dichotomous models, the b-parameter is the location at which the probability is changing most rapidly.
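Substituting \theta = b into the response function above makes this concrete: for the 1PL and 2PL, P(b) = \tfrac{1}{2}, while for the 3PL, P(b) = c + (1 - c)\cdot\tfrac{1}{2} = \tfrac{1 + c}{2}, which exceeds 50% whenever c > 0.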
• How discriminating is each item? The a-parameter tells how steep the item slope is, that is, how rapidly the probability is changing at the item difficulty level. For the 1PL and PC (partial credit) models, the discrimination is assumed to be the same for all items within a test. For the 2PL and 3PL models, the items within a test have varying a-parameters.
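In terms of the response function above, the slope at \theta = b is a/4 for the 2PL and a(1 - c)/4 for the 3PL, so a larger a-parameter means a steeper curve and a more sharply discriminating item.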
• What is the distribution of abilities/traits in this group of examinees/respondents? How does the ability distribution compare to the item difficulty distribution?
The metric of the measurements must be defined by choosing a center point and a unit size. If the items have not been calibrated before (i.e., the item parameters have not been estimated), then typically, in the calibration sample, the metric is set by defining the ϴ distribution to have a mean of 0 and a standard deviation of 1. If the calibration sample is a meaningful norming sample, this metric will probably be kept later to estimate the ϴs of new examinees on a constant metric. Otherwise, the metric can be redefined later, but all of the item parameter and ϴ estimates for the original sample would need to be transformed to the new metric.
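Rescaling is a linear transformation. Writing the new metric as \theta^* = k\theta + m, the item parameters carry over as

b^* = kb + m, \qquad a^* = a/k, \qquad c^* = c

so that a(\theta - b) = a^*(\theta^* - b^*) and every response probability, and hence the model fit, is unchanged.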
• How does each item contribute to the test information? Item information depends on the item parameters. For dichotomous items, within an item, the information reaches its highest value at or near ϴ = b. The item information is more peaked when the a-parameter is high and flatter when the a-parameter is low. A good test item is one that provides maximum information where it is needed, because information determines the standard error of measurement: the more information the items provide at a given ϴ, the smaller the standard error of the ϴ estimate.
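For concreteness, the standard item information formulas under the logistic forms above are, in LaTeX,

I(\theta) = a^2\,P(\theta)\big(1 - P(\theta)\big) \quad \text{(2PL)}, \qquad I(\theta) = a^2\,\frac{1 - P(\theta)}{P(\theta)}\left(\frac{P(\theta) - c}{1 - c}\right)^2 \quad \text{(3PL)},

and the test information is the sum over items, I(\theta) = \sum_i I_i(\theta), with standard error \mathrm{SE}(\hat\theta) = 1/\sqrt{I(\theta)}.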

References
DeMars, C. 2010. Item Response Theory. Oxford: Oxford University Press. 
Sulistyo, G. H. 2015. Assessment at Schools: An Introduction to Its Basic Concepts and Principles. Malang: CV. Bintang Sejahtera.