Monday, 12 January 2015

Summary of Standard Stages in Assessment Instrument Development by Marwa & Erlik, W. S.

1.      MARWA
2.      ERLIK WIDIYANI STYATI

Assessment Instrument Development

Summary of Standard Stages in Assessment Instrument Development
Test development proceeds through several standard stages. The first essential stage is stating the problem: making oneself perfectly clear about what one wants to know and for what purpose. For example: What kind of test is it to be? What is its precise purpose? What abilities are to be tested? Once the problem is clear, steps can be taken to solve it.

The second stage is writing specifications for the test. A set of specifications must be written at the outset. It will include information on: (1) content, which refers to the entire potential content of any number of versions of the test; (2) structure, timing, medium/channel, and techniques, which covers the specific test structure, the number of items and passages, the medium/channel, the timing, and the techniques that will be used to measure skills or subskills; (3) criterial levels of performance, that is, the required level of performance for different levels of success, which should be specified. This may involve a simple statement to the effect that, to demonstrate ‘mastery’, 80 percent of the items must be responded to correctly; and (4) scoring procedures, where the test developers should be clear about how they will achieve high reliability and validity in scoring.
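
The mastery criterion in point (3) can be written down as a simple check. The sketch below is only an illustration: the function name and the default of 80 percent are assumptions taken from the example in the text, not a procedure prescribed by Hughes.

```python
def has_mastery(correct: int, total: int, threshold_percent: int = 80) -> bool:
    """Return True if the share of correct responses meets the criterion.

    Integer arithmetic is used so that a score sitting exactly on the
    threshold (e.g. 40 out of 50 at 80 percent) is not affected by
    floating-point rounding.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of items")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct * 100 >= threshold_percent * total


# Hypothetical candidates on a 50-item test
print(has_mastery(40, 50))  # exactly 80 percent, criterion met
print(has_mastery(39, 50))  # 78 percent, criterion not met
```

A specification with several levels of success would need one such threshold per level.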
The third stage is writing and moderating items. Once specifications are in place, the writing of items can begin. (1) Sampling: it is most unlikely that everything found under the heading of ‘Content’ in the specifications can be covered by the items in any one version of the test, so choices have to be made. For content validity and for beneficial backwash, the important thing is to choose widely from the whole area of content. Succeeding versions of the test should also sample widely and unpredictably. (2) Writing items: items should always be written with the specifications in mind. As one writes an item, it is essential to try to look at it through the eyes of test takers. The writing of successful items is extremely difficult, and the best way to identify items that have to be improved or abandoned is through the process of moderation. (3) Moderating items: this is the scrutiny of proposed items by, ideally, at least two colleagues, neither of whom is the author of the items being examined. Their task is to try to find weaknesses in each item and, where possible, remedy them. Where successful modification is not possible, they must reject the item.
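
One way to sample "widely and unpredictably" is to draw a few topics at random from every content area in the specifications. The sketch below assumes the content is stored as a dictionary of areas and topics; the area names, topic names, and function name are all hypothetical.

```python
import random


def sample_content(content_areas, topics_per_area, rng=None):
    """Pick topics at random from every content area.

    Drawing from each area keeps the sample wide; drawing at random
    keeps successive versions of the test unpredictable.
    """
    if rng is None:
        rng = random.Random()  # unseeded, so each version differs
    selection = {}
    for area, topics in content_areas.items():
        k = min(topics_per_area, len(topics))
        selection[area] = rng.sample(topics, k)
    return selection


# Hypothetical content section of a specification
content = {
    "grammar": ["tenses", "articles", "conditionals", "passives"],
    "reading": ["skimming", "scanning", "inference"],
    "writing": ["letters", "reports"],
}
print(sample_content(content, 2))
```

Passing a seeded `random.Random` makes a particular selection reproducible, which can be useful for record keeping.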
The fourth stage is informal trialing of items on native speakers. Items which have been through the process of moderation should be presented in the form of a test (or tests) to a number of native speakers, twenty or more if possible. The native speakers should be similar to the people for whom the test is being developed in terms of age, education, and general background. Items that prove difficult for the native speakers almost certainly need revision or replacement.
The fifth stage is trialing of the test on a group of non-native speakers similar to those for whom the test is intended. The items that have survived moderation and informal trialing on native speakers should be put together into a test, which is then administered under test conditions to such a group. Problems in administration and scoring are noted. For a number of reasons, trialing of this kind is often not feasible. In some situations a group for trialing may simply not be available. In other situations, although a suitable group exists, it may be thought that the security of the test might be put at risk. Where prior trialing is not possible, it is still worthwhile noting problems that become apparent during the actual administration and scoring, and afterwards carrying out statistical analysis.
The sixth stage is analysis of the results of the trial and making any necessary changes. Two kinds of analysis should be carried out. The first is statistical analysis, which reveals qualities such as the reliability of the test as a whole and the characteristics of individual items (for example, how difficult they are and how well they discriminate between stronger and weaker candidates). The second is qualitative analysis. Responses should be examined in order to discover misinterpretations, unanticipated but possibly correct responses, and any other indicators of faulty items.
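
The two item statistics mentioned above, difficulty and discrimination, can be sketched as follows. This is a minimal illustration that assumes dichotomous scoring (1 = correct, 0 = incorrect) and the common upper-lower method with the top and bottom thirds of candidates; the function names and data are hypothetical, and other formulas are possible.

```python
def item_facility(item_scores):
    """Proportion of candidates who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores):
    """Facility in the strongest third minus facility in the weakest third.

    Candidates are ranked by their total test score. Values near 1 mean
    the item separates stronger from weaker candidates well; values near
    0 or below suggest a faulty item.
    """
    n = len(item_scores)
    k = max(1, n // 3)  # size of each comparison group
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    low, high = ranked[:k], ranked[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low


# Hypothetical trial data for one item and six candidates
totals = [10, 12, 15, 18, 20, 25]   # total test scores
item = [0, 0, 1, 0, 1, 1]           # scores on the item under review
print(item_facility(item))                 # 0.5
print(discrimination_index(item, totals))  # 1.0
```

Here half the candidates answer the item correctly, and both candidates in the top group get it right while both in the bottom group get it wrong.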
The seventh stage is calibration of scales. Where rating scales are used for oral testing or testing of writing, these should be calibrated. This means collecting samples of performance (for example pieces of writing) which cover the full range of the scales. A team of experts then look at these samples and assign each of them to a point on the relevant scale. The assigned samples provide reference points for all future uses of the scale as well as being necessary training materials.
The eighth stage is validation. For a high-stakes or published test, this should be regarded as essential. For relatively low-stakes tests that are to be used within an institution, this may not be thought necessary, although where the test is likely to be used many times over a period of time, informal, small-scale validation is still desirable.
The ninth stage is writing handbooks for test takers, test users, and staff. Handbooks (each with rather different content, depending on the audience) may be expected to contain the rationale for the test, an account of how the test was developed and validated, a description of the test, sample items, advice on preparing for and taking the test, an explanation of how test scores are to be interpreted, training materials, and details of test administration.
The last stage is training staff. Using the handbook and other materials, all staff who will be involved in the test process should be trained. This may include interviewers, raters, scorers, computer operators and invigilators (proctors).
Reference
Hughes, A. 2003. Testing for Language Teachers. Cambridge: Cambridge University Press.


