1. MARWA
2. ERLIK WIDIYANI STYATI

Assessment Instrument Development

Summary of Standard Stages in Assessment Instrument Development
There are several standard stages of test development. The first essential stage is stating the problem, that is, making oneself perfectly clear about what it is one wants to know and for what purpose: for example, what kind of test is it to be? What is its precise purpose? What abilities are to be tested? If the problem is clear, steps can be taken to solve it.
The second stage is writing specifications for the test. A set of specifications for the test must be written at the outset. This will include information on (1) content, which refers to the entire potential content of any number of versions of the test; (2) structure, timing, medium/channel, and techniques, which covers the specific test structure, number of items, number of passages, medium/channel, timing and the techniques that will be used to measure skills or sub-skills; (3) criterial levels of performance: the required level of performance for different levels of success should be specified. This may involve a simple statement to the effect that, to demonstrate ‘mastery’, 80 percent of the items must be responded to correctly; and (4) scoring procedures: here the test developers should be clear as to how they will achieve high reliability and validity in scoring.
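The mastery criterion mentioned above can be made concrete with a short sketch. This is an illustrative example only (the function name and cut-off parameter are assumptions, not part of Hughes's specification format): a candidate demonstrates 'mastery' when the proportion of correctly answered items meets the stated criterial level.

```python
# Illustrative sketch of applying a criterial level of performance,
# e.g. the 80% mastery cut-off mentioned in the specifications.
# (Hypothetical helper; not a prescribed procedure.)

def reaches_mastery(responses, cutoff=0.80):
    """responses: list of item scores, 1 = correct, 0 = incorrect."""
    proportion_correct = sum(responses) / len(responses)
    return proportion_correct >= cutoff

# 17 correct out of 20 items is 85%, which meets the 80% criterion.
print(reaches_mastery([1] * 17 + [0] * 3))  # True
# 15 out of 20 is 75%, which does not.
print(reaches_mastery([1] * 15 + [0] * 5))  # False
```

The cut-off is a parameter because, as the specifications stage notes, different levels of success may require different criterial levels.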
The third stage is writing and moderating items. Once specifications are in place, the writing of items can begin. (1) Sampling: it is most unlikely that everything found under the heading of ‘Content’ in the specifications can be covered by the items in any one version of the test, so choices have to be made. For content validity and for beneficial backwash, the important thing is to choose widely from the whole area of content. Succeeding versions of the test should also sample widely and unpredictably. (2) Writing items: items should always be written with the specifications in mind, and as one writes an item it is essential to try to look at it through the eyes of test takers. The writing of successful items is extremely difficult, and the best way to identify items that have to be improved or abandoned is through the process of moderation. (3) Moderating items: this is the scrutiny of proposed items by, ideally, at least two colleagues, neither of whom is the author of the items being examined. Their task is to try to find weaknesses in the items and, where possible, remedy them. Where successful modification is not possible, they must reject the item.
The fourth stage is
informal trialing of items on native speakers. Items which have been through
the process of moderation should be presented in the form of a test (or tests)
to a number of native speakers (twenty or more if possible). The native speakers
should be similar to the people for whom the test is being developed, in terms
of age, education and general background. Items that prove difficult for the
native speakers almost certainly need revision or replacement.
The fifth stage is
trialing of the test on a group of non-native speakers similar to those for
whom the test is intended. The items that have survived moderation and informal
trialing on native speakers should be put together into a test, which is then
administered under test conditions to a group similar to that for which the
test is intended. Problems in administration and scoring are noted. For a
number of reasons, trialing of this kind is often not feasible. In some
situations a group for trialing may simply not be available. In other
situations, although a suitable group exists, it may be thought that the
security of the test might be put at risk. Even so, it is worthwhile noting any problems that become apparent during administration and scoring, and afterwards carrying out statistical analysis.
The sixth stage is
analysis of results of the trial; making of any necessary changes. There are
two kinds of analysis that should be carried out. The first is statistical
analysis that reveals qualities such as reliability of the test as a whole and
of individual items (for example, how difficult they are, how well they
discriminate between stronger and weaker candidates). The second analysis is
qualitative. Responses should be examined in order to discover
misinterpretations, unanticipated but possibly correct responses, and any other
indicators of faulty items.
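The item-level statistics mentioned above can be sketched briefly. This is an illustrative example under classical test theory, not Hughes's own procedure: the facility value shows how difficult an item is (the proportion answering it correctly), and a simple discrimination index compares the strongest and weakest thirds of candidates on that item.

```python
# Illustrative classical item analysis (assumed helper names):
# facility = proportion of candidates answering the item correctly;
# discrimination = upper-third minus lower-third proportion correct.

def facility(item_scores):
    """item_scores: list of 1/0 scores on one item, one per candidate."""
    return sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores):
    """Compare the strongest and weakest thirds of candidates on this item."""
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    third = len(ranked) // 3
    upper = [item_scores[i] for i in ranked[:third]]   # strongest candidates
    lower = [item_scores[i] for i in ranked[-third:]]  # weakest candidates
    return facility(upper) - facility(lower)

# Six candidates ranked by total score; the item is answered correctly
# mainly by the stronger candidates, so it discriminates well.
totals = [19, 17, 15, 12, 9, 6]
item   = [1,  1,  1,  0,  0, 0]
print(facility(item))                 # 0.5 (moderate difficulty)
print(discrimination(item, totals))   # 1.0 (maximum discrimination)
```

An item with a facility near 0 or 1, or a discrimination index near zero or negative, would be a candidate for the revision or rejection described above.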
The seventh stage is
calibration of scales. Where rating scales are used for oral testing or testing
of writing, these should be calibrated. This means collecting samples of
performance (for example pieces of writing) which cover the full range of the
scales. A team of experts then looks at these samples and assigns each of them to a point on the relevant scale. The assigned samples provide reference points
for all future uses of the scale as well as being necessary training materials.
The eighth stage is validation. For a high-stakes or published test, this should be regarded as essential. For relatively low-stakes tests that are to be used within an institution, it may not be thought necessary; however, if the test is likely to be used many times over a period of time, informal, small-scale validation is still desirable.
The ninth stage is
writing handbooks for test takers, test users and staff. Handbooks (each with
rather different content, depending on the audience) may be expected to contain the
rationale for the test, an account of how the test was developed and validated,
a description of the test, sample items, advice on preparing for taking the
test, an explanation of how test scores are to be interpreted, training
materials and details of test administration.
The last stage is
training staff. Using the handbook and other materials, all staff who will be
involved in the test process should be trained. This may include interviewers,
raters, scorers, computer operators and invigilators (proctors).
Reference
Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press.