Stages of Test Development
by:
I.G.A Lokita
Purnamika Utami
Rina Sari
Test development is best carried out by a team. It is difficult for an
individual to develop a test alone, especially at the item-writing stage: a
fault in an item that is obvious to others can be invisible to its writer. An
item writer therefore needs certain qualities, one of which is the willingness
to accept justified criticism. Other qualities of an item writer or test
developer are a native or near-native command of the language, intelligence,
and imagination (to create contexts for items and to foresee possible
misinterpretations).
1. Stating the problem
This is the stage at which the test developer should be clear about what kind
of test it is to be, what its purpose is, what abilities are to be tested, how
detailed the results must be, how important backwash is, and what constraints
are set by the unavailability of expertise, facilities, and time.
2. Writing specifications for the test
Test specifications give information on content, test structure, timing,
medium/channel, techniques to be used, criterial levels of performance, and
scoring procedures.
a. Content
Content should be specified as fully as possible. The following is a possible
framework for describing the content of a test:
- Operation: the task to be carried out.
- Types of text: for a writing test this may include letter forms, academic essay.
- Addressees of texts: the kind of people the candidate is expected to be able to write to or to speak to.
- Length of text: for a reading test, this could be the length of the passage.
- Topics
- Readability
- Structural range: a list of structures that may occur in the text or should be excluded from it, or a general indication of the range of structures.
- Vocabulary range
- Dialect, accent, style: the dialects and accents the test taker should understand. Style may be formal or informal.
- Speed of processing: in a reading test this is reading speed; in a speaking test it could be rate of speech.
b. Structure, timing, medium/channel and techniques
- Test structure: the sections in the test and what is to be tested in each section
- Number of items
- Number of passages
- Medium/channel: paper and pencil test, tape, computer, face to face, telephone, etc.
- Time: for each section or for the entire test
- Techniques: the techniques used to measure the skills and subskills
c. Criterial levels of performance
The level of performance required for each level of success should be
specified. For example, to demonstrate 'mastery', 80% of the items must be
answered correctly. For speaking and writing, however, one can expect the
criterial levels to be much more complex.
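
As an illustration of the simple percentage criterion above, here is a minimal Python sketch. The function name `has_mastery` and the representation of responses as a list of booleans are assumptions made for this example only; the 80% threshold is the figure given in the text.

```python
def has_mastery(responses, threshold_percent=80):
    """Return True if the share of correct responses meets the threshold.

    `responses` is a list of booleans, one per item (True = correct).
    Integer arithmetic is used so the comparison is exact at the boundary.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    correct = sum(responses)  # True counts as 1, False as 0
    return correct * 100 >= threshold_percent * len(responses)


# A hypothetical 10-item test: 8 correct meets the criterion, 7 does not.
eight_of_ten = [True] * 8 + [False] * 2
seven_of_ten = [True] * 7 + [False] * 3
print(has_mastery(eight_of_ten))
print(has_mastery(seven_of_ten))
```

A single threshold like this only suits objectively scored items; criterial levels for speaking and writing would need descriptors for each level rather than one number.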
d. Scoring procedures
Test developers should be sure how they will achieve high reliability and
validity in scoring, especially where scoring is subjective. This includes
deciding what rating scale will be used, how many raters will be employed, and
what will happen when raters disagree about a piece of work.
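
One way to settle in advance what happens when raters disagree is to write the rule down explicitly. The following Python sketch shows one possible rule, not the procedure prescribed by the source: two ratings are averaged unless they differ by more than a tolerance, in which case the script is referred to a third rater. The function names, the tolerance of one scale point and the sample scores are all hypothetical.

```python
def resolve_scores(rater_a, rater_b, max_gap=1):
    """Combine two ratings, or return None if a third rating is needed.

    Assumed rule: ratings within `max_gap` scale points are averaged;
    a larger gap means the piece of work goes to a third rater.
    """
    if abs(rater_a - rater_b) > max_gap:
        return None
    return (rater_a + rater_b) / 2


def exact_agreement_rate(score_pairs):
    """Return the proportion of scripts on which both raters gave the same score."""
    if not score_pairs:
        raise ValueError("at least one pair of scores is required")
    same = sum(1 for a, b in score_pairs if a == b)
    return same / len(score_pairs)


# Hypothetical ratings of four scripts on a 1-5 scale.
pairs = [(3, 3), (4, 5), (2, 2), (5, 5)]
print(resolve_scores(4, 5))         # within tolerance, so averaged
print(resolve_scores(2, 5))         # too far apart, so referred on
print(exact_agreement_rate(pairs))  # share of identical ratings
```

Tracking the agreement rate during trialling gives an early sign of whether the rating scale or the rater training needs more work.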
3. Writing and moderating items
Here are the procedures:
- Sampling: text samples should be chosen from as wide a range of topics and types of writing as is compatible with the specifications.
- Writing items: the item writer should try to anticipate possible misinterpretations. Item writers cannot be expected to produce consistently perfect items; some items will have to be rejected or reworked. Moderation is the best way to identify items that need to be improved.
- Moderating items: moderation is the scrutiny of proposed items by at least two colleagues, who look for weaknesses in the items. A checklist is useful to the moderators of a test.
4. Informal trialling of items on native speakers
The items should be tried out on native speakers, about twenty or so. These
native speakers should be similar to the intended test takers in age,
education and general background. The trial does not need to be conducted
formally; the native speakers can take the test in their own time. Items that
prove difficult for them almost certainly need revision or replacement, as do
items that draw unexpected or inappropriate responses.
5. Trialling of the test on a group of non-native speakers similar to those for whom the test is intended
Items that have survived moderation and trialling on native speakers should be
put together into a test and tried out on a group of non-native speakers
similar to those for whom the test is intended. Any problems in administration
and scoring should be noted.
6. Analysis of the results of the trial; making of any necessary changes
There are two kinds of analysis: statistical and qualitative. Statistical
analysis reveals qualities of the test as a whole and of individual items (how
difficult they are and how well they discriminate between stronger and weaker
candidates). Qualitative analysis examines the responses for misinterpretations
and for unanticipated but possibly correct responses. Items that prove to be
faulty should be dropped or modified.
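
The two item statistics mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the analysis described in the source: difficulty is computed as the facility value (the proportion of candidates answering the item correctly), and discrimination is computed by comparing the top and bottom thirds of candidates ranked by total score. The function names, the choice of thirds and the response data are hypothetical.

```python
def facility_values(matrix):
    """Proportion of candidates answering each item correctly.

    `matrix` holds one row per candidate and one 0/1 entry per item.
    """
    n_candidates = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[i] for row in matrix) / n_candidates for i in range(n_items)]


def discrimination_indices(matrix):
    """Upper-group minus lower-group success rate for each item.

    Candidates are ranked by total score; the top and bottom thirds
    are compared (an assumed convention for this sketch).
    """
    ranked = sorted(matrix, key=sum, reverse=True)
    group_size = max(1, len(ranked) // 3)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    n_items = len(matrix[0])
    return [
        (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / group_size
        for i in range(n_items)
    ]


# Hypothetical trial data: six candidates, three items (1 = correct).
trial = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(facility_values(trial))
print(discrimination_indices(trial))
```

In this made-up data the third item is answered correctly equally often by the strongest and weakest candidates, so its discrimination index is zero; an item like that would be a candidate for qualitative review before being dropped or modified.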
7. Calibration of scales
Where rating scales are going to be used for oral or written testing, these
should be calibrated. This means collecting samples of performance that cover
the full range of the scales. A team of experts then looks at these samples and
assigns each of them to a point on the relevant scale. The assigned samples
provide reference points for future use of the scale and serve as training
materials.
8. Validation
The final version of the test can
be validated.
9. Writing handbooks for test takers, test users and staff
A handbook may be expected to contain the following: the rationale for the
test, an account of how the test was developed and validated, a description of
the test, sample items, advice on preparing for the test, an explanation of how
test scores are to be interpreted, training materials, and details of test
administration.
10. Training staff
Using the handbook, all staff
should be trained. These people may include interviewers, raters, scorers,
computer operators and invigilators.
Reference
Hughes, A. 2003. Testing for Language Teachers. Cambridge: Cambridge
University Press.