Sunday, 01 February 2015

Quality Assurance on Internal Attributes of a Good Language Assessment Device: Reliability, Validity, and Classical Item Analysis
By: Agus Eko Cahyono and Jumariati

A language assessment device is said to be good provided that it meets two attributes: reliability and validity. A test is reliable if its results are consistent and dependable across two or more administrations. Heaton (1988) and Brown and Abeywickrama (2010) mention that the reliability of a test can be affected by the student, the scoring, the test administration, and the test itself. Student-related reliability deals with the condition of the student taking the test: fatigue, anxiety, low motivation, and other physical and psychological factors can prevent the student from performing to his or her true ability. Teachers therefore need to consider the students' condition before administering a test. Rater reliability deals with the consistency of the scores given by one rater (intra-rater reliability) or by two or more raters (inter-rater reliability). Consistency is especially hard to achieve in subjective tests such as essay writing, where rater fatigue can reduce reliability; it is suggested that the rater read through all the essays and then cycle back through them to arrive at a sound judgment. When two or more raters are involved and their scores differ considerably, the scoring criteria probably need to be revised, and Hughes (2003) suggests that raters be trained so that they interpret the scoring criteria in the same way. Test-administration reliability is determined by the condition of the room where the test is administered, the seating arrangement, the room temperature, and the quality of the test sheets or audio; teachers should therefore prepare a suitable room and provide clear audio and legible copies of the test sheets. Finally, test reliability relates directly to the test itself: clear instructions, unambiguous items, and a number of items balanced against the time allotted. Attending to these factors helps increase the reliability of a test.
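
To make inter-rater reliability concrete, one common check is to correlate the scores that two raters assign to the same set of essays: a coefficient near 1.0 indicates consistent scoring, while a low value signals that the criteria or the rater training need attention. The sketch below is illustrative only; the scores are invented, and the Pearson coefficient is just one of several agreement statistics a teacher might use.

```python
# Illustrative sketch: inter-rater reliability as a Pearson correlation.
# The scores below are invented; in practice they would be two raters'
# marks for the same set of essays.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

rater_a = [78, 85, 62, 90, 71, 68, 88, 75]   # hypothetical essay scores
rater_b = [75, 88, 60, 92, 70, 65, 85, 78]

print(f"Inter-rater reliability (Pearson r): {pearson(rater_a, rater_b):.2f}")
```

The same calculation serves for intra-rater reliability by correlating one rater's first and second scorings of the same scripts.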

Validity is also important in determining the quality of a test. It is the extent to which a test measures what it is supposed to measure. Brown and Abeywickrama (2010:30) propose that a valid test measures exactly what it proposes to measure, relies on the test-taker's performance, and offers meaningful information about the test-taker's ability. There are several types of validity evidence. First, content validity deals with the content of the test, which should cover the materials taught or the instructional objectives. It also concerns direct testing, which requires students to perform the skill itself, for instance a writing test that asks students to produce a piece of writing. Second, construct validity requires that a test reflect the concepts or theories underlying the ability being measured. For example, a test of speaking ability requires students to use English orally with attention to fluency, intonation, and pronunciation, since the construct of speaking performance comprises those elements. Third, face validity is the extent to which a test looks appropriate for measuring students' knowledge or abilities, based on the subjective judgment of the students as test-takers. Fourth, empirical validity is achieved by comparing the test results with other results: comparing them with the results of another existing, valid test is known as concurrent validity, while comparing them with a later criterion, such as teachers' ratings given afterwards, is called predictive validity. The last is consequential validity, which concerns all the consequences a test produces, such as its accuracy in measuring the intended criteria, its effect on test-takers' preparation, and the interpretation and use of its results.
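
Of the validity types above, empirical validity is the one usually expressed as a number: students' scores on the new test are correlated with their scores on an established, already-validated test (concurrent validity) or with a later criterion such as teachers' ratings (predictive validity). The sketch below uses invented scores and NumPy's corrcoef; it applies the same correlational logic as the inter-rater check earlier.

```python
# Illustrative sketch: concurrent validity as a correlation between a new
# test and an established, already-validated test. All scores are invented.
import numpy as np

new_test    = np.array([55, 72, 80, 64, 90, 47, 68, 77])  # hypothetical scores
proven_test = np.array([58, 70, 85, 60, 92, 50, 71, 74])  # same students, valid test

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the validity coefficient.
validity_coefficient = np.corrcoef(new_test, proven_test)[0, 1]
print(f"Concurrent validity coefficient: {validity_coefficient:.2f}")
```
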
A classical item analysis is also carried out to ensure a test's quality; it covers item difficulty and item discrimination, and a multiple-choice test additionally needs an analysis of its distracters. The item-difficulty analysis aims to find out how easy or difficult each item is for the test-takers: the number of students who answer an item correctly is divided by the total number of students taking the test, yielding the item's difficulty index. The item-discrimination analysis, meanwhile, aims to distinguish students who are able to answer correctly from those who are not. An item that every student answers correctly needs to be revised, because a good item discriminates high-achieving students from low-achieving ones. Distracter analysis examines how plausibly each distracter resembles the correct answer and so draws students away from it; if nobody is distracted by an alternative answer, that distracter's power is low.
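
All three indices can be computed directly from students' answer sheets. The sketch below is a minimal illustration with invented data for a single multiple-choice item: the difficulty index is the proportion of correct answers, the discrimination index contrasts the top-scoring half of the class with the bottom half (27% groups are also common in practice), and the distracter tally shows how often each wrong option was chosen.

```python
# Illustrative classical item analysis for one multiple-choice item.
# The answer data are invented; each entry is one student's chosen option.
from collections import Counter

KEY = "B"  # correct option for this item
answers = ["B", "B", "A", "B", "C", "B", "D", "B", "A", "B",
           "B", "C", "B", "B", "A", "B", "D", "B", "B", "C"]
# Total test scores for the same 20 students, used to form ability groups.
totals  = [88, 85, 40, 90, 35, 76, 30, 82, 45, 79,
           71, 38, 92, 68, 42, 74, 33, 81, 66, 36]

# Item difficulty: proportion of all students answering correctly.
difficulty = sum(a == KEY for a in answers) / len(answers)

# Item discrimination: rank students by total score, then compare the top
# half with the bottom half on this item: D = p_upper - p_lower.
ranked = sorted(zip(totals, answers), reverse=True)
half = len(ranked) // 2
upper = [a for _, a in ranked[:half]]
lower = [a for _, a in ranked[-half:]]
discrimination = (sum(a == KEY for a in upper) - sum(a == KEY for a in lower)) / half

# Distracter analysis: how many students each wrong option attracted.
distracters = Counter(a for a in answers if a != KEY)

print(f"Difficulty index p = {difficulty:.2f}")          # 0.60 for this data
print(f"Discrimination index D = {discrimination:.2f}")
print(f"Distracter choices: {dict(distracters)}")
```

A difficulty index near 0.0 or 1.0 flags an item that is too hard or too easy, a discrimination index well above zero shows that stronger students answer the item correctly more often than weaker ones, and a distracter that nobody chooses should be rewritten.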
In conclusion, there are three factors in assuring a test's quality: reliability, validity, and item analysis. Test developers should take these three factors into consideration so that the tests they develop are truly meaningful.

References:
Brown, H.D. & Abeywickrama, P. 2010. Language Assessment: Principles and Classroom Practices. Second Edition. White Plains: Pearson Education, Inc.
Heaton, J.B. 1988. Writing English Language Tests. New York: Longman Inc.
Hughes, A. 2003. Testing for Language Teachers. Cambridge: Cambridge University Press.
       
