Issues in the Development of Standardized Proficiency Tests for Academic
Purposes: Language Skills and Components
By: Agus Eko Cahyono and Jumariati
Proficiency tests such as the Test of English as a Foreign Language (TOEFL) and the International English Language Testing System (IELTS) have been widely used to measure test-takers' ability to use English for academic purposes. The TOEFL is designed by the Educational Testing Service (ETS), headquartered in Princeton, New Jersey. In its item development, the TOEFL has gone through careful construction and revision. Earlier versions contained some test-wiseness items, as studies found (Allan, 1992; Yang, 2000), which led to those items being revised. This is seen as an effort to ensure the validity of the TOEFL.
In the past, the TOEFL was influenced by the discrete-point approach to testing, and thus the test consisted of vocabulary and grammar sections. As the testing approach has moved toward a more communicative one, TOEFL items today assess test-takers' ability to use English to communicate, that is, to listen, speak, read, and write. The listening and reading sections use multiple-choice items requiring one response based on what has been heard or read. In the writing section, the test-taker writes at least one essay, while in the speaking section the test-taker performs at least one speaking task. Because the construct of the TOEFL clearly covers the four language skills, the test gains construct validity. There are two versions of the TOEFL: the paper-based test (PBT) and the internet-based test (iBT).
The score is based on the number of correct answers; there is no penalty for wrong answers. The total number of correct answers is then converted to a scale established by certified examiners to determine the level of English proficiency. To ensure reliability, the TOEFL PBT and TOEFL iBT are scored using both multiple well-trained human raters and an automated scoring method so that accurate and reliable scores can be achieved. In the writing section especially, automated scoring is used in addition to human raters. The human raters deal with content and meaning, while the automated scoring focuses on language features as well as the consistency and quality of the scores. In this way, the reliability of the scores is maintained.
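As a rough illustration of this kind of scoring pipeline, the sketch below counts correct answers with no deduction for wrong ones, maps the raw total onto a scaled score through a conversion table, and combines human and automated essay ratings. The table values, the function names, and the simple averaging are hypothetical simplifications for demonstration only, not ETS's actual conversion tables or weighting.

    from bisect import bisect_right

    # Hypothetical conversion table: (minimum raw score, scaled score).
    # Real TOEFL conversion tables are set by ETS and differ per test form.
    CONVERSION_TABLE = [(0, 310), (10, 370), (20, 430), (30, 490), (40, 550), (50, 610)]

    def raw_score(answers, key):
        """Count correct answers only; wrong answers carry no penalty."""
        return sum(1 for given, correct in zip(answers, key) if given == correct)

    def raw_to_scaled(raw):
        """Map a raw total onto the (illustrative) scaled-score table."""
        thresholds = [minimum for minimum, _ in CONVERSION_TABLE]
        return CONVERSION_TABLE[bisect_right(thresholds, raw) - 1][1]

    def essay_score(human_ratings, automated_rating):
        """Combine several human ratings with an automated rating.
        Here they are simply averaged; the real weighting is proprietary."""
        ratings = list(human_ratings) + [automated_rating]
        return sum(ratings) / len(ratings)

    if __name__ == "__main__":
        key = ["a", "c", "b", "d", "a"]
        answers = ["a", "c", "d", "d", "a"]
        raw = raw_score(answers, key)          # 4 correct, 1 wrong -> raw score 4
        print(raw, raw_to_scaled(raw))         # the wrong answer costs nothing extra
        print(essay_score([4.0, 5.0], 4.5))    # human + automated essay ratings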
The IELTS (Academic and General Training) is designed by the University of Cambridge. As its name suggests, the IELTS Academic test is intended for academic purposes, while the IELTS General Training test is intended for general, everyday purposes. IELTS examiners are retrained and recertified every two years, and the Speaking and Writing sections are rated by standardized examiners. The reliability of the Listening and Reading tests is reported using Cronbach's alpha, which measures the internal consistency of each 40-item test. The Writing and Speaking results, by contrast, are not item-based; they are rated by trained and standardized examiners according to detailed descriptive criteria and rating scales. The speaking section measures fluency, coherence, pronunciation, and lexical and grammatical accuracy, with band scores from 0 to 9, as in the writing section. The writing section focuses on task response, coherence and cohesion, and lexical and grammatical range and accuracy. The number of correct answers is converted to the band scale to determine the test-taker's level of proficiency. The IELTS is also validated, measured, and analyzed using item analysis methods to ensure its validity and reliability (Young et al., 2013). As the main goal of both the IELTS Academic and the TOEFL is academic, the content of the tests deals with academic discourse such as campus talks, college administration, lectures, and academic reading passages in order to achieve content validity.
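Cronbach's alpha itself is straightforward to compute from a matrix of item scores: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below is a minimal illustration using made-up 0/1 responses on a short test, not actual IELTS data; the sample size and item count are assumptions for demonstration only.

    def cronbach_alpha(item_scores):
        """Cronbach's alpha for internal consistency.

        item_scores is a list of rows, one per test-taker, each row holding
        one score per item (e.g. 0/1 for a dichotomously scored item).
        """
        k = len(item_scores[0])  # number of items (40 for IELTS Listening/Reading)

        def variance(values):
            mean = sum(values) / len(values)
            return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

        item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
        total_variance = variance([sum(row) for row in item_scores])
        return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

    # Made-up 0/1 responses from five test-takers on a six-item test.
    responses = [
        [1, 1, 1, 0, 1, 1],
        [1, 0, 1, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 1, 0],
    ]
    print(f"alpha = {cronbach_alpha(responses):.2f}")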
To conclude, the standardized tests discussed here are developed through careful analysis in order to meet requirements of validity and reliability. These standardized tests are claimed to be reliable because great pains are taken to standardize the testing environment, to ensure inter-rater reliability, and to eliminate test items that do not function properly. They are also considered valid, as ongoing studies show that each test measures what it is intended to measure.
References:
Young, J.W., So, Y., & Ockey, G.J. 2013. Guidelines for Best Test Development Practices to
Ensure Validity and Fairness for International English Language Proficiency
Assessments. Educational Testing Service. Retrieved on March 11, 2015 at http://www.ets.org/s/about/pdf/best_practices_ensure_validity_fairness_english_language_assessments.pdf
http://www.ecole-de-langues-orleans.com/en/toeic-toefl-bulats/