Session Information
ERG SES C 02, PechaKucha Poster Session
Contribution
The poster presentation outlines a dissertation project focused on ensuring the comparability of test versions in the context of high-stakes exams, namely the upper-secondary school-leaving examination in English as a foreign language.
The research will be carried out using real data provided by the Slovak National Institute for Certified Educational Measurements (NÚCEM). The findings, conclusions and suggestions may be useful for similar centralised exams introduced in Central, Eastern and South-Eastern Europe after 1989.
Comparability is treated here as an overarching term. According to the AERA, APA and NCME Standards (1999), there are different levels of comparability and interchangeability: comparable forms, equivalent forms and parallel forms, listed in order of increasingly strict criteria for evaluating comparability.
Test versions' comparability is viewed as one of the key concepts closely related to the fairness and validity of exams at all stakes levels. We follow the view of validity outlined by Messick (1989), i.e. that the key aspects of test validity are “the interpretability, relevance, and utility of scores”, together with the decisions, actions, and social consequences based on the test results. Public communication about how comparability has been ensured is vital for accountability to the stakeholders in every high-stakes testing context, and striving for comparability itself is a crucial part of the validation process.
The aims of the research are as follows:
- to investigate what comparability means in high- and low-stakes contexts and how it has been dealt with, established and proved by different bodies in the field of language testing (Cambridge ESOL, Goethe-Institut, European universities, national testing bodies, etc.);
- to provide theoretical rationale for the development of comparable test versions;
- to investigate what methods and approaches would be suitable for the context of Slovak national high-stakes exams given the existing constraints (e.g. legislation, accountability to the stakeholders and to the public in general) and how to implement them in the processes of test development.
The investigation of test versions' comparability comprises analyses of different aspects of the exam: structural and content equivalence of the construct, psychometric equivalence of the test versions, and structural equivalence of the test-takers' population.
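As a minimal illustration of what a psychometric-equivalence analysis can involve, the sketch below applies classical linear equating (cf. Livingston, 2004) to place raw scores from one test version onto the scale of another. It is a sketch under stated assumptions, not the project's actual procedure; the scores are simulated placeholders, not NÚCEM data.

```python
import numpy as np

def linear_equating(scores_x, scores_y):
    """Linear equating: map raw scores from form X onto the scale of
    form Y so that equated scores share Y's mean and standard deviation.
    Assumes randomly equivalent groups of test-takers took the two forms."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return lambda x: mu_y + (sd_y / sd_x) * (np.asarray(x) - mu_x)

# Hypothetical raw scores for two exam versions (illustrative only)
rng = np.random.default_rng(0)
form_a = rng.normal(42, 8, size=500).round()
form_b = rng.normal(45, 7, size=500).round()

to_form_b_scale = linear_equating(form_a, form_b)
print(to_form_b_scale([30, 40, 50]))  # form-A scores on form B's scale
```

If the equating function deviates substantially from the identity line, the versions cannot be treated as interchangeable without score adjustment.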
RQ1:
Can the test versions from 2011–2015 be considered equivalent in terms of a) structure of the construct, b) content, and c) psychometric characteristics?
RQ2:
What is the nature of any observed difference and how significant is it for the interpretation of test results?
RQ3:
What is the validity of the interpretation of test results and of the conclusions drawn on the basis of these results?
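A hedged sketch of one way the differences asked about in RQ2 could be quantified, assuming raw score vectors are available for two versions; the function and variable names are illustrative, not the dissertation's actual method:

```python
import numpy as np
from scipy import stats

def compare_versions(scores_a, scores_b):
    """Contrast two versions' score distributions: Welch's t-test for the
    mean difference plus Cohen's d, because with large national cohorts a
    difference can be statistically significant yet practically negligible."""
    t, p = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    pooled_sd = np.sqrt((np.var(scores_a, ddof=1) + np.var(scores_b, ddof=1)) / 2)
    d = (np.mean(scores_a) - np.mean(scores_b)) / pooled_sd
    return {"t": float(t), "p": float(p), "cohens_d": float(d)}
```

Reporting an effect size alongside the significance test helps separate the nature of an observed difference from its practical importance for interpreting results.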
Method
Expected Outcomes
References
AERA, APA & NCME (1999). Standards for educational and psychological testing.
Bachman, L. F. (2004). Statistical Analyses for Language Assessment. Cambridge: Cambridge University Press.
Bachman, L. F. (2012). Justifying the Use of Language Assessments: Linking Interpretations with Consequences. Presented at the International Conference on Language Proficiency Testing in the Less Commonly Taught Languages, Bangkok, Thailand. Retrieved January 10, 2015 from http://www.sti.chula.ac.th/conference
Bachman, L. F., Davidson, F., Ryan, K. & Choi, I. (1995). An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge TOEFL Comparability Study. Cambridge: Cambridge University Press.
Buck, G. (2001). Assessing Listening. Cambridge: Cambridge University Press.
Chapelle, C. (1999). Validity in Language Assessment. Annual Review of Applied Linguistics, 19, 254-272.
Chvál, M., Procházková, I. & Straková, J. (2015). Hodnocení výsledků vzdělávání didaktickými testy [Evaluating educational outcomes with didactic tests]. Česká školní inspekce. Available from http://www.csicr.cz/cz/Dokumenty/Projektove-vystupy/Hodnoceni-vysledku-vzdelavani-didaktickymi-testy
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Davies, A. (2008). Ethics, professionalism, rights and codes. In Hornberger, N. H. (Ed.), Encyclopaedia of Language and Education. Springer.
Khalifa, H. & Weir, C. (2009). Examining Reading. Cambridge: Cambridge University Press.
Livingston, S. A. (2004). Test score equating (without IRT). Educational Testing Service.
Messick, S. (1987). Validity. ETS Research Report Series, 1987, i-208. doi:10.1002/j.2330-8516.1987.tb00244.x
Messick, S. (1989). Meaning and Values in Test Validation: The Science and Ethics of Assessment. Educational Researcher, 18(2), 5-11.
Messick, S. (1993). Foundations of Validity: Meaning and Consequences in Psychological Assessment. ETS Research Report Series, i-18. doi:10.1002/j.2333-8504.1993.tb01562.x
Rapp, J. & Allalouf, A. (2002). Evaluating Cross-lingual Equating. National Institute for Testing and Evaluation. Presented at the Annual Meeting of AERA, New Orleans.
Sireci, S. & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148-166. doi:10.1191/0265532203lt249oa