Session Information
ERG SES C 02, PechaKucha Poster Session
Contribution
The poster presentation outlines a dissertation project focused on ensuring the comparability of test versions in the context of high-stakes exams, namely the upper-secondary school-leaving examination in English as a foreign language.
The research will be carried out using real data provided by the Slovak National Institute for Certified Educational Measurements (NÚCEM). The findings, conclusions and suggestions may be useful for similar centralised exams introduced in Central, Eastern and South-Eastern Europe after 1989.
Comparability is treated here as an overarching term. According to the AERA, APA and NCME Standards (1999), there are different levels of comparability and interchangeability: comparable forms, equivalent forms and parallel forms, listed in order of increasingly strict criteria for evaluating comparability.
Test versions' comparability is viewed as one of the key concepts closely related to the fairness and validity of exams at all stakes levels. We follow the view of validity outlined by Messick (1989), i.e. that the key aspects of test validity are “the interpretability, relevance, and utility of scores”, together with the decisions, actions, and social consequences based on the test results. Public communication about how comparability has been ensured is vital for accountability to the stakeholders in every high-stakes testing context, and striving for comparability itself is a crucial part of the validation process.
The aims of the research are as follows:
- to investigate what comparability means in high- and low-stakes contexts and how it has been dealt with, established and proved by different bodies in the field of language testing (Cambridge ESOL, Goethe-Institut, European universities, national testing bodies, etc.);
- to provide theoretical rationale for the development of comparable test versions;
- to investigate what methods and approaches would be suitable for the context of Slovak national high-stakes exams given the existing constraints (e.g. legislation, accountability to the stakeholders and to the public in general) and how to implement them in the processes of test development.
The investigation of test versions' comparability comprises analyses of different aspects of the exam: structural and content equivalence of the construct, psychometric equivalence of the test versions, and structural equivalence of the test-takers' population.
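As a minimal illustration of what a psychometric-equivalence analysis can involve, the sketch below applies classical linear equating (cf. Livingston, 2004) to place raw scores from one test version onto the scale of another. It is a sketch under stated assumptions, not the project's actual procedure; the scores are simulated placeholders, not NÚCEM data.

```python
import numpy as np

def linear_equating(scores_x, scores_y):
    """Linear equating: map raw scores from form X onto the scale of
    form Y so that equated scores share Y's mean and standard deviation.
    Assumes randomly equivalent groups of test-takers took the two forms."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return lambda x: mu_y + (sd_y / sd_x) * (np.asarray(x) - mu_x)

# Hypothetical raw scores for two exam versions (illustrative only)
rng = np.random.default_rng(0)
form_a = rng.normal(42, 8, size=500).round()
form_b = rng.normal(45, 7, size=500).round()

to_form_b_scale = linear_equating(form_a, form_b)
print(to_form_b_scale([30, 40, 50]))  # form-A scores on form B's scale
```

If the equating function deviates substantially from the identity line, the versions cannot be treated as interchangeable without score adjustment.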
RQ1:
Can the test versions from 2011–2015 be considered equivalent in terms of a) structure of the construct, b) content, and c) psychometric characteristics?
RQ2:
What is the nature of any observed difference and how significant is it for the interpretation of test results?
RQ3:
What is the validity of the interpretation of test results and of the conclusions drawn on the basis of these results?
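A hedged sketch of one way the differences asked about in RQ2 could be quantified, assuming raw score vectors are available for two versions; the function and variable names are illustrative, not the dissertation's actual method:

```python
import numpy as np
from scipy import stats

def compare_versions(scores_a, scores_b):
    """Contrast two versions' score distributions: Welch's t-test for the
    mean difference plus Cohen's d, because with large national cohorts a
    difference can be statistically significant yet practically negligible."""
    t, p = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    pooled_sd = np.sqrt((np.var(scores_a, ddof=1) + np.var(scores_b, ddof=1)) / 2)
    d = (np.mean(scores_a) - np.mean(scores_b)) / pooled_sd
    return {"t": float(t), "p": float(p), "cohens_d": float(d)}
```

Reporting an effect size alongside the significance test helps separate the nature of an observed difference from its practical importance for interpreting results.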
Method
Expected Outcomes
References
AERA, APA & NCME (1999). Standards for educational and psychological testing.
Bachman, L. F. (2004). Statistical Analyses for Language Assessment. Cambridge: Cambridge University Press.
Bachman, L. F. (2012). Justifying the Use of Language Assessments: Linking Interpretations with Consequences. Presented at the International Conference on Language Proficiency Testing in the Less Commonly Taught Languages, Bangkok, Thailand. Retrieved January 10, 2015 from http://www.sti.chula.ac.th/conference
Bachman, L. F., Davidson, F., Ryan, K. & Choi, I. (1995). An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge TOEFL Comparability Study. Cambridge: Cambridge University Press.
Buck, G. (2001). Assessing Listening. Cambridge: Cambridge University Press.
Chapelle, C. (1999). Validity in Language Assessment. Annual Review of Applied Linguistics, 19, 254-272.
Chvál, M., Procházková, I. & Straková, J. (2015). Hodnocení výsledků vzdělávání didaktickými testy [Evaluating educational outcomes with didactic tests]. Česká školní inspekce. Available from http://www.csicr.cz/cz/Dokumenty/Projektove-vystupy/Hodnoceni-vysledku-vzdelavani-didaktickymi-testy
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Davies, A. (2008). Ethics, professionalism, rights and codes. In Hornberger, N. H. (Ed.), Encyclopaedia of Language and Education. Springer.
Khalifa, H. & Weir, C. (2009). Examining Reading. Cambridge: Cambridge University Press.
Livingston, S. A. (2004). Test score equating (without IRT). Educational Testing Service.
Messick, S. (1987). Validity. ETS Research Report Series, 1987, i-208. doi:10.1002/j.2330-8516.1987.tb00244.x
Messick, S. (1989). Meaning and Values in Test Validation: The Science and Ethics of Assessment. Educational Researcher, 18(2), 5-11.
Messick, S. (1993). Foundations of Validity: Meaning and Consequences in Psychological Assessment. ETS Research Report Series, i-18. doi:10.1002/j.2333-8504.1993.tb01562.x
Rapp, J. & Allalouf, A. (2002). Evaluating Cross-lingual Equating. National Institute for Testing and Evaluation. Presented at the Annual Meeting of AERA, New Orleans.
Sireci, S. & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148-166. doi:10.1191/0265532203lt249oa