09 SES 03 A: Comparing Computer- and Paper-Based Assessment
To benefit from the possibilities of technology-based testing, an existing paper-based assessment (PBA) needs to be transferred to a computer-based assessment (CBA). In longitudinal studies, such as the German National Educational Panel Study (NEPS; Blossfeld, Roßbach, & von Maurice, 2011), the comparability of ability estimates measured over time is a fundamental requirement for valid interpretations of change scores and precise comparisons of ability distributions between cohorts. Hence, the replacement of PBA with CBA must be prepared carefully, and the consequences of the mode change need to be investigated.
Previous research revealed heterogeneous mode effects that are not predictable without empirical investigation (e.g., Wang et al., 2008). The risk of mode effects differs between domains and increases with the complexity of items; that is, the response format is a possible predictor of mode effects, as its complexity may differ between modes (e.g., Heerwegh & Loosveldt, 2002). For example, assignment tasks are comparatively complex. They are typically used in reading tests when given headings must be assigned to paragraphs of a text. Assignment tasks can be computerized using so-called combo boxes (or drop-down boxes) and were found to be more difficult on computer than on paper (Heerwegh & Loosveldt, 2002). Moreover, previous findings give reason to assume that reading tests are more susceptible to mode effects when scrolling through longer texts and navigating between tasks within a unit are required (e.g., Poggio, Glasnapp, Yang, & Poggio, 2005; Pommerich, 2004).
The ongoing transition from PBA to CBA in the NEPS is accompanied by additional experimental mode effect studies to learn more about whether it makes a difference if a reading test is taken on computer or on paper. For this presentation, we analyze data from two reading tests (for details see Gehrer, Zimmermann, Artelt, & Weinert, 2013) from different grades (seven and twelve) that were computerized and administered in a between-subject design in which students were randomly assigned to modes. In addition, each student completed a common PBA reading test of a lower grade as well as a test of basic computer skills (BCS), both used as external criteria to inspect construct equivalence.
To evaluate mode effects, appropriate equivalence criteria need to be derived from the intended use of test scores and test score interpretations (Buerger, Kroehne, & Goldhammer, 2016). Therefore, the following research questions were investigated for each test: Do CBA and PBA measure the same underlying construct? Is reliability equal between modes? Are the item parameters invariant between modes? Is there a homogeneous shift in item difficulty on computer? Can mode effects be explained by item properties such as the response format or navigation requirements?
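One of these questions, whether item difficulty shifts homogeneously on computer, can be illustrated with a small sketch (hypothetical values, not the NEPS analysis): given item difficulties estimated separately per mode, the mean difference describes an overall mode effect, while item-specific deviations from that mean point to differential item functioning between modes.

```python
# Hypothetical illustration of a homogeneous-shift check between modes.
# b_pba and b_cba are assumed item difficulty estimates (in logits) for the
# same items, obtained from separate calibrations per administration mode;
# the values below are invented for demonstration only.

def mode_effect_summary(b_pba, b_cba):
    """Return the mean shift in difficulty (CBA minus PBA) and each
    item's deviation from that shift (a rough DIF indicator)."""
    diffs = [c - p for p, c in zip(b_pba, b_cba)]
    shift = sum(diffs) / len(diffs)          # homogeneous mode effect
    deviations = [d - shift for d in diffs]  # item-specific residuals
    return shift, deviations

b_pba = [-1.2, -0.4, 0.3, 0.9, 1.5]  # assumed paper-based difficulties
b_cba = [-0.9, -0.1, 0.7, 1.2, 1.7]  # assumed computer-based difficulties
shift, deviations = mode_effect_summary(b_pba, b_cba)
```

If the deviations are all close to zero, the mode effect can be absorbed by a single shift constant; large item-specific deviations would instead suggest partial invariance and call for formal DIF analysis.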
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: AERA.
Blossfeld, H.-P., Roßbach, H.-G., & von Maurice, J. (Eds.) (2011). Education as a lifelong process: The German National Educational Panel Study (NEPS) [Special issue]. Zeitschrift für Erziehungswissenschaft, 14.
Buerger, S., Kroehne, U., & Goldhammer, F. (2016). The transition to computer-based testing in large-scale assessments: Investigating (partial) measurement invariance between modes. Psychological Test and Assessment Modeling, 58(4), 487–606.
Gehrer, K., Zimmermann, S., Artelt, C., & Weinert, S. (2013). NEPS framework for assessing reading competence and results from an adult pilot study. Journal for Educational Research Online, 5(2), 50–79.
Heerwegh, D., & Loosveldt, G. (2002). An evaluation of the effect of response formats on data quality in web surveys. Social Science Computer Review, 20(4), 471–484.
Huff, K. L., & Sireci, S. G. (2001). Validity issues in computer-based testing. Educational Measurement: Issues and Practice, 20(3), 16–25.
International Test Commission (ITC). (2005). International guidelines on computer-based and internet delivered testing. Retrieved from https://www.intestcom.org/files/guideline_computer_based_testing.pdf
Kiefer, T., Robitzsch, A., & Wu, M. (2015). TAM: Test analysis modules (R package version 1.15-0).
Muthén, L. K., & Muthén, B. O. (1998–2015). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York, NY: Springer.
Penfield, R. D., & Camilli, G. (2007). Differential item functioning and item bias. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Vol. 26. Psychometrics (pp. 125–167). New York, NY: Elsevier.
Poggio, J., Glasnapp, D. R., Yang, X., & Poggio, A. J. (2005). A comparative evaluation of score results from computerized and paper & pencil mathematics testing in a large scale state assessment program. The Journal of Technology, Learning, and Assessment, 3(6).
Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. The Journal of Technology, Learning, and Assessment, 2(6).
R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available from http://www.R-project.org
Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2008). Comparability of computer-based and paper-and-pencil testing in K-12 reading assessments: A meta-analysis of testing mode effects. Educational and Psychological Measurement, 68(1).