Session Information
09 SES 11 C, Methodological Issues in Tests and Assessments
Paper Session
Contribution
The possibilities that Computer Based Testing brings to large-scale assessment broaden the scope, usage and utility of such programs.
National full-cohort assessments provide a wealth of diagnostic information for all key stakeholders. As large-scale assessment moves closer to online adaptive testing, the reliability and validity of the tests must remain paramount: researchers need data that show whether online assessments measure what they purport to measure, measure a continuation or derivative of traditional pen-and-paper forms, or perhaps test new variables altogether.
Pre-existing scales, developed from proven paper-and-pencil instruments and analysis methods, embed a traditional view of the skills and content deemed assessable and measurable. Worthy of investigation is whether these scales transfer to the online medium, and whether student abilities are captured and measured appropriately and consequently reported accurately to stakeholders.
This study examines the design considerations in the development of an online assessment instrument and compares the expected outcomes with the actual findings in terms of the distribution of student abilities. Grounded in current research, the structural and design development was initially guided by previous studies and then adapted and refined to serve the requirements of users and stakeholders.
As part of the design and development process, a mode-effect study was undertaken to ascertain whether significant differences existed between pen-and-paper and computer-screen formats. The results were analysed to determine whether delivery mode affected the parameters of the items within the tests, and the choice of items included in the final instruments was made using these data.
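As a minimal sketch of how such a mode comparison can be screened, the Python fragment below contrasts centred logit (PROX-style) item difficulties between the two delivery modes and flags items whose difficulty shifts by more than an arbitrary threshold. The function names, simulated responses and 0.5-logit flag are illustrative assumptions, not the calibration procedure actually used in the study.

    import numpy as np

    def rasch_item_difficulties(responses):
        """PROX-style logit difficulties from a scored 0/1 response matrix (persons x items)."""
        p = responses.mean(axis=0)              # proportion correct per item
        p = np.clip(p, 1e-3, 1 - 1e-3)          # guard against extreme proportions
        d = np.log((1.0 - p) / p)               # logit difficulty
        return d - d.mean()                     # centre the scale

    def mode_effect_shifts(paper, screen, flag=0.5):
        """Difference in centred item difficulties between delivery modes;
        items shifting by more than `flag` logits are flagged for review."""
        shift = rasch_item_difficulties(screen) - rasch_item_difficulties(paper)
        return [(i, round(s, 2)) for i, s in enumerate(shift) if abs(s) > flag]

    # Simulated responses for illustration only (500 students, 30 items per mode)
    rng = np.random.default_rng(0)
    paper = (rng.random((500, 30)) < 0.60).astype(int)
    screen = (rng.random((500, 30)) < 0.58).astype(int)
    print("items flagged for possible mode effect:", mode_effect_shifts(paper, screen))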
Consideration was given to the branching design involving set modules, the optimum number of items within the initial and subsequent modules, and specific subject-domain constraints. As the structure of the assessments evolved using pre-calibrated items, the cut points were theoretically determined, with testlet information curves and the mapping of item locations to raw scores, derived from Rasch first principles, as the underpinning methodologies.
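A hedged illustration of how such theoretical cut points might be located is sketched below: it computes Rasch (Fisher) information curves for two hypothetical second-stage modules, takes the routing cut as the ability at which the harder module becomes more informative, and maps that cut back to an expected raw score on a routing module. The module difficulties are invented for the example and do not come from the study's item bank.

    import numpy as np

    def rasch_module_information(theta, difficulties):
        """Fisher information of a set of Rasch items across an ability grid."""
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - difficulties[None, :])))
        return (p * (1.0 - p)).sum(axis=1)

    theta = np.linspace(-4, 4, 801)
    # Invented pre-calibrated difficulties for two second-stage modules
    easy_module = np.array([-1.8, -1.2, -0.9, -0.5, -0.1])
    hard_module = np.array([0.2, 0.6, 1.0, 1.4, 1.9])

    info_easy = rasch_module_information(theta, easy_module)
    info_hard = rasch_module_information(theta, hard_module)

    # Routing cut: ability at which the harder module yields more information
    cut = theta[np.argmax(info_hard > info_easy)]
    print("theoretical cut point near theta =", round(cut, 2))

    # Map the theta cut to an expected raw score on an invented routing module
    routing_module = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    expected_raw = (1.0 / (1.0 + np.exp(-(cut - routing_module)))).sum()
    print("expected routing raw score at the cut:", round(expected_raw, 1))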
The assessment scales built from previous assessment data have defined standard cut scores and suggest the proportions of students that will fall into the pre-defined Levels. Given the importance of the results to stakeholders, it is essential that the actual results be interrogated to determine whether the predicted outcome eventuates or whether, conversely, the online branching model explores a tangential variable. Are there issues of reliability and validity relative to the established scale?
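One simple way the predicted and observed Level proportions could be compared is a chi-square goodness-of-fit check, sketched below with invented proportions and counts that stand in for the study's data.

    import numpy as np
    from scipy.stats import chisquare

    # Illustrative values only: proportions expected in each Level from the
    # established paper-based scale, versus counts observed in the online trial
    expected_props = np.array([0.10, 0.25, 0.40, 0.20, 0.05])
    observed_counts = np.array([130, 310, 380, 140, 40])

    expected_counts = expected_props * observed_counts.sum()
    stat, p = chisquare(observed_counts, f_exp=expected_counts)
    print(f"chi-square = {stat:.1f}, p = {p:.3f}")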
This paper reports an investigation into whether a Multi-Stage Computer Based Assessment assigns ability estimates as accurately as traditional methods. It examines the use of the online instrument in calculating ability estimates for all students, and scrutinises its validity against the results produced by pen-and-paper instruments.
It examines the appropriateness of the model selected, the cut points determined for branching decisions and, ultimately, the ranking of students and the assignment of 'Levels' based on their demonstrated ability on differing sets of items.
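A minimal sketch of how rankings and Level assignments could be compared across modes, assuming invented cut scores and paired ability estimates rather than the study's results, is a rank correlation together with a Level agreement rate:

    import numpy as np
    from scipy.stats import spearmanr

    def assign_levels(theta, cut_scores):
        """Map ability estimates to ordinal Levels using the scale's cut scores."""
        return np.digitize(theta, cut_scores)

    # Invented cut scores and paired ability estimates for the same six students
    cut_scores = [-1.0, 0.0, 1.0, 2.0]
    theta_paper = np.array([-1.4, -0.3, 0.2, 0.8, 1.6, 2.3])
    theta_online = np.array([-1.1, -0.4, 0.6, 0.4, 1.8, 1.9])

    rho, _ = spearmanr(theta_paper, theta_online)
    agreement = np.mean(assign_levels(theta_paper, cut_scores)
                        == assign_levels(theta_online, cut_scores))
    print(f"rank correlation = {rho:.2f}, Level agreement = {agreement:.0%}")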
Data from a large-scale student assessment in a Middle Eastern region will be compared with a dataset from the online proof of concept, in which a sample of students was matched to the same curriculum-based outcomes and assessed using common items.
Using IRT analysis techniques (Rasch, 1960), the results of the online trial will be analysed and compared with data from the 2016 administration of the national assessment in the same domain and Grades.
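Where common items anchor the two datasets, one simple way to place the online trial estimates on the national scale before comparison is mean-mean linking of the common-item difficulties. The sketch below uses invented values and is offered only as an illustration, not as the study's actual equating procedure.

    import numpy as np

    def mean_mean_link(b_trial_common, b_national_common):
        """Mean-mean linking constant placing trial estimates on the national scale."""
        return float(np.mean(b_national_common) - np.mean(b_trial_common))

    # Invented common-item difficulties from the two separate calibrations
    b_trial = np.array([-0.8, -0.2, 0.3, 0.9])
    b_national = np.array([-0.6, 0.0, 0.5, 1.1])
    shift = mean_mean_link(b_trial, b_national)

    # Shift the online ability estimates before comparing the two distributions
    theta_trial = np.array([-1.2, 0.1, 0.7, 1.5])
    theta_on_national_scale = theta_trial + shift
    print("linking constant:", round(shift, 2), "adjusted thetas:", theta_on_national_scale)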
The findings of this study will help guide the direction of the online assessment by providing recommendations for its future use.
Method
Expected Outcomes
References
• Australian Council for Educational Research (2013). Analytical Report: Psychometric Analysis for the Trial of the Tailored Test Design. Melbourne: ACER. Prepared for the Australian Curriculum, Assessment and Reporting Authority.
• Holling, H. (2015). CAT and optimal design for Rasch Poisson Counts Models. Keynote presentation at the IACAT Conference, Cambridge.
• Masters, G. N. (1988). The analysis of partial credit. Applied Measurement in Education, 1(4), 279-297.
• Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACER ConQuest Version 2: Generalised item response modelling software [computer program]. Camberwell: Australian Council for Educational Research.