Session Information
09 ONLINE 26 B, Assessment of ICT and Digital Skills
Paper Session
Meeting ID: 913 1220 1913, Code: 95UjHC
Contribution
Since the Organisation for Economic Co-operation and Development (OECD) introduced computer-based exams for the Programme for International Student Assessment (PISA) in 2015, a number of countries, including New Zealand and Ireland, have introduced online testing systems for post-primary students. Technology-Based Assessments (TBAs) such as these can use questions and items that employ a broad array of multimedia stimuli and response mechanics (Oranje et al., 2017). Although Bryant (2017) argued that such items can make TBAs more authentic than their paper-based counterparts and can measure a greater array of knowledge and skills, the impact of medium-unique features such as multimedia stimuli on the performance of post-primary test-takers has yet to be fully clarified.
‘Multimedia’ refers to the combination of text with other media elements such as images, animations or simulations to communicate meaning and information (Jordan, 1998). The addition of multimedia objects to an item can greatly affect test-taker performance. For example, Lindner et al. (2017a) found that adding representational pictures to text-based items improved student performance, accelerated item processing and reduced rapid-guessing behaviours in testing contexts. This ‘multimedia effect’ is well established in learning contexts, and there is some evidence of its presence in assessments as well (e.g. Lindner et al., 2017b). It is therefore hardly surprising that there is now a growing discussion around the use of multimedia stimuli in test items, particularly dynamic versions such as animations. It is thought that animations may be able to reduce construct-irrelevant variance related to reading or language proficiency (e.g. Karakolidis et al., 2021). However, the use of these objects in test items may also alter test-takers’ information processing and attentional allocation behaviours (e.g. Malone & Brünken, 2013; Wu et al., 2010). Consequently, the inclusion of multimedia stimuli in TBAs may modify the knowledge or skill being assessed or introduce something that results in an erroneous judgement of competency (Vorstenbosch et al., 2014).
Other multimedia objects, such as simulations, are becoming more commonplace in TBAs. Simulations are interactive multimedia objects with which test-takers can ‘produce’ an imitation of a real-world scenario (Levy, 2012). Consequently, simulations hold considerable potential for assessing some of the more complex cognitive processes of Bloom’s taxonomy, specifically “analysis” and “evaluation”, which can be difficult to achieve using traditional exam questions (Scully, 2017). While the use of these items is growing in popularity, some commentators have noted that their introduction to educational TBAs has been somewhat rushed from a practical and psychometric perspective, posing a threat to the validity of inferences drawn (e.g. Shiel et al., 2016). In particular, there appears to be a lack of understanding of how test-takers engage with these multimedia objects.
To make a meaningful contribution to this field of research, this study aimed to determine whether differences in item stimuli (e.g. text and images versus animations) affect test-taker or item performance. It also sought to identify how test-takers engage with complex simulation-type items that require multiple actions from the test-taker. To do this, test-score data, eye-movement data and cognitive-interview data were gathered from Irish post-primary students.
Method
Three related studies were conducted to address the stated research aims. For Study 1, data were gathered from 251 Irish second-level students (mean age: 15.6, SD: 0.5). An online testing platform randomly assigned participants to take either the dynamic (animated stimuli; n = 110) or static (text- and image-based stimuli; n = 141) version of the same TBA for scientific literacy, creating two groups of comparable size. The 15 items in the TBA were modified versions of publicly available PISA 2015 items (OECD, 2016) and aimed to assess the general scientific literacy skills that students aged between 14 and 16 years are expected to have. For 32 of these participants, eye-movement data were collected as a measure of attentional behaviour using a Tobii Pro Fusion eye-tracker (120 Hz; Tobii AB, 2020) while they completed the items.
For Study 2, eye-movement and performance data were collected from 24 participants who, after their involvement in Study 1, completed the five ‘Running in Hot Weather’ simulation-type items available online. As these items could not be downloaded or modified, participants accessed the unit directly through the PISA website (https://tinyurl.com/yu4jmacj).
Twelve participants agreed to participate in Study 3. Each was presented with a gaze plot that replayed their eye movements while simultaneously engaging in a think-aloud interview. Think-aloud methods ask participants to verbalise their thoughts on a task in order to better understand the mental processes that underlie an individual’s performance (Salkind, 2010). Because cognitive processes are faster than verbal processes (participants may be thinking about more than they can verbally express) and because attempting to verbalise thoughts may interfere with task performance, a retrospective think-aloud (RTA) approach was considered most appropriate. To minimise the possible effects of misremembering or forgetting important information, a cued RTA was used, with an eye-movement video serving as the cue. Participants were asked to recall their thoughts and actions for four items (one multiple-choice item, one drag-and-drop item, one open-ended text-response item and one simulation-type item).
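The eye-movement analyses described above rely on area-of-interest (AOI) metrics such as dwell time and fixation count. As a minimal illustration only, and not the study’s actual pipeline, the Python sketch below shows one way such metrics could be derived from an exported fixation list; the column names (participant, item, fixation_x, fixation_y, duration_ms) and the AOI rectangles are hypothetical assumptions rather than values taken from the study.

```python
# Illustrative sketch (not the study's actual pipeline): aggregating exported
# fixations into per-participant, per-item, per-AOI dwell-time and fixation-count
# metrics. Column names and AOI coordinates below are hypothetical.
import pandas as pd

# Hypothetical AOIs defined as screen rectangles: (x_min, y_min, x_max, y_max)
AOIS = {
    "item_stem": (0, 0, 960, 400),
    "stimulus": (0, 400, 960, 1080),
    "response_area": (960, 0, 1920, 1080),
}

def label_aoi(x, y):
    """Return the name of the AOI containing the fixation point, if any."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return "outside"

def aoi_metrics(fixations: pd.DataFrame) -> pd.DataFrame:
    """Summarise dwell time (ms) and fixation count per participant, item and AOI.

    Expects columns: participant, item, fixation_x, fixation_y, duration_ms.
    """
    fixations = fixations.copy()
    fixations["aoi"] = [
        label_aoi(x, y)
        for x, y in zip(fixations["fixation_x"], fixations["fixation_y"])
    ]
    return (
        fixations.groupby(["participant", "item", "aoi"])["duration_ms"]
        .agg(dwell_time_ms="sum", fixation_count="count")
        .reset_index()
    )
```

In practice, AOIs would typically be drawn per item within the eye-tracking software rather than hard-coded as fixed rectangles; the sketch simply makes the aggregation step explicit.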
Expected Outcomes
For Study 1, there was no statistically significant difference in scores between those who took the dynamic (M = 53.88, SD = 20.07) and static (M = 52.53, SD = 18.70) versions of the TBA for scientific literacy; t(249) = -0.549, p = .58, d = -0.07. However, item statistics (difficulty, discrimination) indicated that some items were easier in the static condition than in the dynamic condition, or vice versa. Analysis of the accompanying eye-movement data (n = 32) suggested that, for these items at least, there were marked differences in test-takers’ attentional behaviours according to key eye-movement metrics. Understanding that stimulus modality can affect test-taker behaviour, without necessarily affecting performance, should support the optimal design of test items in future TBAs as well as the validity of inferences drawn from them.
In Study 2, test-takers who received full credit for their performance on simulation-type items paid more attention to relevant areas of the simulation than those who did not receive full credit. However, this behaviour was not consistent across all tasks, suggesting that operationalising test-takers’ behaviour and performance in simulation-type items may not be as straightforward as expected. Other log-file data variables (e.g. number of simulations) were also examined. These analyses likewise supported the conclusion that inferences about test-taker actions in a TBA should only be drawn after certain contextual factors have been considered.
Thematic analysis of the data gathered in Study 3 captured the nature of students’ interactions with online testing environments under three main themes: Familiarisation, Sense-making and Making Decisions. Students also provided their opinions of, and recommendations for, the future of Irish online assessments, which should be of use to all stakeholders involved in any current or future post-primary TBAs.
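For context, the group comparison and classical item statistics reported for Study 1 can be computed along the following lines. The sketch is illustrative only, assuming equal variances for the t-test and dichotomously scored (0/1) items; the score arrays, function names and score matrix are hypothetical and do not reproduce the study’s data or code.

```python
# Illustrative sketch only: the kind of group comparison and classical item
# statistics reported above. The arrays and score matrix are hypothetical.
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def group_comparison(static_scores, dynamic_scores):
    """Independent-samples t-test (equal variances assumed) plus effect size."""
    t, p = stats.ttest_ind(static_scores, dynamic_scores)
    return t, p, cohens_d(static_scores, dynamic_scores)

def item_statistics(responses):
    """Classical item difficulty and discrimination.

    `responses` is a 2-D array (participants x items) of 0/1 scores.
    Difficulty = proportion correct; discrimination = corrected item-total
    correlation (item score vs. total score excluding that item).
    """
    responses = np.asarray(responses, float)
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination
```

Comparing such statistics per item across the static and dynamic conditions is one way differences in item-level difficulty and discrimination, like those described above, could be identified.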
References
Bryant, W. (2017). Developing a strategy for using technology-enhanced items in large-scale standardized tests. Practical Assessment, Research & Evaluation, 22(1), 1–5. https://pareonline.net/getvn.asp?v=22&n=1
Jordan, K. (1998). Defining multimedia. IEEE Multimedia, 5, 8–15. https://www.researchgate.net/publication/220635177_Defining_Multimedia
Karakolidis, A., O'Leary, M., & Scully, D. (2021). Animated videos in assessment: Comparing validity evidence from and test-takers' reactions to an animated and a text-based situational judgement test. International Journal of Testing. https://doi.org/10.1080/15305058.2021.1916505
Levy, R. (2012). Psychometric advances, opportunities, and challenges for simulation-based assessment. Paper presented at the Invitational Research Symposium on Science Assessment, May 2012. https://www.ets.org/Media/Research/pdf/session2-levy-paper-tea2012.pdf
Lindner, M., Lüdtke, O., Grund, S., & Köller, O. (2017a). The merits of representational pictures in educational assessment: Evidence for cognitive and motivational effects in a time-on-task analysis. Contemporary Educational Psychology, 51, 482–492. https://doi.org/10.1016/j.cedpsych.2017.09.009
Lindner, M., Eitel, A., Strobel, B., & Köller, O. (2017b). Identifying processes underlying the multimedia effect in testing: An eye-movement analysis. Learning & Instruction, 47, 91–102. https://doi.org/10.1016/j.learninstruc.2016.10.007
Malone, S., & Brünken, R. (2013). Assessment of driving expertise using multiple choice questions including static vs. animated presentation of driving scenarios. Accident Analysis and Prevention, 51, 112–119. https://doi.org/10.1016/j.aap.2012.11.003
Oranje, A., Gorin, J., Jia, Y., & Kerr, D. (2017). Collecting, analysing, and interpreting response time, eye tracking and log data. In K. Ercikan & J. W. Pellegrino (Eds.), Validation of score meaning for the next generation of assessments (pp. 39–51). Routledge.
Organisation for Economic Co-operation and Development (OECD). (2016). PISA 2015 results: Policies and practices for successful schools (Volume I). OECD Publishing. https://www.oecd.org/publications/pisa-2015-results-volume-i-9789264266490-en.htm
Salkind, N. J. (2010). Encyclopedia of research design. SAGE. https://dx.doi.org/10.4135/9781412961288.n460
Scully, D. (2017). Constructing multiple-choice items to measure higher-order thinking. Practical Assessment, Research, and Evaluation, 22, Article 4. https://doi.org/10.7275/swgt-rj52
Shiel, G., Kelleher, C., McKeown, C., & Denner, S. (2016). Future ready? The performance of 15-year-olds in Ireland on science, reading literacy and mathematics in PISA 2015. ERC.
Tobii AB. (2020). Eye tracker data quality test report: Tobii Pro Fusion. https://www.tobiipro.com/siteassets/tobii-pro/accuracy-and-precision-tests/tobii-pro-fusion-accuracy-and-precision-test-report.pdf
Vorstenbosch, M., Bouter, S., van den Hurk, M., Kooloos, J., Bolhuis, S., & Laan, R. (2014). Exploring the validity of assessment in anatomy: Do images influence cognitive processes used in answering extended matching questions? Anatomical Sciences Education, 7(2), 107–116. https://doi.org/10.1002/ase.1382
Wu, H., Chang, C., Chen, C.-L. D., Yeh, T. K., & Liu, C. C. (2010). Comparison of Earth science achievement between animation-based and graphic-based testing designs. Research in Science Education, 40, 639–673. https://doi.org/10.1007/s11165-009-9138-9