Session Information
09 SES 07 A, Exploring Behavior, Learning, and Well-being in Diverse Educational Contexts
Paper and Ignite Talk Session
Contribution
Assessments assume that examinees invest effort to perform well; otherwise, scores will not reflect their true ability and will not be valid indicators of their proficiency (Baumert & Demmrich, 2001; Wise, 2015). However, a lack of motivation and effort during test taking threatens the validity of test outcomes, especially in International Large-Scale Assessments (Rutkowski & Wild, 2015).
Response-time data from computerized tests have enabled researchers to study test-taking effort, e.g., by identifying respondents who answer before a certain time threshold. However, there is a need to move beyond threshold-based examination of rapid responses, since the time needed to respond to a test item depends on various factors, including ability and test-taking behaviors. Consequently, it is argued that, to be interpreted appropriately, response times should be examined in relation to examinees' performance at the item level.
The purpose of this study is to examine two novel response-based indicators of test-taking behaviors that combine examinee response and process (timing) data to better understand and describe test-taking effort in online assessments. These indicators, named "Unsuccessful Time Management" and "Successful Time Management", will be empirically estimated with data from the fourth-grade e-TIMSS 2019 mathematics assessment. This study further aims to examine these variables in relation to achievement benchmarks, student background characteristics such as attitudes towards mathematics, confidence in mathematics, and gender, as well as overall achievement. The ultimate goal of these analyses is to obtain further insights into examinees who participate in online assessments through the use of their timing data.
Method
The sample comprised grade 4 students from the USA who had participated in e-TIMSS 2019. It included 10,029 students, of whom 49.44% were female; the average age was 10.29 years (SD = 0.43). To calculate the indicators of the current study, the average time spent on each test item was first computed separately for every item. At a second stage, a deviation score was calculated for each student who was administered item i, by subtracting the average sample screen time for item i from the student's time on the same item. Based on these deviation scores, two cumulative indicators were calculated as follows:
1) For items that were omitted or answered incorrectly in less time than average, this negative timing difference was added to the examinee's Unsuccessful Time Management indicator. This indicator therefore represents the sum of the unused time on items that were answered incorrectly, suggesting that the students most likely made less than adequate effort to answer them correctly.
2) For items that were answered correctly in less time than average, this negative timing difference was added to the examinee's Successful Time Management indicator. This indicator represents the sum of the unused time on correct answers, suggesting that the students either were already proficient in the specific content and thus did not need additional time to respond correctly, or that the correct answer was the consequence of a lucky guess.
Overall, 86.459% of the participants used less time than average on at least one of their incorrect responses, resulting in an average of 320.881 seconds (SD = 166.792) of unused time for those examinees. The 48.489% of participants who used less time than average on at least one of their correct responses had an average of 20.617 seconds (SD = 17.576) of unused time. The correlation between the two indicators was 0.29 (SE = 0.01).
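To make the computation concrete, the following is a minimal sketch of how the two indicators could be derived from a long-format table of item-level responses. The column names (student_id, item_id, time, correct, omitted) and the toy data are assumptions for illustration only, not the e-TIMSS variable names or the authors' actual processing code.

```python
import pandas as pd

# Illustrative long-format response data: one row per student-item administration.
# Column names and values are hypothetical, not the actual e-TIMSS variables.
df = pd.DataFrame({
    "student_id": [1, 1, 2, 2, 3, 3],
    "item_id":    ["M1", "M2", "M1", "M2", "M1", "M2"],
    "time":       [35.0, 80.0, 60.0, 40.0, 20.0, 55.0],  # screen time in seconds
    "correct":    [1, 0, 0, 1, 1, 0],
    "omitted":    [0, 0, 0, 0, 0, 1],
})

# Step 1: average screen time per item across the sample.
item_mean = df.groupby("item_id")["time"].transform("mean")

# Step 2: deviation score = student's time on item i minus the sample average for item i.
df["deviation"] = df["time"] - item_mean

# Step 3: accumulate unused time (negative deviations, expressed as positive seconds)
# separately for incorrect/omitted responses and for correct responses.
faster = df["deviation"] < 0
unused = -df["deviation"]

unsuccessful = (
    unused.where(faster & ((df["correct"] == 0) | (df["omitted"] == 1)), 0.0)
    .groupby(df["student_id"]).sum()
    .rename("unsuccessful_time_management")
)
successful = (
    unused.where(faster & (df["correct"] == 1) & (df["omitted"] == 0), 0.0)
    .groupby(df["student_id"]).sum()
    .rename("successful_time_management")
)

indicators = pd.concat([unsuccessful, successful], axis=1)
print(indicators)
```

In this sketch each student ends up with one row holding the two cumulative indicators, which can then be merged with achievement and background variables for further analysis.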
Expected Outcomes
When the indicators were examined by international benchmark level, the Successful Time Management indicator increased and the Unsuccessful Time Management indicator decreased as benchmark levels increased; students who spent less time on incorrect answers thus tended to be in the lower benchmarks. The correlation between the Successful Time Management indicator and achievement was r = 0.25 (SE = 0.01), while the correlation between the Unsuccessful Time Management indicator and achievement was r = -0.08 (SE = 0.02). Thus, students with higher achievement tended to have more unused time on their correct answers (most likely an indication of mastery of the test content) and less unused time on their incorrect answers, indicating that they generally spent more time on, and struggled more with, such items; however, this latter relationship was very small. Further analyses found that the 89.50% of students who reached all test items had the most unused time, most likely because they were trying to ensure that they had time to complete the test. These were also the students with the highest average achievement (M = 537.75, SD = 84.80). The students who ran out of time had the least unused time; most likely, they spent more time than average on most items and therefore ran out of time at the end. Finally, the 5.36% of students who stopped responding most likely made the least effort and had the lowest average achievement (M = 482.50, SD = 81.59), which is consistent with low effort on the test. Overall, the results revealed that both indicators, when conditioned on the accuracy of responses, may provide additional insights into examinee test-taking effort and characteristics. However, more research is needed to understand these indicators more comprehensively.
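As a rough illustration of how the indicators could be related to achievement and benchmark levels once computed, the sketch below assumes a student-level table with hypothetical column names and values; it ignores plausible values, sampling weights, and jackknife standard errors, which the actual e-TIMSS analyses would require.

```python
import pandas as pd

# Hypothetical student-level table; names and numbers are illustrative only.
students = pd.DataFrame({
    "successful_time_management":   [15.0, 22.0, 30.0, 8.0],
    "unsuccessful_time_management": [400.0, 250.0, 180.0, 520.0],
    "achievement":                  [495.0, 530.0, 560.0, 470.0],
    "benchmark":                    ["Low", "Intermediate", "High", "Low"],
})

# Pearson correlation of each indicator with achievement.
print(students[["successful_time_management",
                "unsuccessful_time_management"]].corrwith(students["achievement"]))

# Mean of each indicator by benchmark level.
print(students.groupby("benchmark")[["successful_time_management",
                                     "unsuccessful_time_management"]].mean())
```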
References
Baumert, J., & Demmrich, A. (2001). Test motivation in the assessment of student skills: The effects of incentives on motivation and performance. European Journal of Psychology of Education, 16(3), 441-462. https://doi.org/10.1007/BF03173192
Rutkowski, D., & Wild, J. (2015). Stakes matter: Student motivation and the validity of student assessments for teacher evaluation. Educational Assessment, 20(3), 165-179. https://doi.org/10.1080/10627197.2015.1059273
Wise, S. L. (2015). Effort analysis: Individual score validation of achievement test data. Applied Measurement in Education, 28(3), 237-252. https://doi.org/10.1080/08957347.2015.1042155