Session Information
99 ERC SES 03 G, Assessment, Evaluation, Testing and Measurement
Paper Session
Contribution
Background
In low-stakes tests, the incentives for students to invest their maximum effort may be minimal. Differences in test performance may therefore reflect not only variations in content knowledge but also differences in non-content factors, such as student effort. International assessments such as PISA (Programme for International Student Assessment) are a case in point: differences in student effort may explain part of the within-country variation in test performance across demographic groups, such as gender, as well as part of the variation across countries. Research suggests that ignoring student effort may lead to biased conclusions about the test performance of a group of examinees (Wise & DeMars, 2010; Wise & Kong, 2005). Evidence from international assessments shows that student effort predicts part of the variation in test performance across countries (Zamarro et al., 2019).
Understanding the extent to which standardized assessments capture student effort is important for gauging how much effort helps to explain differential achievement across genders. Differences in student effort could help explain differences in student performance across countries, as well as gender gaps in test scores within countries. A better understanding of the role of effort in gender achievement gaps matters given the predictive power of math and science performance in explaining women’s underrepresentation in science occupations (Anaya et al., 2017; Ceci et al., 2014).
Research question
In this paper, we use data from PISA 2015 to study the extent to which student effort helps explain gender gaps in math, science, and reading achievement within each country.
Conceptual framework and literature review
Prior research suggests that differences in test-taking effort help explain within-country and cross-country variation in test performance, as well as the overall performance of a group of examinees (Wise & DeMars, 2010; Zamarro et al., 2019). If student effort varies by gender, differences in effort could affect our understanding of gender gaps in test performance. Along these lines, Balart & Oosterveen (2019) use measures of performance decline throughout the PISA test and find that girls are better at sustaining test performance than boys. According to the authors, this result has consequences for the measurement of the gender achievement gap: in longer assessments, the gap in math and science is smaller than in shorter assessments.
Using data from the U.S., Soland (2018a) obtains similar findings. Soland (2018b) measures effort based on item response times and finds that, after removing the effect of effort from test scores, the gender gap in math achievement would be wider; the math gap is also more sensitive to effort adjustment than the reading gap.
We advance the current state of knowledge in two ways:
- We contribute to prior literature about student effort in international assessments (Boe et al., 2002; Debeer et al., 2014; Zamarro et al., 2019) that, to our knowledge, mostly uses data from paper-based assessments. We employ the normative threshold (NT) method, a validated technique that uses response times from computer-based tests to measure student effort (Soland et al., 2019; Wise & Ma, 2012).
- We contribute to the NT literature by applying this method to a large, internationally representative sample, since most existing evidence focuses on the U.S. (Soland et al., 2019; Wise & Ma, 2012). Additionally, few studies analyze whether there are gender differences in test effort (Wise & Kong, 2005).
Method
In this paper, we use data from PISA 2015, a low-stakes triennial test that evaluates 15-year-old students around the world in math, reading, and science. We restrict our sample to the 54 countries and economies that took the computer-based test, which records response times for each question.
We measure student effort using the response-time-effort (RTE) score (Wise & Kong, 2005). The RTE score is the proportion of questions on the assessment in which the examinee engages in solution behavior, that is, takes the time to analyze the question (Schnipke & Scrams, 1997). The higher the score, the more effort the student invests in the test. Computing the RTE score requires a time threshold for each item that separates solution-behavior responses from rapidly guessed responses. To set these thresholds, we use the NT method, in which the threshold is a percentage of the mean response time for a given question (Wise & Ma, 2012). Evidence from several studies shows that RTE is a valid measure of student effort (Swerdzewski et al., 2011; Wise & Kong, 2005). We work with the proportion of rapid-guessing responses (the inverse of the RTE score, 1 − RTE), since we are interested in whether a question’s difficulty level triggers rapid-guessing behavior; a high proportion of rapid-guessing responses suggests that the examinee invests little effort.
We exploit the random assignment of PISA test booklets to students within each country to estimate a country-random-effects model that assesses the role of student effort in explaining test performance. From this estimation, we obtain effort-adjusted test scores by adding the estimated residuals and the country random effect, and we use them to calculate the estimated gender achievement gap for each subject (math, reading, and science) and country. We then compare this estimated gap with the gap based on actual test performance to obtain the change in the gender achievement gap that would occur in the absence of heterogeneity in student effort.
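To make the measurement step concrete, the sketch below illustrates how RTE scores and rapid-guessing proportions can be computed from item response times using normative thresholds. It is a minimal illustration rather than the authors’ code: the data layout and column names (student_id, item_id, response_time) are hypothetical, and the NT percentage is an assumed placeholder, since the abstract does not state which percentage of the mean response time is used.

```python
import pandas as pd

# Assumed NT percentage: each item's threshold is this share of its mean response time.
NT_PCT = 0.10


def rte_scores(responses: pd.DataFrame, nt_pct: float = NT_PCT) -> pd.DataFrame:
    """Compute per-student RTE and rapid-guessing proportions from item response times."""
    df = responses.copy()
    # NT method: one threshold per item, a fixed percentage of its mean response time.
    item_thresholds = df.groupby("item_id")["response_time"].mean() * nt_pct
    df["threshold"] = df["item_id"].map(item_thresholds)
    # Solution behavior: the response time meets or exceeds the item's threshold.
    df["solution_behavior"] = df["response_time"] >= df["threshold"]
    # RTE = share of a student's items answered with solution behavior;
    # the measure used in the paper is its complement, 1 - RTE (rapid guessing).
    out = (
        df.groupby("student_id")["solution_behavior"]
        .mean()
        .rename("rte")
        .reset_index()
    )
    out["rapid_guessing"] = 1.0 - out["rte"]
    return out


# Toy example: two students, two items; student 2 rapid-guesses on q1.
toy = pd.DataFrame({
    "student_id": [1, 1, 2, 2],
    "item_id": ["q1", "q2", "q1", "q2"],
    "response_time": [45.0, 60.0, 2.0, 58.0],  # seconds
})
print(rte_scores(toy))
```

The resulting rapid-guessing proportion (1 − RTE) is the effort measure that would then enter the country-random-effects model of test performance described above.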
Expected Outcomes
We find considerable heterogeneity of student effort across countries but not across gender groups. We find that the estimated gender achievement gap in math and science for each country could be up to 0.4 standard deviations wider in favor of boys in the absence of variation in student effort, whereas in reading the estimated gap could be up to 0.39 standard deviations wider in favor of girls. Altogether, our effort measures on average explain between 43 and 48 percent of the cross-country variation in test scores. Our results highlight the importance of accounting for differences in student effort to understand cross-country heterogeneity in performance and variations in gender achievement gaps across nations.
References
Anaya, L., Stafford, F., & Zamarro, G. (2017). Gender Gaps in Math Performance, Perceived Mathematical Ability and College STEM Education: The Role of Parental Occupation. EDRE Working Paper, 2017–21. https://doi.org/10.2139/ssrn.3068971
Balart, P., & Oosterveen, M. (2019). Females show more sustained performance during test-taking than males. Nature Communications, 10(1), 3798. https://doi.org/10.1038/s41467-019-11691-y
Boe, E. E., May, H., & Boruch, R. F. (2002). Student Task Persistence in the Third International Mathematics and Science Study: A Major Source of Achievement Differences at the National, Classroom, and Student Levels. https://eric.ed.gov/?id=ED478493
Ceci, S. J., Ginther, D. K., Kahn, S., & Williams, W. M. (2014). Women in Academic Science: A Changing Landscape. Psychological Science in the Public Interest, 15(3), 75–141. https://doi.org/10.1177/1529100614541236
Debeer, D., Buchholz, J., Hartig, J., & Janssen, R. (2014). Student, School, and Country Differences in Sustained Test-Taking Effort in the 2009 PISA Reading Assessment. Journal of Educational and Behavioral Statistics, 39(6), 502–523. https://doi.org/10.3102/1076998614558485
Schnipke, D. L., & Scrams, D. J. (1997). Modeling Item Response Times with a Two-State Mixture Model: A New Method of Measuring Speededness. Journal of Educational Measurement, 34(3), 213–232. http://www.jstor.org/stable/1435443
Soland, J., Jensen, N., Keys, T. D., Bi, S. Z., & Wolk, E. (2019). Are Test and Academic Disengagement Related? Implications for Measurement and Practice. Educational Assessment, 24(2), 119–134. https://doi.org/10.1080/10627197.2019.1575723
Swerdzewski, P. J., Harmes, J. C., & Finney, S. J. (2011). Two Approaches for Identifying Low-Motivated Students in a Low-Stakes Assessment Context. Applied Measurement in Education, 24(2), 162–188. https://doi.org/10.1080/08957347.2011.555217
Wise, S. L., & DeMars, C. E. (2010). Examinee Noneffort and the Validity of Program Assessment Results. Educational Assessment, 15(1), 27–41. https://doi.org/10.1080/10627191003673216
Wise, S. L., & Kong, X. (2005). Response Time Effort: A New Measure of Examinee Motivation in Computer-Based Tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2
Wise, S. L., & Ma, L. (2012). Setting Response Time Thresholds for a CAT Item Pool: The Normative Threshold Method. https://www.nwea.org/content/uploads/2012/04/Setting-Response-Time-Thresholds-for-a-CAT-Item-Pool.pdf
Zamarro, G., Hitt, C., & Mendez, I. (2019). When Students Don’t Care: Reexamining International Differences in Achievement and Student Effort. Journal of Human Capital. https://doi.org/10.1086/705799