Session Information
09 SES 11 A, Exploring Gender Differences and Test-Taker Behaviour
Paper Session
Contribution
Achievement tests often provide important information, not only about students’ knowledge, but also about the efficacy of teachers, curricula, and educational systems as a whole. In research, test results are often used to gauge the effectiveness of an intervention or to compare individual students or groups of students (Van Der Flier, 1982).
One of the most important factors associated with accurate test responding is the examinees' effort to do well on the test. Motivation for students to exert maximum effort is mainly determined by the consequences of the test’s outcome. Wise and Kong (2005) found that when the test is low-stakes, many examinees may not exercise enough effort, and thus, the test results will generally not provide an accurate measure of the respondent’s knowledge.
To avoid such distortions, a number of methods for detecting response effort have been developed. One such method is measuring the time spent on each test item. This method assumes that each item on a test requires a minimum amount of time to be understood and answered. Response times below this minimum are considered an indicator of an examinee’s low effort. Such behavior is characterized by responses so rapid that the respondent could not have had time to fully consider the item (rapid-guessing behavior) (Wise & Kong, 2005). Wise and Kong (2005) found that when the test is low-stakes or has consequences that the test taker does not deem important, rapid guessing behavior is indicative of an unmotivated respondent. Research has shown that such test-taking behavior has significant consequences for the overall results of the examination. For example, Wise and DeMars (2005) compared the results of 12 empirical studies and found that, on average, unmotivated students performed more than one half standard deviation lower than motivated students.
Previous research on the relationship between test-taking behavior and gender indicates that gender plays a significant role. DeMars et al. (2013) found that males demonstrated consistently lower response time effort (a construct based on rapid guessing behavior), with males also demonstrating significantly more rapid guessing behavior than females on over half of the test items. Historically, females have been shown to outperform males in language-related testing, but the question remains how much of this variation can be attributed to gender differences in test-taking effort. Anaya and Zamarro (2020) reported that if PISA 2015 test scores were adjusted to account for effort, variation in reading scores across genders could narrow by up to 39 percent of a standard deviation in favor of males.
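For readers unfamiliar with the response time effort construct mentioned above, the index is usually summarized as the share of items on which an examinee shows solution behavior. The notation below (SB, RT, T, k) is our own shorthand for that summary and is not taken from this contribution:

```latex
\[
SB_{ij} =
\begin{cases}
1 & \text{if } RT_{ij} \ge T_i \\
0 & \text{otherwise}
\end{cases}
\qquad
RTE_j = \frac{1}{k} \sum_{i=1}^{k} SB_{ij}
\]
```

Here RT_{ij} is the response time of examinee j on item i, T_i is the time threshold for item i, and k is the number of items, so RTE_j ranges from 0 (rapid guessing on every item) to 1 (solution behavior on every item).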
In this conference contribution, we focus on differences in English as a foreign language (EFL) test performance between girls and boys and analyze whether these differences can be explained by students’ effort as measured by response times. Our focus on students’ effort on a low-stakes EFL test extends the current literature on language skills, which typically focuses on testing in students’ first language. Our research questions are as follows:
What is the difference in rapid guessing behavior based on gender as measured by response time on an English as a foreign language (EFL) test? Does this difference significantly change when different methods for the identification of rapid guessing behavior are applied?
How is rapid guessing related to EFL test scores for boys and girls?
Method
We collected data from a large-scale sample of Czech ISCED 2 students, including student background information (e.g., gender) and results from an EFL test. The EFL test, called “Test your English – For Schools”, is an online preparatory exam designed to provide information about English levels to potential Cambridge Exam test takers (Cambridge University Press and Assessment, 2022). The test consists of 25 multiple-choice questions, presented in groups of five. We measured the time taken to complete each of the five groups of questions, as well as the total time taken to complete the test.
Identifying rapid guessing behavior requires a time threshold for test items, such that all responses faster than the threshold are considered rapid guesses. To identify rapid guessing behavior from response times, we applied two different methods. Following Wise and Ma (2012), we first used the normative threshold method. We calculated the mean time to complete a screen of five questions and set three thresholds at 10 percent, 15 percent, and 20 percent of that mean time. We then repeated this process for the total response time across all test questions. Under this method, all responses faster than a given threshold are considered rapid guesses, while responses at or beyond the threshold are considered solution behaviors. Students whose response time is below the threshold for at least one of the five groups of questions are assigned to the group of students with rapid guessing behavior. We compared how the gender difference in rapid guessing behavior varies with each threshold. In general, a threshold of 10 percent of the mean response time has been shown to classify rapid guessing behavior with high accuracy (Wise & Ma, 2012).
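The normative threshold procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: thresholds are taken as 10, 15, and 20 percent of the mean screen time, the function names are hypothetical, and the response times and gender labels are invented toy values that do not reflect the study's data.

```python
from statistics import mean

def normative_thresholds(times, percents=(10, 15, 20)):
    """Map each percent to per-screen thresholds (that percent of the screen's mean time)."""
    n_screens = len(times[0])
    screen_means = [mean(row[s] for row in times) for s in range(n_screens)]
    return {p: [p / 100 * m for m in screen_means] for p in percents}

def flag_rapid_guessers(times, thresholds):
    """Flag a student if at least one screen time is below that screen's threshold."""
    return [any(t < thr for t, thr in zip(row, thresholds)) for row in times]

def share_flagged(flags, groups):
    """Proportion of flagged students within each group label."""
    shares = {}
    for g in set(groups):
        members = [f for f, label in zip(flags, groups) if label == g]
        shares[g] = sum(members) / len(members)
    return shares

# Invented toy data: seconds spent on each of five screens, one row per student.
times = [
    [60, 70, 65, 80, 75],
    [55, 3, 60, 70, 65],
    [62, 68, 6, 8, 70],
    [63, 59, 69, 82, 90],
]
gender = ["girl", "boy", "girl", "boy"]  # hypothetical labels

thresholds = normative_thresholds(times)
flags = {p: flag_rapid_guessers(times, thr) for p, thr in thresholds.items()}
for p in thresholds:
    print(p, flags[p], share_flagged(flags[p], gender))
```

In this toy example the third student is flagged only from the 15 percent threshold upward, which illustrates why group differences in rapid guessing can depend on the threshold chosen.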
We also evaluated student response times using the visual method for identification of rapid guessing (e.g., Wise, 2006; Wise & Ma, 2012; Sahin & Colvin, 2020). This method uses a plot of the response time distribution to identify a spike at the early end of the distribution. If a student’s response time occurs before or during this early spike, the response is considered a rapid guess. If the response time occurs after this spike, the response is classified as a solution behavior.
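The visual method relies on human inspection of a plot, so it has no single algorithmic form. The sketch below only approximates it: it bins response times into a histogram, prints a text version for inspection, and proposes the first valley after the early spike as a candidate threshold. The valley heuristic, the function names, the bin width, and the response times are all our own assumptions for illustration. The heuristic presupposes a bimodal distribution, so its output should still be checked against the plot.

```python
from collections import Counter

def histogram(times, bin_width=5):
    """Count response times per bin; bin b covers [b * bin_width, (b + 1) * bin_width)."""
    counts = Counter(int(t // bin_width) for t in times)
    return [counts.get(b, 0) for b in range(max(counts) + 1)]

def first_valley_threshold(times, bin_width=5):
    """Candidate threshold: lower edge of the first bin where counts stop falling after the first peak."""
    counts = histogram(times, bin_width)
    peak = next((i for i in range(len(counts) - 1) if counts[i] > counts[i + 1]), None)
    if peak is None:
        return None  # no early spike found
    valley = peak + 1
    while valley + 1 < len(counts) and counts[valley + 1] < counts[valley]:
        valley += 1
    return valley * bin_width

def print_histogram(times, bin_width=5):
    """Text histogram for a quick visual check of the early spike."""
    for b, c in enumerate(histogram(times, bin_width)):
        print(f"{b * bin_width:3d}-{(b + 1) * bin_width - 1:3d} s | {'#' * c}")

# Invented toy data: seconds spent on one screen by 18 students.
screen_times = [2, 3, 3, 4, 4, 5, 9, 30, 34, 36, 38, 40, 41, 43, 45, 47, 52, 58]

print_histogram(screen_times)
threshold = first_valley_threshold(screen_times)
rapid = [t for t in screen_times if t < threshold]
print("candidate threshold (s):", threshold, "| rapid responses:", len(rapid))
```

Responses before or during the early spike fall below the candidate threshold and are treated as rapid guesses. Responses in or after the valley are treated as solution behavior.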
Expected Outcomes
We found differences in both rapid guessing and test performance between girls and boys. Concerning rapid guessing, under both the normative threshold method and the visual method, the proportion of girls with rapid guessing behavior is lower than the proportion of boys with rapid guessing behavior. In other words, girls demonstrated more solution behavior than their male counterparts. We also found that girls spent more time responding to the EFL test overall. Concerning the test scores, girls performed significantly better than boys. In general, we found that the relationship between test scores and rapid guessing behavior is quadratic. Students with low and high test scores are faster than students with average test scores. However, only a few students with high test scores are classified as rapid guessers, whereas most rapid guessers achieved very low scores. Because more boys than girls exhibit rapid guessing behavior, rapid guessing has a bigger impact on the average test scores of boys than of girls. It is very important to support and encourage students to put more effort into answering test items, which could consequently lead to significantly higher test performance. Recommendations for further research include an investigation of the relationship between EFL testing effort and school type (private vs. public, etc.) or even class type (e.g., content and language integrated learning (CLIL) vs. traditional EFL). It may also be beneficial to measure students’ EFL testing effort alongside effort in other academic areas in order to investigate how students’ testing effort may differ across domains.
References
Anaya, L., & Zamarro, G. (2020). The role of student effort on performance in PISA: Revisiting the gender gap in achievement. Education Reform Faculty and Graduate Students Publications. Retrieved January 29, 2022, from https://scholarworks.uark.edu/edrepub/116
Cambridge University Press and Assessment (2022). Cambridge Assessment English: Test your English. Retrieved January 29, 2022, from https://www.cambridgeenglish.org/test-your-english/
DeMars, C. E., Bashko, B. M., & Socha, A. B. (2013). The role of gender in test-taking motivation under low-stakes conditions. Research & Practice in Assessment, 8(1), 69–82. https://files.eric.ed.gov/fulltext/EJ1062839.pdf
Sahin, F., & Colvin, K. F. (2020). Enhancing response time thresholds with response behaviors for detecting disengaged examinees. Large-scale Assessments in Education, 8, 5. https://doi.org/10.1186/s40536-020-00082-1
Van Der Flier, H. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13(3), 267–298. https://doi.org/10.1177/0022002182013003001
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17. https://doi.org/10.1207/s15326977ea1001_1
Wise, S. L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2
Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95–114. https://doi.org/10.1207/s15324818ame1902_2
Wise, S. L., & Ma, L. (2012). Setting response time thresholds for a CAT item pool: The normative threshold method. Annual meeting of the National Council on Measurement in Education, Vancouver, Canada. Retrieved January 29, 2022, from https://www.researchgate.net/publication/265407579_Setting_Response_Time_Thresholds_for_a_CAT_Item_Pool_The_Normative_Threshold_Method