Session Information
09 SES 11 A, Exploring Gender Differences and Test-Taker Behaviour
Paper Session
Contribution
Self-reported rating scales are widely used in educational and psychological research to measure non-cognitive constructs. In international large-scale assessments (ILSA), rating scales are often used in background questionnaires to measure constructs that may explain achievement differences between students. However, it is not uncommon to find that respondents have preferences for specific response categories regardless of the content of the items (Paulhus, 1991; van Vaerenbergh & Thomas, 2013).
This phenomenon is known as response styles (RS). Three categories of RS can be distinguished: (a) extreme response styles (ERS), when respondents tend to choose an extreme response category; (b) midpoint response styles (MRS), when respondents tend to choose the middle response category; and (c) acquiescence (ARS) and disacquiescence (DARS) response styles, depending on whether respondents tend to agree or disagree with items regardless of their content (Baumgartner & Steenkamp, 2001; Henninger & Meiser, 2020).
A traditional way of accounting for RS is a two-step approach, in which RS analysis is performed on the data after the primary trait is estimated. For example, it is possible to count the number of agreements or extreme responses, or to include reversed items and count double agreements (Bachman & O'Malley, 1984; Reynolds & Smith, 2010). A second method is to compare different measures of the same trait and evaluate their consistency in MRS or ERS (Hox, de Leeuw, & Kreft, 2011; Johnson, Kulesa, Cho, & Shavitt, 2005). However, these methods require a large number of items or scales, and they cannot provide a specific measure of RS; that is, they cannot rank respondents by their degree of ARS, MRS, or ERS.
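As an illustration of such count-based indices, the sketch below computes the proportion of extreme and agreeing responses per respondent on a 4-point agreement scale. The coding (1 = disagree a lot ... 4 = agree a lot), the column names, and the data are assumptions for illustration only, not the actual TIMSS variables.

```python
import numpy as np
import pandas as pd

# Toy responses on a 4-point agreement scale
# (assumed coding: 1 = disagree a lot ... 4 = agree a lot).
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.integers(1, 5, size=(6, 8)),
                    columns=[f"item{i}" for i in range(1, 9)])

# Count-based two-step indices: share of extreme (ERS) and
# share of agreeing (ARS) responses per respondent.
ers_index = data.isin([1, 4]).mean(axis=1)  # proportion of extreme categories
ars_index = data.isin([3, 4]).mean(axis=1)  # proportion of agreement categories
print(pd.DataFrame({"ERS": ers_index, "ARS": ars_index}))
```

Such indices can describe respondents' behaviour but, as noted above, they confound content and style and do not yield a model-based measure of the RS trait.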
Therefore, methodologies specialized in measuring latent variables under item response theory (IRT) have also been applied to study RS. This approach permits RS to be modeled jointly with the primary trait. Moreover, IRT makes it possible to analyze each item's behavior separately and to use incomplete test administration designs.
Multidimensional IRT models assume the existence of an RS latent trait in addition to the trait of interest (Jeon & De Boeck, 2015), and that respondents answer an item through a sequential, multi-stage cognitive process. Within this framework, the response categories can be represented as the final nodes of an item response decision tree, or IRTree.
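As a sketch of this decomposition, consider a 4-point agreement scale of the kind used in TIMSS attitude items, with a first node for response direction and a second node for extremity. The Rasch-type node parametrization below is a generic illustration under these assumptions, not necessarily the exact specification estimated in this study.

```latex
% Illustrative two-node IRTree for a 4-point scale:
% node 1 decides direction (agree vs. disagree), node 2 decides extremity.
% \theta_p is person p's primary trait, \eta_p the ERS trait,
% b_i^{dir} and b_i^{ext} are item parameters for the two nodes.
P(Y_{pi} = \text{agree a lot})
  = \underbrace{\frac{\exp(\theta_p - b_i^{\mathrm{dir}})}
                     {1 + \exp(\theta_p - b_i^{\mathrm{dir}})}}_{\text{direction node}}
    \times
    \underbrace{\frac{\exp(\eta_p - b_i^{\mathrm{ext}})}
                     {1 + \exp(\eta_p - b_i^{\mathrm{ext}})}}_{\text{extremity node}}
```

Under directional invariance, a single ERS trait governs the extremity node on both the agree and disagree branches; relaxing this assumption allows separate ERS traits for positive and negative extreme responses, which is the distinction between the two models compared in the Method section below.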
The present study uses different IRTree models to explore ERS in attitudinal scales using the whole sample of the 2019 cycle of the Trends in International Mathematics and Science Study (TIMSS). The presence of ERS introduces systematic errors into the measurement of attitudinal scales and therefore reduces construct validity (Khorramdel, von Davier, & Pokropek, 2019; Kim & Bolt, 2020). Furthermore, invalid scores can lead to invalid results when explaining student outcomes, ultimately yielding unsound conclusions (Lu & Bolt, 2015). Therefore, it is necessary to study whether ERS play a role in TIMSS attitudinal scales and to investigate whether they are linked to student background characteristics or student performance.
Method
The presence of ERS was evaluated via ERS IRTree models for three attitudinal scales retrieved from the TIMSS student questionnaires for both science and mathematics: Self-efficacy, Intrinsic motivation, and Extrinsic motivation. Two ERS models, the ERS-m (assuming directional invariance) and the BERS-m (allowing directional variation), were estimated separately for each of the six scales and compared with the partial credit model (PCM). The best models were selected by assessing their relative fit with the AIC and BIC indices. Directional and content invariance were analyzed for the ERS-m and the BERS-m by grouping the three scales of science and mathematics. Moreover, to evaluate which student and school characteristics explain individual differences in ERS, an explanatory model for the ERS trait of the ERS-m was estimated. Finally, the impact of accounting for ERS on explanatory models for students' achievement in science and mathematics in TIMSS 2019 was assessed by comparing the explained variance and the coefficients obtained when the traits estimated by the PCM and by the ERS-m were used as predictors.
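IRTree models of this kind are commonly estimated by recoding each polytomous response into binary pseudo-items, one per tree node, which then load on the primary trait (direction node) or the ERS trait (extremity node). The sketch below illustrates this recoding for a 4-point scale; the mapping, names, and toy data are assumptions for illustration, since the abstract does not detail the estimation pipeline.

```python
import numpy as np
import pandas as pd

# Standard IRTree recoding: each 4-point response (assumed coding
# 1 = disagree a lot ... 4 = agree a lot) becomes two binary pseudo-items.
DIRECTION = {1: 0, 2: 0, 3: 1, 4: 1}  # node 1: 1 = agree side chosen
EXTREMITY = {1: 1, 2: 0, 3: 0, 4: 1}  # node 2: 1 = extreme category chosen

def to_pseudo_items(responses: pd.DataFrame) -> pd.DataFrame:
    """Expand every Likert item into direction and extremity pseudo-items."""
    pseudo = {}
    for col in responses.columns:
        pseudo[f"{col}_dir"] = responses[col].map(DIRECTION)
        pseudo[f"{col}_ext"] = responses[col].map(EXTREMITY)
    return pd.DataFrame(pseudo)

# Toy illustration
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(1, 5, size=(4, 3)),
                   columns=["item1", "item2", "item3"])
print(to_pseudo_items(toy))
```

The resulting pseudo-item matrices can then be fitted with standard multidimensional IRT software, and the competing models compared via their AIC and BIC values, as described above.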
Expected Outcomes
For the six scales, the ERS models had a better fit than the PCM, showing that students differ in their tendency to respond in an extreme way to these items. In most scales, the assumption of directional invariance yielded a worse fit, so it can be concluded that for the individual scales the tendency to give a positive extreme response differs from the tendency to give a negative extreme response. Moreover, the correlations between ERS scores are not the product of related content between scales but seem to indicate a generalized tendency to give extreme answers. As for item-level content invariance, models allowing different item thresholds showed the best fit. Therefore, it is possible to infer that when students choose extreme response alternatives, this choice is not entirely independent of the item content: the probability of giving an extreme response depends not only on the student's ERS trait but also on item characteristics.

Finally, the relationship between TIMSS achievement scores and the latent traits estimated by the ERS-m was analyzed. Although statistically significant relationships were found between ERS latent traits and students' scores, their inclusion did not increase the explained variance. This finding does not invalidate the analysis of ERS, because theory predicts that the studied constructs, not individual ERS, affect achievement. However, by disentangling the primary trait from the ERS trait, the ERS models may offer somewhat greater accuracy regarding the effect of the primary trait. Hence, the present study found systematic individual differences in ERS in TIMSS attitudinal scales, corroborating previous findings. It also showed, however, that this trait is not related to students' performance in science and mathematics.
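A minimal sketch of the explained-variance comparison described above, assuming trait scores estimated by the PCM and by the ERS-m are already available as arrays; all variable names and the simulated data are hypothetical.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 of an OLS regression of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

# Hypothetical inputs: achievement scores and estimated trait scores.
rng = np.random.default_rng(42)
n = 500
primary_pcm = rng.normal(size=n)                            # trait from the PCM
primary_ers = primary_pcm + rng.normal(scale=0.3, size=n)   # trait from the ERS-m
ers_trait = rng.normal(size=n)                              # ERS trait from the ERS-m
achieve = 0.4 * primary_ers + rng.normal(size=n)            # simulated achievement

print("PCM trait only:    ", r_squared(primary_pcm[:, None], achieve))
print("ERS-m trait + ERS: ",
      r_squared(np.column_stack([primary_ers, ers_trait]), achieve))
```

Comparing the two R^2 values and the trait coefficients in this fashion mirrors the assessment reported above, where adding the ERS trait did not increase the explained variance.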
References
Bachman, J. G., & O'Malley, P. M. (1984). Yea-saying, nay-saying, and going to extremes: Black-white differences in response styles. Public Opinion Quarterly, 48(2), 491-509. doi: 10.1086/268845
Baumgartner, H., & Steenkamp, J.-B. E. M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38(2), 143-156. doi: 10.1509/jmkr.38.2.143.18840
Henninger, M., & Meiser, T. (2020). Different approaches to modeling response styles in divide-by-total item response theory models (part 1): A model integration. Psychological Methods, 25(5), 560-576. doi: 10.1037/met0000249
Hox, J. J., de Leeuw, E. D., & Kreft, I. G. G. (2011). The effect of interviewer and respondent characteristics on the quality of survey data: A multilevel model. In Measurement errors in surveys (pp. 439-461). John Wiley & Sons, Inc. doi: 10.1002/9781118150382.ch22
Jeon, M., & De Boeck, P. (2015). A generalized item response tree model for psychological assessments. Behavior Research Methods, 48(3), 1070-1085. doi: 10.3758/s13428-015-0631-y
Johnson, T., Kulesa, P., Cho, Y. I., & Shavitt, S. (2005). The relation between culture and response styles. Journal of Cross-Cultural Psychology, 36(2), 264-277. doi: 10.1177/0022022104272905
Khorramdel, L., von Davier, M., & Pokropek, A. (2019). Combining mixture distribution and multidimensional IRTree models for the measurement of extreme response styles. British Journal of Mathematical and Statistical Psychology, 72(3), 538-559. doi: 10.1111/bmsp.12179
Kim, N., & Bolt, D. M. (2020). A mixture IRTree model for extreme response style: Accounting for response process uncertainty. Educational and Psychological Measurement, 81(1), 131-154. doi: 10.1177/0013164420913915
Lu, Y., & Bolt, D. M. (2015). Examining the attitude-achievement paradox in PISA using a multilevel multidimensional IRT model for extreme response style. Large-scale Assessments in Education, 3(1). doi: 10.1186/s40536-015-0012-0
Paulhus, D. L. (1991). Measurement and control of response bias. In Measures of personality and social psychological attitudes (pp. 17-59). Elsevier. doi: 10.1016/b978-0-12-590241-0.50006-x
Reynolds, N., & Smith, A. (2010). Assessing the impact of response styles on cross-cultural service quality evaluation: A simplified approach to eliminating the problem. Journal of Service Research, 13(2), 230-243. doi: 10.1177/1094670509360408