12 SES 13 B JS, Translation and Cross-cultural Comparability in Large Scale Assessments
Joint Paper Session NW 09 and NW 12
Respondents’ self-reports are often employed in international surveys (e.g. PISA, TIMSS) and are frequently used to compare different groups of respondents (based on country, socioeconomic status, etc.). However, there is a serious concern about the comparability of such data, which may be hindered by bias. Bias occurs when score differences on the indicator of a construct do not correspond to differences in the underlying trait or ability (van de Vijver & Tanzer, 2004). One potential source of scale score distortion is socially desirable responding (Kam, Risavy, & Perunovic, 2015). Socially desirable responding (SDR) is defined as the tendency of some people to self-enhance when describing themselves (Paulhus, Harms, Bruce, & Lysy, 2003). The overclaiming technique is a novel approach with the potential to improve the cross-cultural comparability of respondents’ self-reports of knowledge in diverse fields.
Several studies, using a variety of methods, have documented notable differences in reporting behavior between respondents from different countries. For example, Chen, Lee, and Stevenson (1995) found differences in response styles between North Americans and East Asians: U.S. students were more likely to use extreme scale points, while East Asian students were more likely to use midpoints. Buckley (2009) analyzed response styles in the PISA 2006 dataset, computing acquiescence, disacquiescence, extreme response styles, and noncontingent responding for 57 countries. Using the anchoring vignette method, He, Buchholz, and Klieme (2017) and Vonkova, Zamarro, DeBerg, and Hitt (2015) showed substantial heterogeneity across PISA 2012 countries in how students report on teachers’ classroom management. In some countries, students had higher standards for judging teacher behavior, and such countries therefore improved their relative position in the ranking of teachers’ classroom management skills after adjusting for heterogeneity in reporting behavior. In contrast, students in other countries had lower standards, and such countries therefore worsened their relative position after adjustment. He and van de Vijver (2016), in their analysis of PISA 2012 data, focused on the motivation-achievement paradox in the case of Chinese students. They argue that the cultural influence of modesty and self-criticism is imprinted on scale use preferences as measured by both response styles and overclaiming.
In this paper, we focus on the analysis of respondents’ reporting behavior using the overclaiming technique (described in more detail in the Methodology part below). The technique was applied to students’ familiarity with mathematical concepts as part of the PISA 2012 survey. Our analysis covers 64 participating countries/regions. We identify similar patterns of responding in geographically and culturally close countries/regions. We also validate the overclaiming scores using external variables such as PISA math test scores, GDP, and public expenditure on education.
The main research questions are:
(1) What are the responding patterns, as identified by the overclaiming technique, in different countries and world regions?
(2) What is the external validity of overclaiming scores in cross-country comparison?
Socially desirable responding is one of the potential sources of scale score distortion (Kam et al., 2015). Several approaches, such as social desirability scales (e.g. the Balanced Inventory of Desirable Responding), various intrapsychic measures, or criterion discrepancy measures, have been proposed to measure SDR. However, serious concerns have been raised about their validity or utility, for example, the difficulty researchers have in distinguishing valid personality content in response patterns from responses influenced by desirable responding (Paulhus, 2011; Paulhus et al., 2003). The overclaiming technique is a promising approach with the potential to overcome the disadvantages of previous methods. It asks respondents to rate their familiarity with a set of items from a particular field of knowledge (e.g. astronomy, history, literature). Some of the items (usually about 20%), however, do not actually exist (foils). Using signal detection analysis, the technique allows us to measure respondents’ knowledge exaggeration (the overall tendency to report familiarity with both existent and nonexistent items) and accuracy (the ability to discriminate between existent and nonexistent items; Paulhus et al., 2003). In this paper we use the questions on familiarity with mathematical concepts from the PISA 2012 student questionnaire. The dataset includes observations of 275,904 students in 64 countries and economies. The question about familiarity with concepts reads: Thinking about mathematical concepts: how familiar are you with the following terms? The list of concepts then follows in this order: exponential function, divisor, quadratic function, proper number, linear equation, vectors, complex number, rational number, radicals, subjunctive scaling, polygon, declarative fraction, congruent figure, cosine, arithmetic mean, and probability.
The 5-point rating scale for each concept was: 1) never heard of it, 2) heard of it once or twice, 3) heard of it a few times, 4) heard of it often, 5) know it well, understand the concept. The list of concepts included 13 existing mathematical concepts and 3 foils (proper number, subjunctive scaling, declarative fraction). The foils were created by combining a grammatical term (proper, subjunctive, declarative) with a mathematical term (number, scaling, fraction; OECD, 2014).
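The signal detection logic behind the exaggeration and accuracy indices can be sketched as follows. This is an illustrative simplification, not the official PISA scaling: it assumes that a rating of 4 or 5 on the 5-point scale counts as claiming familiarity with a concept, and the function name and `claim_threshold` parameter are hypothetical.

```python
# Illustrative overclaiming-style scoring (simplified sketch, not the
# official PISA scaling). A rating at or above `claim_threshold` is
# treated as claiming familiarity with the concept.

REAL = ["exponential function", "divisor", "quadratic function",
        "linear equation", "vectors", "complex number", "rational number",
        "radicals", "polygon", "congruent figure", "cosine",
        "arithmetic mean", "probability"]
FOILS = ["proper number", "subjunctive scaling", "declarative fraction"]

def overclaiming_scores(ratings, claim_threshold=4):
    """Return (exaggeration, accuracy) from a dict of concept -> rating (1-5)."""
    # Hit rate: share of real concepts claimed; false-alarm rate: share of foils claimed.
    hit_rate = sum(ratings[c] >= claim_threshold for c in REAL) / len(REAL)
    fa_rate = sum(ratings[c] >= claim_threshold for c in FOILS) / len(FOILS)
    # Accuracy rewards discriminating real concepts from foils;
    # exaggeration captures the overall tendency to claim familiarity.
    accuracy = hit_rate - fa_rate
    exaggeration = (hit_rate + fa_rate) / 2
    return exaggeration, accuracy
```

Under this sketch, a student who claims familiarity with every item, foils included, receives maximal exaggeration but zero accuracy, while a student who claims all real concepts and no foils receives maximal accuracy with moderate exaggeration.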
In total, 61.7% of all students report higher familiarity with existing concepts than with non-existing ones, but only 1% of all students achieved the highest possible accuracy, i.e. familiarity with all existing concepts and no familiarity with non-existing concepts, which is essentially the “correct” solution. Interestingly, in contrast with the low percentage of students achieving the highest possible accuracy (1%), 19.5% of all respondents reached the highest possible exaggeration, i.e. they report knowing all the concepts. We also found considerable differences in response patterns among PISA 2012 participating countries. According to their response patterns, we categorized the countries/economies into groups with: a) high accuracy and high exaggeration, like Macao and Turkey; b) low accuracy and high exaggeration, like Indonesia and Albania; c) low accuracy and low exaggeration, like Luxembourg and Sweden; and d) high accuracy and low exaggeration, like Korea and Finland. There also seem to be consistent response patterns within particular world regions. For example, East Asia (e.g. Chinese Taipei, Japan, Korea) is a consistent region where students tend to be accurate and do not exaggerate. We investigated the unadjusted familiarity score (familiarity with existing concepts only) and the adjusted familiarity score based on the overclaiming technique (familiarity with non-existing concepts subtracted from familiarity with existing concepts) and their relationships with external variables: math achievement, GDP, and public expenditure per student (PEPS). We show that the unadjusted familiarity score correlates negatively with all the external variables (-0.04 with the math score, -0.22 with GDP, -0.39 with PEPS), which is contrary to what would reasonably be expected. The adjusted score, however, correlates positively with all the external variables (0.68 with the math score, 0.10 with GDP, 0.22 with PEPS), indicating the validity of the adjusted familiarity score.
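The contrast between the unadjusted and the foil-adjusted familiarity scores described above can be illustrated with a small per-student helper. This is a hypothetical sketch over a dictionary of the 16 ratings, not the scaling actually used in PISA; the function name is an assumption.

```python
# Illustrative unadjusted vs. foil-adjusted familiarity (hypothetical
# helper, not the PISA scaling). `ratings` maps each of the 16 concepts
# to its 1-5 rating; the three foils are known in advance.
FOILS = {"proper number", "subjunctive scaling", "declarative fraction"}

def familiarity_scores(ratings):
    """Return (unadjusted, adjusted) familiarity from concept -> rating (1-5)."""
    reals = [r for c, r in ratings.items() if c not in FOILS]
    foils = [r for c, r in ratings.items() if c in FOILS]
    unadjusted = sum(reals) / len(reals)               # existing concepts only
    adjusted = unadjusted - sum(foils) / len(foils)    # subtract foil familiarity
    return unadjusted, adjusted
```

A student who indiscriminately rates every item highly gets a high unadjusted score but an adjusted score near zero, which mirrors why only the adjusted score correlates positively with the external validation variables.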
Buckley, J. (2009). Cross-national response styles in international educational assessments: Evidence from PISA 2006. New York University: Department of Humanities and Social Sciences in the Professions. Retrieved from https://edsurveys.rti.org/PISA/documents/Buckley_PISAresponsestyle.pdf
Chen, C., Lee, S. Y., & Stevenson, H. W. (1995). Response style and cross-cultural comparisons of rating scales among East Asian and North American students. Psychological Science, 6(3), 170–175.
He, J., Buchholz, J., & Klieme, E. (2017). Effects of anchoring vignettes on comparability and predictive validity of student self-reports in 64 cultures. Journal of Cross-Cultural Psychology, 48(3), 319–334.
He, J., & van de Vijver, F. J. R. (2016). The motivation-achievement paradox in international educational achievement tests: Toward a better understanding. In R. B. King & A. B. I. Bernardo (Eds.), The psychology of Asian learners: A festschrift in honor of David Watkins (pp. 253–268). Singapore: Springer Science.
Kam, C., Risavy, S. D., & Perunovic, W. E. (2015). Using the over-claiming technique to probe social desirability ratings of personality items: A validity examination. Personality and Individual Differences, 74, 177–181.
Organisation for Economic Co-operation and Development (OECD). (2014). PISA 2012 technical report. Paris: OECD Publishing. Retrieved from https://www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf
Paulhus, D. L. (2011). Overclaiming on personality questionnaires. In M. Ziegler, C. MacCann, & R. D. Roberts (Eds.), New perspectives on faking in personality assessment (pp. 151–164). New York, NY: Oxford University Press.
Paulhus, D. L., Harms, P. D., Bruce, M. N., & Lysy, D. C. (2003). The over-claiming technique: Measuring self-enhancement independent of ability. Journal of Personality and Social Psychology, 84(4), 890–904.
Van de Vijver, F., & Tanzer, N. K. (2004). Bias and equivalence in cross-cultural assessment: An overview. Revue Européenne de Psychologie Appliquée/European Review of Applied Psychology, 54(2), 119–135.
Vonkova, H., Zamarro, G., DeBerg, V., & Hitt, C. (2015). Comparisons of student perceptions of teachers’ performance in the classroom: Using parametric anchoring vignette methods for improving comparability. EDRE Working Paper No. 2015-01. Retrieved from http://www.uaedreform.org/downloads/2015/05/comparisons-of-student-perceptions-of-teachers-performance-in-the-classroom-using-parametric-anchoring-vignette-methods-for-improving-comparability.pdf