Session Information
09 SES 05 A, Issues in Measurement and Sampling in Large Scale Assessments
Paper Session
Contribution
Introduction
Almost half a century of differential item functioning (DIF) studies has not produced a set of recommendations for writing items with little or no bias. We do not seem to have much control over sources of item bias. New approaches to design and analysis are needed to advance the field. Propensity score matching has the potential to be such a procedure, shedding light on item bias in cross-cultural research. Propensity score matching can be used to produce comparable sample groups by equating groups on relevant background variables. Bias detection procedures and propensity matching procedures share an important characteristic: both look for matches in different ethnic groups/countries on the basis of some background or psychological characteristic, such as socioeconomic status or total test score. The main difference, however, is that unlike bias detection procedures, propensity matching allows multiple background variables to be factored in at the same time, and the matching variables do not need to be derived from the target instrument that is scrutinized for bias, such as an educational achievement test, which is typically the case in DIF studies. As a consequence, propensity scoring may provide a better tool to control sources of item bias. We examined the impact of propensity matching by comparing DIF and the size of cross-cultural differences before and after matching on student background variables, using PISA 2012 mathematics data.
When researchers employ randomized experimental designs, the comparison groups differ only randomly on all background covariates. However, in studies comparing intact groups or nations, randomization is impossible. Matching methods based on propensity scores can then be used to compose comparable samples by equating the distribution of covariates across the comparison groups (Stuart, 2010). If pre-existing achievement differences between countries disappear after matching, the country differences in achievement can be attributed to the background differences. Used this way, propensity matching can be seen as an advanced form of covariance analysis (Van de Vijver & Poortinga, 1997).
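The matching step described above can be sketched in a few lines of code. The study's references cite the MatchIt package in R; the Python function below is only an illustrative stand-in showing the core idea of greedy 1:1 nearest-neighbor matching on propensity scores. All data, names, and scores here are hypothetical.

```python
def nearest_neighbor_match(treated, control):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: lists of (unit_id, propensity_score) tuples.
    Returns a list of (treated_id, control_id) pairs whose propensity
    scores are as close as possible under a greedy strategy.
    """
    pool = list(control)
    pairs = []
    # Match the hardest cases first: treated units with the highest scores
    # have the fewest plausible counterparts in the control pool.
    for t_id, t_score in sorted(treated, key=lambda u: -u[1]):
        if not pool:
            break  # control pool exhausted; remaining treated units unmatched
        best = min(pool, key=lambda u: abs(u[1] - t_score))
        pool.remove(best)  # without replacement: each control used once
        pairs.append((t_id, best[0]))
    return pairs

# Hypothetical propensity scores for students from two comparison groups.
group_a = [("a1", 0.81), ("a2", 0.55), ("a3", 0.30)]
group_b = [("b1", 0.78), ("b2", 0.52), ("b3", 0.10), ("b4", 0.33)]

pairs = nearest_neighbor_match(group_a, group_b)
```

After matching, the two groups' covariate distributions (summarized by the propensity score) are far more similar than in the raw samples, which is what licenses the "advanced covariance analysis" interpretation above.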
DIF procedures are based on matching on test score. We argue that matching on additional, potentially bias-relevant background variables would help identify sources of DIF. Our approach can be seen as a combination of thin matching (using the total score as the matching variable) and thick matching (forming the matching variable by pooling total-score levels) (Donoghue & Allen, 1993). In this study, PISA 2012 mathematics items were analyzed for DIF among Indonesian, Turkish, Australian, and Dutch students using exact, nearest-neighbor, and optimal matching methods. Based on the PISA 2012 results, Indonesian students were included to represent a low-achieving country, Turkish students a below-average country, Australian students an above-average country, and Dutch students a high-achieving country. By applying these matching methods to data from these differentially achieving countries, we aim to evaluate how propensity-score-based matching affects DIF results and to understand the nature of bias in comparisons of educational achievement across the four countries. Specifically, we examined to what extent propensity score matching methods are effective in understanding the nature of bias by reducing or eliminating bias sources in the comparison of PISA mathematics achievement, and to what extent propensity score matching can explain cross-national differences in mathematics performance.
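To make the score-matching idea behind DIF detection concrete, the sketch below computes the Mantel-Haenszel common odds ratio referenced by Donoghue and Allen (1993), which compares item performance of a reference and a focal group within each matched total-score stratum. The counts are invented for illustration; this is not the analysis reported in the paper, and it assumes dichotomously scored items.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    strata: list of (ref_right, ref_wrong, foc_right, foc_wrong) counts,
    one tuple per total-score level ("thin" matching). Values near 1.0
    indicate no DIF; values far from 1.0 favor one group.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_delta(alpha):
    # ETS delta scale transform; |delta| > 1.5 is conventionally large DIF.
    return -2.35 * math.log(alpha)

# Hypothetical item counts: within each stratum the reference and focal
# groups have equal odds of a correct response, so no DIF is present.
strata = [(10, 10, 5, 5), (20, 5, 8, 2)]
alpha = mh_odds_ratio(strata)
```

Because propensity matching equates groups on background variables before any such item-level comparison, strata built after matching are less confounded than strata built on total score alone.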
Method
Expected Outcomes
References
Donoghue, J. R., & Allen, N. L. (1993). Thin versus thick matching in the Mantel-Haenszel procedure for detecting DIF. Journal of Educational and Behavioral Statistics, 18, 131-154.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011). MatchIt: Nonparametric preprocessing for parametric causal inference. Retrieved from http://r.iq.harvard.edu/docs/matchit/2.4-20/matchit.pdf
OECD (2014b). PISA 2012 Technical Report. Paris, France: OECD Publishing.
Schmidt, W. H., & Maier, A. (2009). Opportunity to learn. In G. Sykes, B. Schneider, & D. N. Plank (Eds.), Handbook of education policy research (pp. 541-559). New York, NY: Routledge for American Educational Research Association.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.
van de Vijver, F. J. R., & Poortinga, Y. H. (1997). Towards an integrated analysis of bias in cross-cultural assessment. European Journal of Psychological Assessment, 13(1), 29-37.