Session Information
09 SES 08 A, Theoretical and Methodological Issues in Tests and Assessments (Part 1)
Paper Session to be continued in 09 SES 12 A
Contribution
In international comparative studies such as TIMSS or PISA, confidence that the test measures the same construct and is fair across different countries and cultures is particularly important. At the same time, such large-scale assessments have many possible sources of bias: problems of translating tests into different languages, cultural features of the participating countries, and differences in curricula and teaching methods. In addition, the use of unidimensional models for scaling the results means that a large number of items show bias. A deep analysis of the functioning of each item makes it possible to better understand a country's position in the international ranking and to indicate ways to improve it.
Item bias can be detected by the procedure of differential item functioning (DIF). DIF “occurs when examinees from groups R (reference) and F (focal) have the same degree of proficiency in a certain domain, but different rates of success on an item” (Camilli, 2006). Depending on the interaction between group membership and ability level, two classes of DIF are distinguished: uniform and non-uniform. When no interaction is found, the DIF is uniform; otherwise, non-uniform DIF is present.
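As a rough illustration of this distinction (the item parameters below are invented for demonstration, not taken from the study), the two classes of DIF can be sketched with a two-parameter logistic (2PL) item response function. With uniform DIF the two groups' curves differ only in difficulty, so the gap between them keeps the same sign at every ability level; with non-uniform DIF the discriminations differ and the curves cross.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

thetas = (-2.0, 0.0, 2.0)  # low, medium, high ability

# Uniform DIF: same discrimination, shifted difficulty for the focal
# group -> the reference group is favored at every ability level.
ref_uniform = [p_correct(t, 1.0, 0.0) for t in thetas]
foc_uniform = [p_correct(t, 1.0, 0.5) for t in thetas]

# Non-uniform DIF: same difficulty, different discriminations ->
# the group difference changes sign as ability increases (the
# curves cross), i.e. group membership interacts with ability.
ref_nonuniform = [p_correct(t, 1.5, 0.0) for t in thetas]
foc_nonuniform = [p_correct(t, 0.5, 0.0) for t in thetas]
```

Here the uniform-DIF item is harder for the focal group everywhere, while the non-uniform item favors the focal group at low ability and the reference group at high ability.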
Many statistical methods for detecting DIF have been developed. Some are based on observed scores; others rely on ability estimates obtained from IRT models. Some identify uniform DIF better, while others detect non-uniform DIF more efficiently. There is no universal method among them; each has certain advantages and disadvantages. In this paper, we consider two nonparametric methods, Mantel-Haenszel (MH) and the Simultaneous Item Bias Test (SIBTEST), as well as two parametric methods, Item Response Theory Likelihood Ratio (IRT-LR) and Logistic Regression (LR). The MH uniform DIF detection procedure (Holland & Thayer, 1988) is based on the analysis of contingency tables. The MH statistic is a chi-square that tests the null hypothesis of no DIF between the groups. With SIBTEST (Shealy & Stout, 1993), the complete latent space is viewed as multidimensional, (θ, η), where θ is the unidimensional target ability and η represents the extraneous abilities. True scores for both groups are estimated using linear regression, and the β statistic is used to test the null hypothesis of no DIF. Both methods make it possible to estimate the amount of DIF and to classify it as negligible, moderate, or large. IRT-LR (Thissen, Steinberg & Wainer, 1988) is based on comparing the fit of IRT models using the likelihood-ratio test statistic. It can detect DIF that arises from differential difficulty, differential relations with the construct being measured, or even differential guessing rates. The LR method (Swaminathan & Rogers, 1990) reveals uniform and non-uniform DIF through successive comparisons of regression models.
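To make the MH contingency-table idea concrete, the following is a minimal sketch (not the exact software used in the study, and with made-up toy counts): examinees are stratified by total score, each stratum yields a 2×2 table of group (reference/focal) by item response (correct/incorrect), and the strata are pooled into one chi-square statistic and a common odds ratio.

```python
import math

def mantel_haenszel(tables):
    """Mantel-Haenszel DIF statistics over K score strata.

    Each stratum is a tuple (a, b, c, d):
        a = reference correct, b = reference incorrect,
        c = focal correct,     d = focal incorrect.
    Returns (MH chi-square with continuity correction,
             MH common odds ratio; 1.0 means no DIF).
    """
    sum_a = sum_ea = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # degenerate stratum carries no information
        n_ref, n_correct = a + b, a + c
        sum_a += a
        sum_ea += n_ref * n_correct / n            # E[a] under no DIF
        sum_var += (n_ref * (c + d) * n_correct * (b + d)) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    return chi2, num / den

# Toy no-DIF data: identical success rates in both groups per stratum.
chi2_null, alpha_null = mantel_haenszel([(30, 20, 30, 20)] * 3)

# Toy uniform-DIF data: reference group succeeds more often at the
# same score level, so the common odds ratio exceeds 1.
chi2_dif, alpha_dif = mantel_haenszel([(40, 10, 20, 30)] * 3)
```

The common odds ratio can further be rescaled to the ETS delta metric, Δ = −2.35·ln(α), which underlies the negligible/moderate/large classification mentioned above.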
Although the presence of DIF is a necessary but not sufficient condition for bias, studying it is very useful for better understanding how items function in different groups and for detecting possible problems.
This research had two objectives. The first was to compare the possibilities of different methods and tools for investigating DIF. The second was to analyze the DIF results obtained by the various methods for the mathematics items of TIMSS 2011 in the Ukrainian group compared with the groups from the USA and the Russian Federation.
Method
Expected Outcomes
References
1. Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York, London: The Guilford Press.
2. Camilli, G. (2006). Test fairness. In R. Brennan (Ed.), Educational measurement (pp. 221–256). Westport, CT: ACE/Praeger series on higher education.
3. Crocker, L., & Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart and Winston.
4. Linacre, J. (2011). A user's guide to Winsteps. Retrieved from: http://www.winsteps.com/winman/index.htm?guide.htm
5. Ministry of Education and Science of Ukraine. (2012). Mathematics. The curriculum for students of the 5th–9th classes of secondary schools. Retrieved from: http://www.mon.gov.ua/ua/activity/education/56/692/educational_programs/
6. Mullis, I., Martin, M., Foy, P., & Arora, A. (2012). TIMSS 2011 International Results in Mathematics. Retrieved from: http://timss.bc.edu/timss2011/downloads
7. State Standard for basic and secondary education. (2004). Mathematics in School, № 2.
8. Stout, W., & Roussos, L. (1995). SIBTEST manual. Champaign, IL: University of Illinois, Department of Statistics, Statistical Laboratory for Educational and Psychological Measurement.
9. Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning. Retrieved from: http://www.unc.edu/~dthissen/dl.html
10. Yildirim, H. H. (2006). The differential item functioning (DIF) analysis of mathematics items in the international assessment programs. A thesis submitted to the Graduate School of Natural and Applied Sciences of Middle East Technical University. Retrieved from: http://etd.lib.metu.edu.tr/upload/12607135/index
11. Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense. Retrieved from: http://www.educ.ubc.ca/faculty/zumbo/DIF/index.html