Session Information
14 SES 08 C JS, The Role of Language for Mathematics and Science Achievement and its Assessment
Joint Paper Session of NW 09, NW 14 and NW 24
Contribution
Introduction
Turkey, an OECD member, has participated in PISA regularly since 2003. Turkey’s performance in mathematics has been below average: 423 in PISA 2003, 424 in PISA 2006, 445 in PISA 2009, 448 in PISA 2012, and 420 in PISA 2015 (MEB, 2015; MEB, 2016). Through PISA 2012, Turkey’s mathematics scores followed an increasing trend; in PISA 2015, however, the average mathematics score dropped dramatically. The possible reasons for this very low score in PISA 2015 need to be investigated. One reason could be the psychometric properties of the mathematics items used in the PISA 2015 assessment. PISA items are mainly developed in English first and then adapted to other languages, including Turkish (OECD, 2017). It is therefore necessary to evaluate whether PISA mathematics items functioned differently for Turkish students, who answered adapted items, and English-speaking students, who answered the original items. Finding evidence that the items are fair in terms of their psychometric properties could help to rule out one possible reason for the sharp decrease in Turkish students’ mathematics performance in 2015.
Differential item functioning (DIF) detection methods are widely used to evaluate the fairness and equivalence of tests at the item level when investigating the comparability of translated and/or adapted measures (Zumbo, 2007). DIF occurs, and threatens the comparability of scores, when students in different groups who have a similar level of the underlying construct (mathematics ability in this study) do not have a similar probability of answering a specific item correctly (van de Vijver & Leung, 1997; Zumbo, 2007). Evaluating items for DIF is a necessary preliminary analysis before conducting any comparative study. If a test contains DIF items, observed differences in scores could stem from the problematic items rather than from true differences in the underlying trait or ability (He & van de Vijver, 2013).
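Stated formally, and only as a sketch in notation that does not appear in the original abstract (X is the scored response to an item, θ is the ability on the underlying construct, and G is group membership with a reference group R and a focal group F), an item is free of DIF when the following holds:

```latex
\[
P(X = 1 \mid \theta,\, G = R) \;=\; P(X = 1 \mid \theta,\, G = F)
\quad \text{for all } \theta .
\]
```

An item shows DIF when this equality fails for some ability level θ.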
PISA items are prepared very carefully by an international team of item developers under the guidance of experts. Translatability reviews are conducted that consider translation, adaptation and cultural issues (OECD, 2017). However, many researchers have reported that PISA mathematics assessments contained DIF items (Demir & Kose, 2014; Kankaras & Moors, 2014; Lyons-Thomas, Sandilands, & Ercikan, 2014; Yildirim & Berberoglu, 2009). Yildirim and Berberoglu (2009) reported that 5 out of 21 mathematics items in PISA 2003 were flagged as having DIF in a comparison of Turkish and American students (3 of these items favored Turkish students). Lyons-Thomas et al. (2014) found gender DIF in PISA 2009 mathematics items among students in Canada, Finland, Shanghai, and Turkey. Demir and Kose (2014) identified many DIF items in the PISA 2009 mathematics assessment when they compared the answers of Turkish students with those of German, Finnish and American students. There is therefore a possibility that the PISA 2015 mathematics assessment contains DIF items that might have contributed to the decline in Turkish students’ mathematics scores. No study has yet investigated whether PISA 2015 items show DIF in a comparison of Turkish and English-speaking students.
The research questions that guided this study were:
(1) Is item bias present in PISA 2015 mathematics items when comparing Turkish and English students?
(2) Is item bias present in PISA 2015 mathematics items when comparing Turkish and American students?
Method
Participants
The data for this study were obtained from the PISA 2015 data set. The study included all Turkish, English and American students who answered mathematics items in booklets 43, 45, and 47. These three booklets were selected because together they included all the items, with no overlap of items between them. Students who took one of these booklets were therefore included in the study. The participants were 491 Turkish students, 1154 English students and 448 American students.
Measures
PISA 2015 used cognitive items and a student questionnaire to collect information about students’ mathematics performance and students’ characteristics, respectively. The present study used all PISA 2015 mathematics items to evaluate DIF. In PISA 2015, a total of 69 mathematics items were used, and each student answered around 23 of them. With these items, PISA aims to measure students’ mathematical literacy, defined as the capacity of students to apply acquired knowledge and skills to the different problems and challenges they encounter. The mathematical processes measured in PISA are formulate (formulating situations mathematically), employ (employing mathematical concepts, facts, procedures and reasoning), and interpret (interpreting, applying and evaluating mathematical outcomes) (OECD, 2016). These dimensions have a hierarchical order in which interpret represents the highest cognitive process.
Data Analysis
Three DIF detection techniques were used in the study: logistic regression (LR), Mantel-Haenszel (MH) and structural equation modeling (SEM). Each DIF method is based on a different statistical calculation, and researchers have reported that agreement among DIF identification methods may be only low to medium (Atalay, Gok, Kelecioglu & Arsan, 2012). For this study, it was therefore decided that an item would be considered to behave differentially across language groups only if it showed DIF in at least two of the methods. Sixty-nine mathematics items were evaluated for DIF for the Turkish-English and Turkish-American student groups. SPSS 22.0 was used to conduct the logistic regression analysis, DIFAS 5.0 was used for the MH DIF detection analysis, and Mplus 7.4 was used for the SEM DIF detection procedure.
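As an illustration of the MH approach and of the two-of-three flagging rule described above, the following is a minimal Python sketch. It is not the DIFAS, SPSS or Mplus implementation used in the study; the function names, the input layout (group label, matching score, scored response) and the use of the conventional ETS delta transformation are assumptions made for this example only.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses):
    """Return the MH common odds ratio for one item.

    `responses` is an iterable of (group, matching_score, correct) tuples,
    where group is "ref" or "focal" and correct is 0 or 1 (hypothetical layout).
    A value above 1 means the item favors the reference group.
    """
    # One 2x2 table per matching-score stratum:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in responses:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: no discordant cells")
    return num / den


def ets_delta(alpha):
    """Convert the MH odds ratio to the ETS delta scale (MH D-DIF)."""
    return -2.35 * math.log(alpha)


def flagged_by_two_methods(lr_flag, mh_flag, sem_flag):
    """Apply the rule used in this study: DIF in at least two of three methods."""
    return sum([bool(lr_flag), bool(mh_flag), bool(sem_flag)]) >= 2
```

In this sketch the matching score would typically be the total score on the remaining items in the booklet; how each method's significance or effect-size threshold turns into a flag is left to the analyst and is not specified in the abstract.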
Expected Outcomes
Results
DIF Results
In the comparison of Turkish and English students’ answers, 9 out of 69 items were flagged as having DIF by at least two methods. Of these 9 items, 6 favored Turkish students and 3 favored English students. When the answers of Turkish and American students were compared, 10 out of 69 items were flagged as having DIF by at least two methods. Of these 10 items, 5 favored Turkish students and 4 favored American students; one item had non-uniform DIF. In total, out of 69 items, 7 items favored Turkish students whereas 4 items favored either English or American students. All the DIF items were in the open-response format, in which students constructed their answers and the answers were then rated. In addition, of the 7 items that favored Turkish students, 4 were related to the formulate process, which is the lowest cognitive process compared with employ and interpret. No formulate items favored English or American students.
Effects of DIF Items on Mathematics Performance Differences
In this part, the changes in effect sizes after excluding all DIF items and after excluding individual DIF items are reported. Between Turkish and English students, the original effect sizes in these booklets ranged from .51 to .93. When all DIF items were excluded, the effect sizes did not change dramatically; they either increased or remained the same. Similarly, between Turkish and American students, the original effect sizes ranged from .28 to .85. When all DIF items were excluded, the effect sizes remained very close to the original values. The evaluation of the change in effect size implied that the DIF items generally balanced each other out and did not create a disadvantage for Turkish students.
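The comparison of effect sizes with and without DIF items can be sketched as follows. The abstract does not state which effect size measure was used; Cohen's d with a pooled standard deviation is an assumption here, and the function names and the per-student dictionary layout are hypothetical.

```python
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed measure)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def total_scores(students, excluded=()):
    """Sum each student's item scores, skipping any item ids in `excluded`.

    `students` is a list of dicts mapping item id -> scored response.
    """
    return [
        sum(score for item, score in student.items() if item not in excluded)
        for student in students
    ]
```

Computing `cohens_d` once on `total_scores(group)` and once on `total_scores(group, excluded=dif_items)` for each booklet gives the kind of before-and-after comparison reported above.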
References
Atalay, K., Gok, B., Kelecioglu, H., & Arsan, N. (2012). Comparing different differential item functioning methods: A simulation study. Hacettepe University Journal of Education, 43, 270-281.
Demir, S., & Köse, İ. A. (2014). An analysis of the differential item function through Mantel-Haenszel, SIBTEST and Logistic Regression Methods. Journal of Human Sciences, 11(1), 700-714.
He, J., & Van de Vijver, F. J. R. (2013). Methodological issues in cross-cultural studies in educational psychology. In G. A. D. Liem & A. B. I. Bernardo (Eds.), Advancing cross-cultural perspectives on educational psychology: A festschrift for Dennis McInerney (pp. 39-56). Charlotte, NC: Information Age Publishing.
Kankaraš, M., & Moors, G. (2014). Analysis of cross-cultural comparability of PISA 2009 scores. Journal of Cross-Cultural Psychology, 45(3), 381-399.
Lyons-Thomas, J., Sandilands, D. D., & Ercikan, K. (2014). Gender Differential Item Functioning in Mathematics in Four International Jurisdictions. Education & Science, 39(172), 20-32.
MEB (2015). PISA 2012 Araştırması Ulusal Nihai Raporu. Ankara. Retrieved from https://drive.google.com/file/d/0B2wxMX5xMcnhaGtnV2x6YWsyY2c/view
MEB (2016). PISA 2015 Ulusal Raporu. Ankara. Retrieved from http://pisa.meb.gov.tr/wp-content/uploads/2016/12/PISA2015_Ulusal_Rapor1.pdf
OECD (2016). PISA 2015 Assessment and Analytical Framework: Science, Reading, Mathematic and Financial Literacy. PISA, OECD Publishing, Paris. doi:10.1787/9789264255425-en
OECD (2017). PISA 2015 Technical Report. Paris: OECD Publishing. Retrieved from http://www.oecd.org/pisa/data/2015-technical-report/
Van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of comparative research. Thousand Oaks, CA: Sage.
Yildirim, H. H., & Berberoğlu, G. (2009). Judgmental and statistical DIF analyses of the PISA-2003 mathematics literacy items. International Journal of Testing, 9(2), 108-121.
Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.