24 SES 13 B JS, Assessing Mathematics Achievement
Joint Paper Session NW 09 and NW 24
Every year, the Italian National Evaluation System entrusts the administration of large-scale tests to INVALSI. The data INVALSI collects, and its analysis of the results, highlight macro-scale phenomena. Specifically, in the standardized mathematics tests, students’ answers to some of the items reveal behaviours that allow a deeper understanding of particular features of the teaching and learning process and of some causes of difficulty nationwide.
In this study we analyse results of the INVALSI mathematics tests (in different content areas and at different school levels), integrating quantitative analysis based on the Rasch model with didactical interpretation. We use statistical methods to analyse the trend of each answer option as a function of students’ mathematical ability. We focus on specific items in which a wrong answer is particularly popular among medium- and high-ability students, and we analyse this trend through the lens of mathematics education theories. The study reveals that these phenomena are closely related to the implicit and explicit rules governing classroom practices, that they exist at all school levels, and that they concern different mathematical content and skills.
Such phenomena are often linked to mistaken knowledge or to poor teaching practices rather than to absence of knowledge or non-participation in classroom activity, which are common markers of low-level performance in the tests. The close link that these constructs have with classroom practices would appear to confirm the statistical data; further analysis of other types of items, covering a wider range of mathematical knowledge and skills, could confirm these initial results.
Although Classical Test Theory (CTT) offers important statistical tools for the assessment of tests (Barbaranelli & Natali, 2005), the analyses presented in this paper are mainly based on the more modern Item Response Theory (IRT). IRT uses various mathematical models to measure latent variables and allows us to overcome the principal limitations of CTT, such as the dependence between estimated student ability and item difficulty. In this context we consider the simplest IRT model, the Rasch model (Rasch, 1960), which is a one-parameter logistic model. It gives the probability of a correct response to a given item as a function of the student’s ability and the psychometric characteristics of the item (in particular, its difficulty). From a strictly statistical point of view, one would therefore expect a higher level of student ability to correspond to a higher percentage of correct answers to an item and, at the same time, to a lower percentage of wrong answers. The percentage of students choosing a given wrong answer usually decreases as ability increases, but for some items there are wrong answers whose trend is not strictly decreasing. We call this phenomenon “humped performance”. Analysing it is complex because several interacting factors come into play: students with different levels of ability may encounter different obstacles when faced with a task, give wrong answers for different reasons, and favour one wrong answer over another as a result of different approaches and problems. In this research we analyse items from various school levels (from primary to high school) that display good measurement properties and in which at least one response option shows a “humped performance” that may be linked to teaching factors.
In particular, in the examples analysed, one of the main constructs that offers a key to reading statistical results of this type at a systemic level is the didactic contract (Brousseau, 1988; EMS-EC, 2012).
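As an illustration of the Rasch model mentioned above, the probability of a correct response can be sketched in a few lines of Python. This is a minimal sketch of the standard one-parameter logistic form, not the estimation procedure used for the INVALSI data; the function name `rasch_probability` and the symbols `theta` (student ability) and `b` (item difficulty) are our own illustrative choices.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model.

    theta: student ability, b: item difficulty, both on the same logit scale.
    (Names are illustrative assumptions, not taken from the study.)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# An item of difficulty 0.0 faced by students of increasing ability:
# the probability of a correct answer grows with ability.
for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={rasch_probability(theta, 0.0):.3f}")
```

Under this model the probability of a correct answer increases monotonically with ability, which is why a wrong option whose popularity peaks among medium-ability students stands out.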
This study analysed a selection of INVALSI test items covering different content areas and different school levels. From a statistical point of view, all the items analysed display good statistical properties and are coherent with the Rasch model used for the test analysis. The distractor plots show that each item has at least one distractor curve displaying a “humped performance”. A qualitative analysis of the items reveals that this statistical feature may be traced back to well-known phenomena in mathematics education research, which are closely linked to classroom practices and to the character of the discipline. The examples reported were analysed through the lens of mathematics education, and the results point to implicit and explicit rules established in the classroom, especially regarding the didactic contract. The parallel between statistical analysis and didactic interpretation of the items allows us to verify the existence of the didactic contract and to measure its effects; by analysing the distractor plots it is possible to identify which ability levels are most influenced by these phenomena. For the items analysed, this initial study indicates that the effects of the didactic contract are most evident among students of medium ability. The “humped performance” of the options in question may be due to the fact that low-ability students are less engaged with didactic practices, whilst stronger students manage to overcome the obstacles facing them thanks to their bond with the didactic method and their teacher.
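The kind of distractor curve discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the helper names `distractor_curve` and `is_humped`, the equal-sized ability groups, and the toy data are all hypothetical.

```python
def distractor_curve(abilities, choices, option, n_bins=5):
    """Share of students choosing `option` in each ability group (low to high).

    Students are sorted by estimated ability and split into `n_bins`
    groups of (nearly) equal size. All names here are illustrative.
    """
    if len(abilities) != len(choices) or len(abilities) < n_bins:
        raise ValueError("need one choice per student and at least n_bins students")
    order = sorted(range(len(abilities)), key=lambda i: abilities[i])
    size = len(order) / n_bins
    curve = []
    for k in range(n_bins):
        group = order[round(k * size): round((k + 1) * size)]
        curve.append(sum(choices[i] == option for i in group) / len(group))
    return curve


def is_humped(curve, tol=0.0):
    """True if the curve peaks strictly inside the ability range,
    exceeding both end groups by more than `tol`."""
    peak = max(range(len(curve)), key=lambda k: curve[k])
    inside = 0 < peak < len(curve) - 1
    return inside and curve[peak] - max(curve[0], curve[-1]) > tol


# Toy data (invented for illustration): "A" is the correct answer,
# "B" is a distractor that attracts medium-ability students.
abilities = [-2.0, -1.6, -1.0, -0.7, -0.1, 0.2, 0.8, 1.1, 1.7, 2.1]
choices = ["C", "B", "B", "A", "B", "B", "A", "B", "A", "A"]

for option in ("A", "B"):
    curve = distractor_curve(abilities, choices, option)
    print(option, curve, "humped" if is_humped(curve) else "not humped")
```

With real data the threshold `tol` and the number of groups would need to be chosen with the sample size in mind, since small groups make the curve noisy.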
Barbaranelli, C., & Natali, E. (2005). I test psicologici: teorie e modelli psicometrici. Carocci.
Brousseau, G. (1988). Le contrat didactique: le milieu. Recherches en Didactique des Mathématiques, 9(3), 309-336.
EMS-EC (Education Committee of the EMS) (2012). What are the Reciprocal Expectations between Teacher and Students? Solid Findings in Mathematics Education on Didactical Contract. Newsletter of the European Mathematical Society, 84, 53-55.
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Danmarks Paedagogiske Institut.