A Model for the comparability of local Educational System Assessment and results of Program for International School Assessment (PISA)

Author(s):

Eva Expósito Casas(presenting / submitting)Esther López-Martín(presenting)Jose Luis Gaviria Soto Coral González Barbera

Conference:

ECER 2012

Network:

09. Assessment, Evaluation, Testing and Measurement

Format:

Paper

Session Information

09 SES 02 C, Issues in Test Development and Data Modeling

Parallel Paper Session

Time:

2012-09-18

15:15-16:45

Room:

FCT - Seminario 2

Chair:

Tobias C. Stubbe

Contribution

The interpretation of the results from different assessments of an educational system comes with the problem of the comparability of scores. This is why the information provided by autonomic, national or international assessment programs can, in some cases, show important discrepancies. Under these circumstances, the questions are: Are the results of these assessments comparable? Is it possible to improve their interpretation? How can we benefit from the information of these assessments? The usual answer implies an equating design involving anchor items. But when one of the assessments affects the whole population of students, it’s extremely hard to warrant the due security of the items used as anchor. So, is it possible to relate the scale of such an assessment program to the scale of a well known program as PISA, which is tested on a sample of the population?
In order to answer these questions this paper introduces a procedure for comparing results from different evaluations. More specifically, it has been proposed a model to compare the results of the Diagnostic Evaluation of Educational System, carried out in the Region of Madrid (Spain), with the information provided by the OECD Program for International Student Assessment (PISA).
According to the Organic Law of Education (2/2006, 3rd of may), the General Subdirection of Assessment and analysis of the Community of Madrid (Spain) (BOCM 31st of august) is responsible of the diffusion, the application and the correction of the tools in the Diagnostic Evaluation of this region and, for this purpose, it has been methodologically advised by the Education System Measurement and Assessment Group (Grupo MESE) of the Complutense University of Madrid. This assessment is intended to students of 4th year of Primary Education (EP) and 2nd year of Compulsory Secondary Education (ESO) of public, semi-private and private schools.
During the academic year of 2009/10, tests used in the program International Schools´ Assessment – ISA, designed and launched by the Australian Council for Educational Research (ACER) in 2001 - were adapted to the Spanish educational system (ESP-ISA tests). ISA is an international assessment program for students in schools over the world in Grades 3 to 10 (9 to 16 years old). It shares the theoretical framework of the PISA program. ESP-ISA tests were constructed from a pool of ISA items, linked to PISA tests. Additionally, ESP-ISA tests included released PISA items. This way, two sets of item parameters were involved, those from English version of ISA items, and those from the Spanish version of released PISA items. Ideally, fixing these parameters would allow to estimate the score for each individual based on their answers to the whole test. But to refine the process, a procedure was implemented to select, as fixing items only those showing invariance when a free calibration took place.

Method

The Rasch model (1960), was used in the equating process. The steps followed were: Horizontal equating.: Scores obtained by students of 2nd ESO were calibrated with the ISA items. This allows to compare ESP-ISA scores with PISA scores. Firstly, because ISA items are aligned with PISA items and, secondly, because 8 items applied in PISA test were included in ESP-ISA tests. With the goal of selecting the anchor items, ESP-ISA items were calibrated and, the parameters obtained in this calibration were compared with the parameters of ISA items. Thereafter, we calculated a Pearson correlation between both parameters' series for each item, assuming that item was eliminated. Afterward, we eliminated items with the highest correlation until the correlation between parameters was acceptable. Vertical scaling. PISA study does not evaluate students of Primary Education, so, to put on the same scale results of both courses, a vertical scaling process was included. Two students of 2º ESO were selected randomly in each school, to answer the most difficult items of math and reading comprehension of the 4th EP tests. Simultaneously a mixed form, composed by items from 4th EP and 2nd ESO was administered to a small sample of students of 6th EP.

Expected Outcomes

It have been possible to observe a high consistency between the ESP-ISA parameters and ISA parameters. In mathematics, we obtained a correlation coefficient of 0.951, with a 90.43% of shared variance. In reading comprehension, the correlation was equal to 0.973, and the shared variance was 0,947%. In both cases, theses high values allow confidence in the scaling results. We can also assert that the cognitive processes underlying ISA test and ESP-ISA test are basically the same, regardless of population that answers the items. Given a single instrument differences in the overall difficulty level reflect different capacities between students who answered the test. But potential deviations of the estimates parameters from the general pattern could mean contextual interactions that would seriously affect the interpretability of the results. However, this work demonstrates that the equating process used in the Diagnostic Evaluation of the Region of Madrid was correct and reliable, providing results very consistent with the information coming from independent sources as important as the provided by the PISA program, by allowing you to put the scores on the same scale and to compare results.

References

Baker, F. B. (1992). Item response theory : Parameter estimation techniques. New York: M. Dekker. Decreto 118/2007, de 2 de agosto, del Consejo de Gobierno de la Comunidad de Madrid, por el que se establece la estructura orgánica de la Consejería de Educación. BOCM núm. 207 (31 de agosto de 2007), 4-10. Embretson, S. E. (Ed.). (2000). Item response theory for psychologists. Mahwah, New Jersey: Lawrence Erlbaum associates. Hambleton, R. K. (Ed.). (1991). Fundamentals of item response theory. Newbury Park, California: Sage. Kolen, M. J., & Brennan, R. L. (1995). Test equating: Methods and practices. New York: Springer-Verlag. Ley Orgánica 2/2006, de 3 de mayo, de Educación. BOE num. 106 (4 de mayo 2006) 17158-17207. Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Copenhagen: Danish Institute for Educational Research. Wim J.van der Linden, Ronald K.Hambleton (Ed.). (1997). Handbook of modern item response theory. New York: Springer. Wu, M. L., Adams, R. J., Wilson, M. R., & Haldane, S. A. (2007). ACERConQuest Version 2.0: generalised item response modelling software. Victoria: ACER Press.

Author Information

Eva Expósito Casas (presenting / submitting)

National University of Distance Education, Spain

Esther López-Martín (presenting)

National University of Distance Education

Methods of Research and Diagnosis in Education

MADRID

Jose Luis Gaviria Soto

Complutense University of Madrid, Spain

Coral González Barbera