Session Information
09 SES 16 B, Exploring Methodological Advances in Educational Research and Assessment
Paper Session
Contribution
Large-scale educational studies are an important resource for informing policymakers and the general public about the reach and effectiveness of diverse aspects of educational systems across countries. Competence assessment in institutional settings (e.g., schools) has been essential for collecting valid measurements of, for example, cognitive abilities or motivations. Conducting the assessment sessions requires a substantial number of test administrators (TAs) to supervise and coordinate test groups in the participating schools. TAs undergo specific training and follow a strict protocol to ensure that competence assessment sessions are standardized and comparable, so that student achievement data can be meaningfully collected. Nevertheless, TA characteristics can affect the quality of assessment scores and survey data: differences in TA behavior can give rise to interviewer effects that systematically impair the validity and comparability of competence assessments. While there has been a recent effort to move competence assessment to computer-assisted modes of data collection, there is very little research on whether the training sessions and protocols actually achieve their goal of preventing TA effects in the first place.
In this paper, we explore the presence and magnitude of interviewer effects on paper-and-pencil competence assessments of mathematics abilities, as well as on survey questions, in a nationally representative German longitudinal educational survey (the National Educational Panel Study, NEPS). For this purpose, we replicate Lüdtke et al. (2007), to date the only empirical investigation of TA interviewer effects we are aware of. Multilevel analyses for cross-classified data are used to separate the variance associated with differences between schools from the variance associated with TAs. The results can help improve competence assessment procedures, in particular by revealing whether interviewer training and protocols need to be improved, and by establishing the existence and magnitude of interviewer effects in assessment sessions under paper-and-pencil modes of data collection.
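To make the decomposition concrete, a minimal sketch of the cross-classified model (our notation, not taken verbatim from Lüdtke et al., 2007): the score of student i tested by TA k in school j is modeled as

y_{i(jk)} = \gamma_0 + u_j + v_k + e_{i(jk)}, with u_j ~ N(0, \sigma^2_{school}), v_k ~ N(0, \sigma^2_{TA}), e_{i(jk)} ~ N(0, \sigma^2_{student}),

so that the share of variance attributable to TAs is the intraclass correlation

\rho_{TA} = \sigma^2_{TA} / (\sigma^2_{school} + \sigma^2_{TA} + \sigma^2_{student}).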
Method
To effectively study test administrator effects in educational assessments, a cross-classified data structure is necessary. If a single test administrator conducts the assessment in each school and does not conduct assessments in any other school, test administrator effects cannot be distinguished from school effects; they are inseparably confounded. A prerequisite for separating the two is therefore having at least two test administrators administering the assessment to separate groups of students in each school, with students randomly assigned to these groups. The potential to disentangle test administrator and school effects is even greater when test administrators conduct assessments in several different schools. We follow the statistical procedure of Lüdtke et al. (2007) and estimate a cross-classified multilevel model using Markov chain Monte Carlo (MCMC) estimation.
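As an illustration of this estimation step, a minimal sketch in Python with PyMC on simulated data (the variable names, priors, and sample sizes are our assumptions, not the NEPS or Lüdtke et al. specification):

# Minimal sketch of a cross-classified multilevel model with MCMC in PyMC.
# Illustrative only: simulated data, assumed priors and group sizes.
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)

# Simulated data: 1000 students nested in 50 schools crossed with 30 TAs.
n_students, n_schools, n_tas = 1000, 50, 30
school_idx = rng.integers(0, n_schools, n_students)
ta_idx = rng.integers(0, n_tas, n_students)
score = rng.normal(500, 100, n_students)  # placeholder achievement scores

with pm.Model() as model:
    # Grand mean and the three variance components
    intercept = pm.Normal("intercept", mu=500, sigma=100)
    sigma_school = pm.HalfNormal("sigma_school", sigma=50)
    sigma_ta = pm.HalfNormal("sigma_ta", sigma=50)
    sigma_resid = pm.HalfNormal("sigma_resid", sigma=100)

    # Crossed random effects: each student gets a school AND a TA effect
    u_school = pm.Normal("u_school", mu=0, sigma=sigma_school, shape=n_schools)
    v_ta = pm.Normal("v_ta", mu=0, sigma=sigma_ta, shape=n_tas)

    mu = intercept + u_school[school_idx] + v_ta[ta_idx]
    pm.Normal("score", mu=mu, sigma=sigma_resid, observed=score)

    # Share of score variance attributable to TAs (intraclass correlation)
    pm.Deterministic(
        "icc_ta",
        sigma_ta**2 / (sigma_school**2 + sigma_ta**2 + sigma_resid**2),
    )

    idata = pm.sample(1000, tune=1000, chains=2)  # MCMC (NUTS) sampling

The posterior of icc_ta directly quantifies the proportion of variance at the test administrator level; posterior mass concentrated near zero would indicate the absence of TA effects.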
Expected Outcomes
Overall, in line with the original Lüdtke et al. (2007) study we are replicating, the analysis found that a significant proportion of the variance in mathematics achievement and response behavior was located at the school level, although much of this variance was explained by school type. In contrast, there were no differences in mathematics achievement or response behavior at the test administrator level. These results suggest that the procedures used to train test administrators and standardize test administration, which are largely the same as those used in other large-scale assessment studies (e.g., PISA), were successful in ensuring that the tests were administered consistently to all student groups. This is a reassuring finding, given the importance often placed on the outcomes of such assessments.
References
Blossfeld, H.-P., & Roßbach, H.-G. (Eds.). (2019). Education as a lifelong process: The German National Educational Panel Study (NEPS) (2nd ed.). Edition ZfE. Springer VS.
Lüdtke, O., Robitzsch, A., Trautwein, U., Kreuter, F., & Ihme, J. M. (2007). Are there test administrator effects in large-scale educational assessments? Using cross-classified multilevel analysis to probe for effects on mathematics achievement and sample attrition. Methodology, 3(4), 149–159. https://doi.org/10.1027/1614-2241.3.4.149
OECD. (n.d.). PISA 2015 assessment and analytical framework: Science, reading, mathematic, financial literacy and collaborative problem solving. Retrieved December 20, 2022, from https://www.oecd.org/education/pisa-2015-assessment-and-analytical-framework-9789264281820-en.htm