Session Information
09 SES 09 B, Ignite Talk Session
Contribution
Teachers’ grading practices are a central topic in educational research, especially when evaluating their role in high-stakes assessments such as university entrance qualifications. A persistent research interest in this field is the discrepancy between grades assigned by teachers and scores achieved in standardized tests (Doz, 2023). Grades awarded by teachers often reflect a broader range of criteria than those included in standardized tests (Willingham et al., 2002), incorporating, for example, nonacademic achievement factors such as effort, attendance, and behavior alongside students’ academic performance (Brookhart et al., 2016).
In contrast to standardized tests, teacher grades have been discussed as being tied to teachers’ grading beliefs, which are recognized as important internal factors in teachers’ grading decisions (McMillan, 2003). These beliefs, which include (among others) views about the purpose of grades and the weight given to different nonacademic achievement factors of grades (Bonner & Chen, 2021), may play an even more decisive role in shaping grading practices than formal grading policies (McMillan, 2003). However, the extent to which such beliefs translate into observable grading practices remains controversial, with mixed results reported in the literature (Fives & Buehl, 2012).
Four key teacher beliefs identified by Bonner and Chen (2021), namely pull, manage, effort, and lenient, are crucial for our focus on nonacademic factors of grading. Pull refers to a success orientation in grading, with teachers omitting students’ lowest test results to elevate their grades. Manage highlights the use of grades as a tool for classroom management and discipline. Effort describes the consideration of the effort and hard work that students put into their tasks when assigning grades. With lenient, Bonner and Chen (2021) add a factor that does not serve a specific educational goal; instead, it describes a generally lenient approach to grading.
Despite a growing body of research on effects at the teacher level (Coenen et al., 2018), less attention has been paid to contextual factors in explaining grade differences. Previous studies suggest that school type significantly predicts differences between teacher grades and standardized test scores (Doz, 2023; Watermann et al., 2013). In the context of Austria, where the current study is situated, school type has also been identified as an important predictor of grade discrepancies in the university entrance protocol (Author et al., in review).
Based on studies arguing that teachers’ beliefs and behavior are influenced by shared cultural beliefs at the systemic level (Fuller & Izu, 1986), our study examined how teachers’ grading beliefs contributed to these discrepancies across school types. In particular, the study investigated teachers’ grading beliefs about nonacademic achievement factors of grades and their predictive power for teachers’ tendency to assign higher classroom grades compared to standardized test scores. The focus was on differences between individual teachers and differences between two types of upper secondary schools (academic and vocational schools).
Given that teachers’ beliefs are not necessarily reflected in their actual practice (Fives & Buehl, 2012), we first investigated the extent to which differences between the two school types emerge regarding teachers’ beliefs about the use of nonacademic achievement factors in grades. Second, we aimed to examine the extent to which these grading beliefs are associated with the actual differences in grading behavior observed between teachers from the two school types.
Hence, our study aimed to answer two research questions:
RQ 1: How do the two Austrian school types (academic and vocational schools) differ regarding teachers’ grading beliefs on nonacademic achievement factors of grades?
RQ 2: To what extent do teachers’ grading beliefs explain the differences in grading behavior between teachers at academic and vocational schools?
Method
The sample consisted of matched grade pairs (classroom grades and standardized test scores) from a total of 8,581 students in three academic and four vocational schools in Vienna, Austria. These grades were awarded by 154 teachers over seven years (2017 to 2023) in three subjects: German, mathematics, and English as a foreign language. In 2023, the same teachers were asked to respond to a questionnaire assessing their beliefs about the nonacademic achievement factors pull, manage, effort, and lenient (Bonner & Chen, 2021) described earlier. Of the 154 teachers invited to participate, 36% (n = 56) responded to the questionnaire.

To answer the first research question, a multivariate analysis of variance (MANOVA) was conducted to investigate the association between school type (independent variable) and teachers’ grading beliefs (dependent variables). To answer the second research question, an analysis of variance (ANOVA) was planned to determine the association between teachers’ grading beliefs (independent variables) and teachers’ tendency to award higher classroom grades compared to standardized test scores (as the dependent variable, Cohen’s d was used as a standardized effect size measure of the discrepancy between classroom grades and standardized test scores at the teacher level).

Before that, however, we conducted a robustness check via ANOVA to test whether the teachers who responded to the questionnaire constituted a random subsample with respect to the relationship between school type and grading behavior addressed by our research question. This robustness check showed a statistically significant effect (p < .001) of the interaction between school type and the binary response variable (missing / not missing), indicating the presence of selection bias. Teachers who responded to our questionnaire on grading beliefs showed significantly different grading behavior from teachers who did not respond (p = .005). Furthermore, among the nonrespondents, grading behavior diverged by school type: academic school teachers who did not respond tended to award higher classroom grades, whereas vocational school nonrespondents awarded significantly lower classroom grades compared to standardized test scores. Teachers who responded, by contrast, showed rather similar grading behavior regardless of school type. It became evident that our originally planned models for the second research question would not provide valid results. Our investigation, however, unexpectedly yielded intriguing results about the phenomenon of teachers’ grading beliefs.
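To make the planned analyses concrete, the following is a minimal Python sketch (pandas and statsmodels) of how such a pipeline could be set up. It is not the authors’ code: the data frames (grades, teacher_survey, teacher_info) and all column names are hypothetical placeholders, and the grading scale and the sign convention of the discrepancy measure are assumptions.

```python
# Hedged sketch of the analysis pipeline described above; data frames and
# column names (teacher_id, classroom_grade, test_score, school_type,
# responded, pull, manage, effort, lenient) are hypothetical placeholders.
import numpy as np
import statsmodels.formula.api as smf
from statsmodels.multivariate.manova import MANOVA
from statsmodels.stats.anova import anova_lm

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

# Teacher-level discrepancy: Cohen's d between classroom grades and
# standardized test scores over each teacher's matched student pairs.
teacher_d = (
    grades.groupby("teacher_id")
    .apply(lambda g: cohens_d(g["classroom_grade"], g["test_score"]))
    .rename("d_discrepancy")
    .reset_index()
)

# RQ1: MANOVA with school type predicting the four grading-belief scales.
manova = MANOVA.from_formula(
    "pull + manage + effort + lenient ~ school_type", data=teacher_survey
)
print(manova.mv_test())

# Robustness check: interaction of school type and response status on the
# teacher-level grading discrepancy.
teacher_df = teacher_d.merge(teacher_info, on="teacher_id")  # adds school_type, responded
check = smf.ols("d_discrepancy ~ school_type * responded", data=teacher_df).fit()
print(anova_lm(check, typ=2))
```

The originally planned RQ2 model would have extended the last step by regressing the teacher-level discrepancy on the four belief scales; given the selection bias described above, that model was not pursued.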
Expected Outcomes
Regarding the first research question, the MANOVA results showed that teachers in academic schools are significantly more likely than teachers in vocational schools to account for students’ effort when assigning grades (Tukey-corrected p < .001), with a considerable effect size of Cohen’s d = 0.863.

The selection bias detected in the pre-analysis for our second research question suggests that we may have accidentally discovered another teacher belief related to grading. However, we can only speculate about the specific content of this belief. One explanation is that teachers who are transparent in their grading (as indicated by the consistency of their awarded grades) may also be transparent in communicating their grading beliefs. Another possible explanation is that some teachers may see it as their duty as “reliable” teachers to respond to a questionnaire when asked by their principals, a “reliability” that would also be reflected in their awarded grades. A third possibility is that teachers who award similar grades feel confident about being questioned on their grading because their grades are “by the book” and therefore convey a sense of validity. The latter interpretation is supported by informal feedback we received from some teachers during data collection: despite the official approval of the Ethics Commission and the anonymity of the survey, some teachers reported a fear of being monitored by the Austrian Ministry. They expressed concerns about possible implications of their responses, particularly in terms of perceived scrutiny of the accuracy of their grading. We hope to discuss further possible explanations during the conference.

The findings of our study have implications for researchers exploring sensitive topics such as grading. We showed that it is important to exercise caution in such contexts, as there is a possibility of selection bias. Future studies should investigate this phenomenon in more detail.
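For illustration, the group comparison underlying this result could be sketched as follows. This is again a minimal sketch, assuming a hypothetical teacher_survey data frame with an effort scale score and a school_type label per responding teacher; it is not the authors’ code.

```python
# Hedged sketch: Tukey-corrected comparison of the 'effort' belief scale between
# the two school types, plus Cohen's d as the standardized effect size.
# 'teacher_survey', 'effort', and 'school_type' are assumed placeholder names.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

academic = teacher_survey.loc[teacher_survey["school_type"] == "academic", "effort"]
vocational = teacher_survey.loc[teacher_survey["school_type"] == "vocational", "effort"]

# Tukey-adjusted pairwise test (a single comparison with two groups, but it
# mirrors the Tukey-corrected p value reported above).
tukey = pairwise_tukeyhsd(endog=teacher_survey["effort"],
                          groups=teacher_survey["school_type"])
print(tukey.summary())

# Cohen's d with the pooled standard deviation.
n1, n2 = len(academic), len(vocational)
pooled_sd = np.sqrt(((n1 - 1) * academic.var(ddof=1) + (n2 - 1) * vocational.var(ddof=1))
                    / (n1 + n2 - 2))
print(f"Cohen's d = {(academic.mean() - vocational.mean()) / pooled_sd:.3f}")
```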
References
Bonner, S. M., & Chen, P. P. (2021). Development and validation of the Survey of Unorthodox Grading Beliefs for teachers and teacher candidates. Journal of Psychoeducational Assessment, 39(6), 746–760. https://doi.org/10.1177/07342829211015462
Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A century of grading research: Meaning and value in the most common educational measure. Review of Educational Research, 86(4), 803–848. https://doi.org/10.3102/0034654316672069
Coenen, J., Cornelisz, I., Groot, W., Maassen van den Brink, H., & van Klaveren, C. (2018). Teacher characteristics and their effects on student test scores: A systematic review. Journal of Economic Surveys, 32(3), 848–877. https://doi.org/10.1111/joes.12210
Doz, D. (2023). Factors influencing teachers’ grading standards in mathematics. Oxford Review of Education, 49(6), 819–837. https://doi.org/10.1080/03054985.2023.2185217
Fives, H., & Buehl, M. M. (2012). Spring cleaning for the “messy” construct of teachers’ beliefs: What are they? Which have been examined? What can they tell us? In K. R. Harris, S. Graham, T. Urdan, S. Graham, J. M. Royer, & M. Zeidner (Eds.), APA educational psychology handbook: Vol. 2. Individual differences and cultural and contextual factors (pp. 471–499). American Psychological Association. https://doi.org/10.1037/13274-019
Fuller, B., & Izu, J. A. (1986). Explaining school cohesion: What shapes the organizational beliefs of teachers? American Journal of Education, 94(4), 501–535. https://www.jstor.org/stable/1085339
McMillan, J. H. (2003). Understanding and improving teachers’ classroom assessment decision making: Implications for theory and practice. Educational Measurement: Issues and Practice, 22(4), 34–43. https://doi.org/10.1111/j.1745-3992.2003.tb00142.x
Watermann, R., Nagy, G., & Köller, O. (2013). Mathematikleistungen in allgemein bildenden und beruflichen Gymnasien. In O. Köller, R. Watermann, U. Trautwein, & O. Lüdtke (Eds.), Wege zur Hochschulreife in Baden-Württemberg: TOSCA—Eine Untersuchung an allgemein bildenden und beruflichen Gymnasien (pp. 205–283). Springer-Verlag.
Willingham, W. W., Pollack, J. M., & Lewis, C. (2002). Grades and test scores: Accounting for observed differences. Journal of Educational Measurement, 39(1), 1–37. https://doi.org/10.1111/j.1745-3984.2002.tb01133.x