Session Information
09 SES 14 A, Exploring Factors Influencing Teaching Quality and Student Learning Outcomes
Paper Session
Contribution
Teaching quality has been researched extensively in recent years, with a large number of empirical studies in educational science and psychology. To better understand how learning develops in the classroom, scholars are concerned with the reliable and valid measurement of teaching quality. Helmke (2012) considers classroom observation the "gold standard" among ways of capturing teaching quality (e.g., student ratings in large-scale assessments) because it assesses teaching practices directly. However, classroom observation is also resource-intensive and prone to many sources of measurement error. Therefore, when performing classroom observation for any purpose (e.g., research, practice, policy), it is important to consider how to allocate limited resources such that high score reliability and valid conclusions about teaching quality are ensured.
Studies suggest that changing the presentation order of lesson segments could particularly affect score reliability (e.g., Mashburn et al., 2014). For instance, using the generic CLASS-Secondary observation system (Pianta et al., 2008), Mashburn et al. (2014) found that 20-minute lesson segments presented to raters in random order achieved the best combination of reliability and predictive validity. In the present study, we used a different, hybrid observation system (i.e., one comprising both generic and subject-specific aspects of teaching quality; Charalambous & Praetorius, 2018) that was first developed to capture teaching quality in German secondary mathematics classrooms and that draws on the Three Basic Dimensions of teaching quality: classroom management, student support, and (potential for) cognitive activation (e.g., Klieme et al., 2009). The Three Basic Dimensions have been shown to relate positively to students' achievement in mathematics classrooms across several studies and various operationalizations (e.g., Baumert et al., 2010; for an overview, see Praetorius et al., 2018).
Classroom management refers to teachers' procedures and strategies that enable efficient use of time (time on task) as well as behavioral management (Kounin, 1970). Student support draws on self-determination theory (Deci & Ryan, 1985) and aims at motivational and emotional support as well as individualization and differentiation. Cognitive activation, finally, addresses opportunities for higher-order thinking from a socio-constructivist perspective on teaching and learning (e.g., problem solving; Mayer, 2004).
Empirical evidence suggests that generic and subject-specific measures of teaching quality generate moderately correlated, but still unique, information about classrooms (Kane & Staiger, 2012). Evaluating this finding, Charalambous and Praetorius (2018) conclude that subject-specific and generic measures together could explain more variance in student learning in mathematics than generic measures alone. Since subject-specificity might be considered a continuum rather than a binary characteristic, they argue that it could be meaningful for scholars to develop hybrid frameworks of teaching quality that take both perspectives, generic and subject-specific, into account.
The purpose of the present study is twofold: First, we investigate the effect of presentation order on score reliability in two subjects. Second, we explore an optimal design for the implementation of our observation system in terms of score reliability. To this end, we assigned four trained raters to score videotaped Norwegian mathematics and science lessons either as two sequential 20-minute segments or as two nonsequential 20-minute segments.
Method
Data were obtained from schools in the Oslo metropolitan area in Norway, with teachers participating through convenience sampling. In total, 15 classrooms were sampled, and from each classroom between one and six lessons were available that had been videotaped over the course of several weeks. The lessons varied in length between 24 and 106 minutes and were cut into 20-minute segments for analysis. For the purpose of this study, two segments from every mathematics classroom and two segments from every science classroom were analyzed, and the segments were scored under both study conditions (i.e., sequential and nonsequential).

We applied the observation system from the Teacher Education and Development Study–Instruct (TEDS-Instruct; e.g., Schlesinger et al., 2018). The framework and corresponding instrument comprise four teaching quality dimensions with four to six items each, using different indicators for mathematics and science classrooms. Raters were trained extensively over the course of one week by studying the rating manual, conducting video observations, and discussing the results with master raters; however, no benchmarks were applied. All raters were student teachers in mathematics and science programs, at least in their fourth year of study.

To analyze the effect of presentation order on score reliability, we designed our study as follows. For each lesson, we randomly assigned one rater to the sequential condition; this rater scored both segments of the lesson in order (the static condition). At the same time, two different raters were assigned to the nonsequential condition and randomly scored either the first or the second segment of the lesson (the switching condition). Using this experimental design, raters were balanced across subjects and conditions. Since we analyzed only one lesson for each teacher-subject combination, raters never scored the same teacher or classroom twice within the same condition or subject; however, raters could encounter the same teacher in a different subject.

We applied generalizability theory (GT; Cronbach et al., 1972) to estimate measurement error and reliability. GT was developed specifically for complex measurement situations with many potential sources of error, such as classrooms, lessons, or raters. GT uses linear mixed models to estimate variance components for each measurement facet of interest (G study).
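To make the G study concrete, the sketch below estimates variance components for a simplified crossed lessons-by-raters design with a linear mixed model and then projects reliability coefficients for alternative numbers of raters (a D study). The file name, column names, and the two-rater scenario are illustrative assumptions rather than details reported above; dedicated GT software (e.g., following Shavelson & Webb, 1991) would serve the same purpose.

```python
# A minimal G-study sketch, assuming long-format data in "ratings.csv"
# with hypothetical columns 'score', 'lesson', and 'rater' (one row per
# segment rating); the abstract does not report which software was used.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ratings.csv")
df["const_group"] = 1  # one group, so lesson and rater effects are crossed

# Estimate variance components for lessons and raters with a linear mixed
# model; re_formula="0" drops the random intercept of the artificial group.
model = smf.mixedlm(
    "score ~ 1",
    data=df,
    groups="const_group",
    re_formula="0",
    vc_formula={"lesson": "0 + C(lesson)", "rater": "0 + C(rater)"},
)
result = model.fit()

vc = dict(zip(model.exog_vc.names, result.vcomp))
var_lesson, var_rater = vc["lesson"], vc["rater"]
var_error = result.scale  # residual variance (interaction confounded with error)

# Decision (D) study: project score reliability for n_r raters per lesson.
n_r = 2
g_relative = var_lesson / (var_lesson + var_error / n_r)
phi_absolute = var_lesson / (var_lesson + (var_rater + var_error) / n_r)
print(f"G (relative): {g_relative:.3f}, Phi (absolute): {phi_absolute:.3f}")
```

The relative coefficient treats rater severity as irrelevant to rank-ordering lessons, whereas the absolute coefficient penalizes it; comparing the two indicates how much rater main effects contribute to measurement error.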
Expected Outcomes
Our results show that, overall, presentation order had little impact on score reliability. In more detail, score reliability was high for science lessons in both conditions and acceptable for two out of four teaching quality dimensions in mathematics, with slightly better results for the static condition. For cognitive activation, we found a low share of lesson variance and a relatively high share of within-lesson variation. Correlation analyses and mean comparisons revealed no meaningful differences between conditions. Our results may depend on the fact that we sampled only one lesson per classroom. Other studies show that subject-specific aspects of teaching quality in particular vary considerably over time (e.g., Praetorius et al., 2014). However, we did not encounter similar issues in science classrooms, which suggests that (1) teaching quality in science and mathematics lessons varies on different time scales, (2) the observation system functions differently in mathematics and science lessons, or (3) raters applied the measure differently across subjects.
References
Baumert, J., Kunter, M., Blum, W., Brunner, M., Voss, T., Jordan, A., . . . Tsai, Y.-M. (2010). Teachers' mathematical knowledge, cognitive activation in the classroom, and student progress. American Educational Research Journal, 47(1), 133–180.
Charalambous, C., & Praetorius, A.-K. (2018). Studying instructional quality in mathematics through different lenses: In search of common ground. ZDM Mathematics Education, 50, 535–553.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. John Wiley.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Perspectives in Social Psychology. Plenum.
Helmke, A. (2012). Unterrichtsqualität und Lehrerprofessionalität: Diagnose, Evaluation und Verbesserung des Unterrichts [Teaching quality and teacher professionalism: Diagnosis, evaluation, and improvement of teaching]. Klett-Kallmeyer.
Kane, T. J., & Staiger, D. O. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Bill & Melinda Gates Foundation.
Klieme, E., Lipowsky, F., Rakoczy, K., & Ratzka, N. (2006). Qualitätsdimensionen und Wirksamkeit von Mathematikunterricht: Theoretische Grundlagen und ausgewählte Ergebnisse des Projekts "Pythagoras" [Quality dimensions and effectiveness of mathematics instruction: Theoretical foundations and selected results of the "Pythagoras" project]. In M. Prenzel & L. Allolio-Näcke (Eds.), Untersuchungen zur Bildungsqualität von Schule: Abschlussbericht des DFG-Schwerpunktprogramms [Investigations into the educational quality of schools: Final report of the DFG priority program] (pp. 127–146). Waxmann.
Kounin, J. S. (1970). Discipline and group management in classrooms. Holt, Rinehart & Winston.
Mashburn, A. J., Meyer, J. P., Allen, J. P., & Pianta, R. C. (2014). The effect of observation length and presentation order on the reliability and validity of an observational measure of teaching quality. Educational and Psychological Measurement, 74(3), 400–422.
Mayer, R. E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59(1), 14–19.
Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System™: Manual K–3. Paul H. Brookes.
Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of Three Basic Dimensions. ZDM Mathematics Education, 50, 407–426.
Schlesinger, L., Jentsch, A., Kaiser, G., König, J., & Blömeke, S. (2018). Subject-specific characteristics of instructional quality in mathematics education. ZDM Mathematics Education, 50, 475–491.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. SAGE Publications.