Session Information
Paper Session
Contribution
Student Evaluation of Teaching (SET) is the procedure by which students evaluate and rate teaching performance. Typically, students complete rating forms or questionnaires covering different aspects of their teachers' work, most often their teaching practices. Universities and other higher education institutions around the world implement SET procedures for three main purposes. First, from a practical point of view, most universities need to report SET results to quality assurance agencies. Second, and most important from an educational perspective, SET procedures provide feedback to academics about their teaching practices and inform the design of teacher training programs focused on developing teaching skills. Third, SET results serve as evidence of teaching performance in academic career advancement and other forms of rewarding teaching effectiveness.
Student evaluation of teaching is one of the most heavily researched topics in educational research, with over 2,000 studies published in peer-reviewed journals over a period of little more than 100 years (Spooren et al., 2017). One of the earliest debates in this field concerns the validity of SET scales and procedures: can the measurement instruments administered to students during these procedures accurately measure teaching effectiveness? Although this debate was most active in the 1970s, and the evidence leaned towards an affirmative answer (see the reviews by Richardson, 2005, and Marsh, 2007), a recently published meta-analysis (Uttl et al., 2017) presented evidence that seriously threatens the validity of SET results. Its findings strongly suggest that there is no relationship between a teacher's SET results and the level of their students' achievement/learning.
This relationship is vital to the SET validity debate: if SET results accurately reflect teaching effectiveness, then teachers identified as more effective should facilitate higher levels of learning and achievement among their students. Put simply, good teachers help their students learn more, so if SET results are valid, they should correlate with student achievement.
At the same time, several SET scales have been rigorously developed from a theoretical and psychometric point of view (e.g., SEEQ, CEQ, ETCQ), and there is considerable evidence that these specific scales can accurately measure teaching and support the development of teaching skills (Marsh, 2007; Richardson, 2005).
Against this background, and in light of the meta-analytic results presented above, the main question that arises is whether the relationship between SET results and student learning is stronger when the SET scale used is more rigorously developed and validated.
Thus, the research questions that guide the present study are the following:
1. What is the average effect size of the relationship between SET results and student achievement across all multi-section SET studies published to date?
2. Does the average effect size of the relationship between SET results and student achievement differ as a function of the validity evidence for the SET measure?
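To make the first research question concrete, the average effect size in multi-section studies is typically obtained by pooling section-level correlations after Fisher's z transformation. The sketch below is the standard random-effects formulation, not necessarily the exact estimator used in the present study; here n_i denotes the number of sections contributing to study i's correlation r_i:

```latex
% Fisher's z transformation of each study-level correlation r_i,
% with sampling variance based on the number of sections n_i
z_i = \tfrac{1}{2}\ln\frac{1 + r_i}{1 - r_i}, \qquad
\operatorname{Var}(z_i) = \frac{1}{n_i - 3}
% Random-effects pooled estimate with weights
% w_i = 1 / (\operatorname{Var}(z_i) + \tau^2),
% back-transformed to the correlation metric
\bar{z} = \frac{\sum_i w_i z_i}{\sum_i w_i}, \qquad
\bar{r} = \tanh(\bar{z})
```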
Method
To be included in the present meta-analysis, a study had to meet the following inclusion criteria: (1) the study reported correlational results between SET results and student achievement in higher education; (2) the study examined the relationship between SET results and student achievement across multiple sections of the same discipline; (3) students in every section completed the same SET and achievement measures; (4) achievement was assessed with objective measures of actual learning rather than students' perceptions of it; and (5) the correlation between SET results and student achievement was estimated from data averaged at the section level rather than at the student level.
The literature search combined three procedures. First, we analyzed the reference lists of previous meta-analyses in the field. Second, we examined all articles citing Uttl et al. (2017). Third, we searched the following databases with a search algorithm: Academic Search Complete, Scopus, PsycINFO, and ERIC. After screening abstracts and reading the full texts of promising studies, we identified 43 studies that met the inclusion criteria described above. From each study, we extracted the correlation indices, the number of sections, and the total number of students in the research sample. We also coded the following study characteristics: the psychometric properties of the SET measure, the specific items of the SET measure, the type of achievement measure, and whether prior achievement was adjusted for (where applicable).
To examine and code the degree of available evidence for the reliability and validity of the SET measures used to gather student responses, we adapted the framework of psychometric evaluation criteria proposed by Hunsley and Mash (2008). In adapting this framework, we also considered the recommendations of Spooren et al. (2013) in their SET validity review, the meta-validation model of Onwuegbuzie et al. (2009) for assessing the score validity of SETs, and the joint standards for educational and psychological testing (AERA, APA, & NCME, 2014). Each SET measure was evaluated and coded against the following psychometric criteria: (1) internal consistency; (2) inter-rater reliability; (3) test-retest reliability; (4) structural validity; and (5) relations with other variables of interest (convergent and/or predictive validity).
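As an illustration of how the extracted correlations and section counts can be combined into an average effect size, the following minimal Python sketch implements a standard DerSimonian-Laird random-effects pooling of Fisher-z transformed correlations. The function name and example data are hypothetical, and the study may well have relied on dedicated meta-analysis software instead:

```python
import numpy as np

def pool_correlations(rs, ns):
    """Random-effects (DerSimonian-Laird) pooling of correlations.

    rs: per-study correlations between SET ratings and achievement
    ns: per-study number of sections (the unit of analysis here)
    """
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)               # Fisher's z transformation
    v = 1.0 / (ns - 3.0)             # within-study variance of z
    w = 1.0 / v                      # fixed-effect weights
    z_fe = np.sum(w * z) / np.sum(w)
    # DerSimonian-Laird estimate of between-study variance tau^2
    q = np.sum(w * (z - z_fe) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)
    w_re = 1.0 / (v + tau2)          # random-effects weights
    z_re = np.sum(w_re * z) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    lo, hi = z_re - 1.96 * se, z_re + 1.96 * se
    # back-transform to the correlation metric
    return np.tanh(z_re), (np.tanh(lo), np.tanh(hi))

# Hypothetical example: three multi-section studies
r_pooled, ci = pool_correlations(rs=[0.10, 0.25, 0.30], ns=[20, 35, 15])
print(f"pooled r = {r_pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```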
Expected Outcomes
Our results for the overall effect size indicate a marginally statistically significant relationship between SET ratings and student achievement (r = .187, Z = 5.827, p < .058, k = 87) across the 87 effects reported in the 43 examined studies. Comparing the three groups of studies formed according to the degree of psychometric evidence for their SET measures, we found that the degree of reliability and structural validity of the SET measures was not statistically significantly related to the effect sizes reported in those studies (Q(2) = 3.960, p = .138). However, we can observe a tendency towards higher effect sizes when SET measures carry more evidence of reliability and validity, from no or poor evidence (r = .137, 95% CI [.053, .220]), to some evidence (r = .215, 95% CI [.111, .315]), to adequate or good evidence (r = .324, 95% CI [.145, .483]). Statistically, then, the degree of reliability and structural validity of SET measures does not, thus far, moderate the overall effect size between SET ratings and student achievement, although the findings point to a tendency towards stronger associations between SET ratings and student achievement when the SET measure is better validated. The lack of statistical significance may stem from the relatively small number (k = 11) of effects for which we found adequate or good evidence of reliability and structural validity. Moreover, on closer inspection, the heterogeneity of effects was relatively similar within each group: both small and large correlations between SET ratings and student achievement occur within every group, which suggests that this relationship may be a function of something other than the available evidence for the SET scale.
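For transparency about the moderator test reported above, the sketch below shows how a between-group heterogeneity statistic of the form Q(2) can be computed from the pooled effect of each evidence group. The inputs are approximate values reconstructed from the reported group correlations and confidence intervals (in the Fisher-z metric), so they are illustrative rather than the study's actual data:

```python
import numpy as np
from scipy import stats

def q_between(group_estimates, group_variances):
    """Between-group heterogeneity test for a subgroup (moderator) analysis.

    group_estimates: pooled Fisher-z effect per evidence group
    group_variances: variance of each group's pooled effect
    """
    est = np.asarray(group_estimates, float)
    var = np.asarray(group_variances, float)
    w = 1.0 / var
    grand = np.sum(w * est) / np.sum(w)   # weighted grand mean
    q = np.sum(w * (est - grand) ** 2)    # Q_between statistic
    df = len(est) - 1
    p = stats.chi2.sf(q, df)              # chi-square upper-tail probability
    return q, df, p

# Approximate pooled effects for the three evidence groups
# (none/poor, some, adequate/good), back-computed from the
# reported correlations and 95% CIs in the Fisher-z metric
q, df, p = q_between([0.138, 0.218, 0.336], [0.0019, 0.0030, 0.0095])
print(f"Q({df}) = {q:.3f}, p = {p:.3f}")
```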
References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Hunsley, J., & Mash, E. J. (2008). Developing criteria for evidence-based assessment: An introduction to assessments that work. In A guide to assessments that work (pp. 3-14).
Marsh, H. W. (2007). Students' evaluations of university teaching: Dimensionality, reliability, validity, potential biases and usefulness. In R. P. Perry & J. C. Smart (Eds.), The scholarship of teaching and learning in higher education: An evidence-based perspective (pp. 319-383). Dordrecht: Springer.
Onwuegbuzie, A. J., Daniel, L. G., & Collins, K. M. (2009). A meta-validation model for assessing the score-validity of student teaching evaluations. Quality & Quantity, 43(2), 197-209.
Richardson, J. T. E. (2005). Instruments for obtaining student feedback: A review of the literature. Assessment & Evaluation in Higher Education, 30(4), 387-415.
Spooren, P., Brockx, B., & Mortelmans, D. (2013). On the validity of student evaluation of teaching: The state of the art. Review of Educational Research, 83(4), 598-642.
Spooren, P., Vandermoere, F., Vanderstraeten, R., & Pepermans, K. (2017). Exploring high impact scholarship in research on student's evaluation of teaching (SET). Educational Research Review, 22, 129-141.
Uttl, B., White, C. A., & Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation, 54, 22-42.