Session Information
09 SES 09 A, Assessing Student Performance
Paper Session
Contribution
In Germany, only a few written tests are set within a school year, and relying on so few tests creates difficulties for performance assessment (Eriksson, 2009). For example, student heterogeneity is not adequately taken into account, since not all students can demonstrate their actual performance equally well on written tests (Böhm, 2008; Avenarius & Hanschmann, 2019). To counter this, medium- and short-term assessments, which are embedded in lessons and draw on a wide range of performance types, can be used (Clarke, 2008).
While long-term assessments focus on learning outcomes aggregated over a long period of time and are therefore infrequent, medium-term assessments cover weeks or months, and short-term assessments are made "on the fly" in day-to-day lessons (Shavelson et al., 2008).
The latter two types of assessment are indispensable for educational diagnostics that promote individual advancement (Clarke, 2008). To ensure this, all German school laws state that, in addition to written tests, other performances must be assessed (e.g. SchulG, 2005). Assessments arising from the teaching-learning process itself cannot practicably be standardized in the same way as long-term assessments, which raises the question of how to ensure the quality of the diagnostic process (Clarke, 2008).
Particularly in recent years, many models describing teachers' diagnostic competences have been presented, most of them referring only to the output and not to the whole diagnostic process (Förster & Karst, 2017). The four-component model of van Ophuysen and Behrmann (2015) attempts to represent the entire diagnostic process of teachers in Germany. The four components of the model are information acquisition, data/information, information processing, and evaluation. (1) Information acquisition includes, e.g., assessment criteria, variety of methods, and documentation. (2) The data should then be as differentiated and valid as possible. (3) The subsequent information processing should be flexible and revisable, as well as unbiased and resource-efficient. Above all, (4) the final evaluation should be accurate, transparent, and fair. As with all diagnostic models, it should be noted that this model of educational diagnostics involves much more than just performance assessment. Thus, only the parts of the model most relevant to performance assessment are presented below (performance types, documentation, final evaluation).
Many studies examine the evaluation of written long-term assessments, but only a few deal with the assessment of other performance types. These few studies focus mostly on oral participation in class or on presentations, are often more than twenty years old, and are not scientifically satisfactory. In Germany, the type of performance with the greatest influence on grades is oral participation in class discussions (Krieger, 2003; Kirk, 2004). Studies indicate that there might be differences between STEM and non-STEM subjects, but the data are far from robust enough for definitive statements (Krieger, 2003; Kirk, 2004). Most German teachers evaluate their students' oral participation either daily or at least once a week (Krieger, 2003; Kirk, 2004). Other types of performance, such as verbal presentations, are rarely used, but are occasionally assessed in more differentiated ways, for instance with competence grids, than with symbols (+/0/-) alone (Breidenstein, 2018). Due to the requirements of school legislation, other performances usually account for 50% of the overall grade, with differences between subjects (Langhammer, 1997).
To approach the research subject, the following questions will be answered:
- What kind of information about students' learning progress and outcomes do teachers assess in addition to written tests? Are there differences between subjects?
- How do they document the variety of assessed performances?
- What weight do these performances have in the school report? Are there any discernible differences among the subjects?
Method
To examine the research questions, data were collected in Germany, where various student performances (described by the term other performances) are assessed in addition to written tests and are included in the final grades. Due to differences between the federal states, especially regarding the definition of the student performances that are to be assessed, this study focuses on the federal state of North Rhine-Westphalia (NRW). NRW is the only federal state in which all performances that do not count as written tests are grouped together in the category other performances. To improve the comparability of the data, the study focuses on grades 7-9 in grammar schools (German: Gymnasium) and on the subjects Mathematics and German. These subjects were selected because they must be taught as major subjects in every school and grade, and because the study examines whether different performance types are included in STEM and non-STEM subjects. To obtain data on a variety of questions and to allow comparisons, for example between subjects, data were collected via an online questionnaire. Especially during the COVID-19 pandemic, the online format offered the possibility of contactless data collection. In spring 2020, all 624 grammar schools in NRW were asked to participate in the 20-minute survey. Responses were received from 272 teachers from 145 schools, with an average of 1.88 teachers per school participating (min. 1, max. 6 teachers per school). The 226 complete data sets comprise 133 Mathematics teachers (68 male, 65 female, 0 diverse) and 93 German teachers (32 male, 61 female, 0 diverse). Data analysis is primarily descriptive, as most questions used an open response format due to the lack of prior research on this topic. In addition, associations of gender and work experience with different variables are examined.
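The sample figures reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the two percentages (share of invited schools with at least one response, share of complete data sets) are not stated in the abstract but are derived here from the reported counts.

```python
# Counts as reported in the Method section.
schools_invited = 624
schools_responding = 145
teachers_responding = 272
complete_math = 68 + 65 + 0     # male + female + diverse
complete_german = 32 + 61 + 0   # male + female + diverse

# Reported in the abstract: mean number of teachers per participating school.
mean_teachers_per_school = round(teachers_responding / schools_responding, 2)

# Reported in the abstract: total number of complete data sets.
complete_total = complete_math + complete_german

# Derived here (not stated in the abstract): participation and completion shares.
school_participation_pct = round(100 * schools_responding / schools_invited, 1)
complete_share_pct = round(100 * complete_total / teachers_responding, 1)

print(mean_teachers_per_school)   # 1.88
print(complete_total)             # 226
print(school_participation_pct)   # 23.2
print(complete_share_pct)         # 83.1
```

The first two values reproduce the figures given in the text (1.88 teachers per school, 226 complete data sets); the last two are only rough indicators, since the abstract does not report how many teachers were eligible at each school.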
Expected Outcomes
The data show that most teachers consider a variety of different student performances to be part of the other performances category; these are often oral, but also practical and written (first research question). Differences between subjects are apparent: written performances such as portfolios and protocols, for example, are counted as other performances significantly more often in German than in Mathematics. Regarding the frequency of documentation, previous empirical findings can be confirmed: more than 80% of the teachers document the performances at least once a week. The documentation mostly consists of undifferentiated symbols, and detailed comments are rarely written (second research question). This matters all the more because other performances account for an average of 44% (min. 10%, max. 60%, median 50%) of the grade in the reports (third research question). The findings show that many teachers use a variety of performance types to increase the quality of their diagnostic evaluation. At the same time, there seems to be no agreement on which performances are to be included, to what extent, and how they should be documented and evaluated. This raises the important question of how comparable the teachers' assessment systems are to one another, and of whether teachers consciously use the pedagogical leeway they are given to accommodate students' individuality. In general, the data collected are exploratory. Given the lack of research, they allow some important (preliminary) descriptive statements about the assessment of other performances. To confirm the findings, to uncover further correlations and, for example, to develop a typology of teacher assessment, further investigations are needed; research that compares evaluation systems across countries would be especially interesting.
References
Avenarius, H., & Hanschmann, F. (2019). Schulrecht: Ein Handbuch für Praxis, Rechtsprechung und Wissenschaft (9., neu bearbeitete Auflage). Köln: Carl Link Verlag.
Böhm, T. (2008). Grundkurs Schulrecht III: Zentrale Fragen zur Leistungsbeurteilung und zum Prüfungsrecht. Kronach: LinkLuchterhand.
Breidenstein, G. (2018). Ist "Leistungsgerechtigkeit" tatsächlich das Problem schulischer Leistungsbewertung? In T. Sansour, O. Musenberg, & J. Riegert (Hrsg.), Pädagogische Differenzen. Bildung und Leistung: Differenz zwischen Selektion und Anerkennung (pp. 59–69). Bad Heilbrunn: Verlag Julius Klinkhardt.
Clarke, S. (2008). Active learning through formative assessment. London: Hodder Education.
Eriksson, B. (2009). C2 Bildungsstandards - Mündliche Kommunikation. In M. Becker-Mrotzek & W. Ulrich (Hrsg.), Deutschunterricht in Theorie und Praxis: Handbuch zur Didaktik der deutschen Sprache und Literatur in elf Bänden: Bd. 3. Mündliche Kommunikation und Gesprächsdidaktik (pp. 116–128). Baltmannsweiler: Schneider-Verl. Hohengehren.
Förster, N., & Karst, K. (2017). Modelle diagnostischer Kompetenz: Gemeinsamkeiten und Unterschiede. In A. Südkamp & A.-K. Praetorius (Hrsg.), Diagnostische Kompetenz von Lehrkräften (pp. 63–65). Münster, New York: Waxmann.
Kirk, S. (2004). Beurteilung mündlicher Leistungen: Pädagogische, psychologische, didaktische und schulrechtliche Aspekte der mündlichen Leistungsbeurteilung. Bad Heilbrunn: Klinkhardt.
Krieger, R. (2003). Das sogenannte Mündliche: eine Befragung von Beurteilten und Beurteilern zur Praxis und Problematik der Bewertung mündlicher Leistungen. Bildung und Erziehung, 56(1), 75–92.
Langhammer, R. (1997). Mündliche Noten: Ergebnisse einer Umfrage unter Lehrkräften und Schülern. In M. Pabst-Weinschenk, R. W. Wagner, & C. L. Naumann (Hrsg.), Sprache und sprechen: Vol. 33. Sprecherziehung im Unterricht (pp. 80–93). München: E. Reinhardt.
SchulG (Schulgesetz für das Land Nordrhein-Westfalen). (2005). Fassung vom 15. Februar 2005 (zuletzt geändert durch Gesetz vom 6. Dezember 2016 (GV. NRW. S. 1052)). https://www.schulministerium.nrw.de/docs/Recht/Schulrecht/Schulgesetz/Schulgesetz.pdf
Shavelson, R. J., Young, D. B., Ayala, C. C., Brandon, P. R., Furtak, E. M., . . . Yin, Y. (2008). On the Impact of Curriculum-Embedded Formative Assessment on Learning: A Collaboration between Curriculum and Assessment Developers. Applied Measurement in Education, 21(4), 295–314.
van Ophuysen, S., & Behrmann, L. (2015). Die Qualität pädagogischer Diagnostik im Lehrerberuf - Anmerkungen zum Themenheft "Diagnostische Kompetenzen von Lehrkräften und ihre Handlungsrelevanz". Journal for Educational Research Online, 7(2), 82–98.