Session Information
31 SES 11 A, Literacy; Writing
Paper Session
Contribution
In the context of teaching English as an Additional Language (EAL), feedback and grading are not merely routine practices but fundamental components of writing pedagogy (Hyland, 2003). They contribute significantly to the development of learners’ writing skills by encouraging learners to understand and actively improve their linguistic abilities. Writing assignments examine the ability to use language in written form, demanding not only strategic competencies but also fundamental linguistic competencies such as knowledge of pragmatics, sociolinguistics, and grammar (Keller-Bolliger, 2012). Communicative writing competence therefore cannot be captured directly; it must instead be assessed through its realization in components such as text quality, task fulfillment, and grammatical and lexical performance (Keller-Bolliger, 2012). Unlike listening and reading competencies, which are often assessed through tasks that can be evaluated rather objectively, writing tasks have an open format that simulates, or at least prototypically represents, real-life writing scenarios (Grotjahn & Kleppin, 2017). This leaves a wide scope for evaluation and underscores the need for a differentiated approach to assessing writing skills (Grotjahn & Kleppin, 2017). Moreover, grades alone are not sufficient to enhance learners’ writing skills (Hyland & Hyland, 2006; Porsch, 2010); they should be accompanied by specific verbal and/or written feedback from the teacher (Grotjahn & Kleppin, 2017). This necessity highlights the complexity and resource-intensive nature of the assessment process, as providing detailed, constructive feedback on written texts requires substantial time and cognitive effort from teachers.
Since the advent of more sophisticated AI technologies, particularly the emergence of models like ChatGPT in 2022, grading and feedback practices in EAL (or EFL) contexts have seen potential shifts (Crompton et al., 2024). AI chatbots like ChatGPT are recognized for their potential to function as “teacher-facing systems” (Pokrivčáková, 2019) that can reduce educators’ workload and enhance their output by semi-automating tasks such as grading and feedback (Pokrivčáková, 2019; Grassini, 2023). This advancement also offers an opportunity to mitigate human weaknesses, such as a lack of objectivity in diagnostics and a lack of consistency and fairness in evaluations (Zehner, 2019). Empirically, however, it remains unclear how reliable AI-supported grading and feedback on written learner products are, and whether they could effectively complement language teachers’ writing pedagogies or lead to new challenges.
The presented study therefore examined grades and feedback generated by ChatGPT-4o on written learner texts from an EAL class in Germany and compared these with the assessments of the same texts by the class’s English teachers. Based on curricular guidelines and grading rubrics, the written texts were evaluated along the same criteria relating to content (task requirements, text type, awareness of context and audience) and language (comprehensibility, awareness of context and audience, idiomatic usage, coherence/cohesion, vocabulary, grammar, and orthography). The objective was to identify how AI-generated assessments differ from human assessments and, on this basis, to determine the opportunities as well as the limitations of AI as a teacher-facing system in writing pedagogy.
Method
The presented study was conducted at a German school whose learners come from diverse linguistic backgrounds and have varying levels of English language proficiency. The data basis of the study consists of 23 digitized and anonymized student texts from a writing assessment in a 6th-grade English class, along with the corresponding teacher grades and feedback. The learners were asked to write an 80-word letter to a friend about a recent class trip. In a separate step, a prompt for ChatGPT-4o containing the relevant assessment criteria and background information on the writing task and learner group (age, level of language proficiency) was developed and tested. This prompt was then used to generate AI-supported grades and feedback on the student texts with two different ChatGPT-4o accounts, and the results were incorporated into the data corpus. The grades and feedback given by the teachers were then compared with the evaluations by ChatGPT-4o, and similarities and notable differences were described using a tabular visual analysis technique (Miles & Huberman, 1994). Based on the extent to which the two assessments agreed, considerations were then made concerning the usability of the chatbot in the context of writing assessment.
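While the study generated the assessments through the ChatGPT-4o web interface with two separate accounts, the same workflow could in principle be scripted. The following minimal sketch uses the OpenAI Python API; the prompt wording, rubric summary, and parameters are illustrative assumptions, not the study’s actual prompt.

    # Minimal sketch of scripted AI-assisted grading via the OpenAI Python API.
    # The study itself used the ChatGPT-4o web interface with two accounts; the
    # prompt wording and rubric summary below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    SYSTEM_PROMPT = (
        "You grade English texts written by 6th-grade EAL learners. "
        "Task: an 80-word letter to a friend about a recent class trip. "
        "Assess content (task requirements, text type, awareness of context "
        "and audience) and language (comprehensibility, idiomatic usage, "
        "coherence/cohesion, vocabulary, grammar, orthography). Return a "
        "score per criterion, an overall grade, and short feedback for the "
        "learner."
    )

    def grade_text(student_text: str) -> str:
        """Send one anonymized student text to the model and return its assessment."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": student_text},
            ],
            temperature=0,  # reduce run-to-run variance in the grading
        )
        return response.choices[0].message.content

    # Example: grade all 23 digitized texts for later comparison.
    # assessments = [grade_text(t) for t in student_texts]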
Expected Outcomes
The analysis of ChatGPT-4o’s performance in assessing written learner texts revealed notable discrepancies compared with the teacher evaluations. The AI failed to consistently consider critical elements such as the length of the written texts, despite clear instructions that texts should comprise a certain word count. Not only did it fail to penalize shorter texts consistently, it also varied in whether it mentioned this shortfall across the two ChatGPT accounts. Significant challenges also arose in the evaluation of the content of the written products (level of detail, audience awareness, subjective experiences of different students) and in the clear separation of the two main criteria, content and language. This lack of attention to the detailed criteria could lead to unfair grading outcomes in which students are not evaluated against a uniform standard.
Furthermore, the AI-generated assessments showed problems in assigning scores accurately according to the established rubric. For instance, ChatGPT sometimes assigned maximum scores in certain categories even though it had identified zero points in other criteria, suggesting a misunderstanding or misapplication of the grading scale. These errors indicate potential flaws in its logical processing or algorithmic interpretation of the grading criteria. Comparative analysis revealed that the marks ChatGPT assigned were frequently out of alignment with those given by the teachers, with differences ranging from 0.5 to over 3 points. This variance further highlights the challenges of relying solely on AI for accurate and fair student assessment.
Despite these issues, the integration of AI chatbots into language teachers’ writing pedagogies, especially for grading and feedback, should not be dismissed outright. Further adjustments to the prompts provided to the AI could potentially (further) standardize the structure of the feedback. Moreover, incorporating more detailed and specific directives on certain criteria could mitigate the AI’s interpretive errors and reduce the chance of erratic grading and feedback.
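As an illustration of the comparative step, the short sketch below tabulates per-text differences between teacher and AI marks; the identifiers and score values are hypothetical and do not reproduce the study’s data.

    # Hypothetical sketch of the mark comparison; the values below are
    # invented for illustration and do not reproduce the study's data.
    teacher_marks = {"text_01": 11.5, "text_02": 9.0, "text_03": 13.0}
    ai_marks = {"text_01": 14.0, "text_02": 9.5, "text_03": 10.0}

    for text_id in sorted(teacher_marks):
        diff = abs(teacher_marks[text_id] - ai_marks[text_id])
        flag = "notable" if diff >= 1.0 else "minor"
        print(f"{text_id}: teacher={teacher_marks[text_id]:.1f}, "
              f"AI={ai_marks[text_id]:.1f}, |diff|={diff:.1f} ({flag})")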
References
Crompton, H., Edmett, A., Ichaporia, N., & Burke, D. (2024). AI and English language teaching: Affordances and challenges. British Journal of Educational Technology, 55, 2503–2529. https://doi.org/10.1111/bjet.13460
Grassini, S. (2023). Shaping the future of education: Exploring the potential and consequences of AI and ChatGPT in educational settings. Education Sciences, 13(692), 1–13.
Grotjahn, R., & Kleppin, K. (2017). Gütekriterien bei der Evaluation von Schreibkompetenzen. In B. Akukwe, R. Grotjahn & S. Schipolowski (Eds.), Schreibkompetenzen in der Fremdsprache: Aufgabengestaltung, kriterienorientierte Bewertung und Feedback (pp. 41–69). Narr Francke Attempto.
Hyland, K. (2003). Second language writing. Cambridge University Press.
Hyland, F., & Hyland, K. (2006). Feedback on second language students’ writing. Language Teaching, 39, 83–101.
Keller-Bolliger, R. (2012). Kommunikative Schreibkompetenz in der Fremdsprache erfassen und beurteilen: Eine empirische Studie im Kontext des EDK-Projekts HarmoS. Holtzbrinck.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis. Sage.
Pokrivčáková, S. (2019). Preparing teachers for the application of AI-powered technologies in foreign language education. Sciendo. https://doi.org/10.2478/jolace-2019-0025
Porsch, R. (2010). Schreibkompetenzvermittlung im Englischunterricht in der Sekundarstufe I: Empirische Analysen zu Leistungen, Einstellungen, Unterrichtsmethoden und Zusammenhängen von Leistungen in der Mutter- und Fremdsprache. Waxmann.
Zehner, F. (2019). Künstliche Intelligenz: Ihr Potenzial und der Mythos des Lehrkraft-Bots. Schulmanagement-Handbuch, 38(169), 6–30. https://doi.org/10.25656/01:17561