Session Information
09 SES 05.5 A, General Poster Session
Contribution
In a linguistically and culturally diverse Europe, the teaching, learning and assessment of foreign languages have gained increasing attention over recent decades (European Commission, EACEA, & Eurydice, 2015). As a result, national language tests are currently administered in almost all European countries, and in a majority of these they carry high stakes for students. The teaching and assessment of English as a foreign/second language (L2) holds a particular position in Europe, where all but one country administer national tests in L2 English for secondary students (European Commission et al., 2015). However, fewer than half of these countries include all four skills (reading, listening, writing and speaking) in their national tests, and speaking is the least assessed language competence (European Commission et al., 2015). A plausible reason is that speaking is the most difficult skill to assess reliably (Alderson & Bachman, 2004), partly because raters need to consider numerous aspects simultaneously and may therefore attend to different aspects of speakers' utterances (Bøhn, 2015). Moreover, the social situation in which a test is set affects assessment (Borger, 2019), making standardized testing of speaking particularly challenging. There is a gap in research on how raters of national and/or standardized tests orient to the challenges of assessing speaking when operationalizing assessment; such research could provide insight into raters' assessment processes and, in turn, inform the development of speaking tests.
In Sweden, which constitutes the empirical case for the present study, all students in grades 6 and 9 (aged 12–13 and 15–16) take a national test in L2 English in which their listening, reading, writing, and speaking skills are assessed. The test is administered by the Swedish National Agency for Education but assessed by the students' own teachers. Speaking is included in part A, the National English Speaking Test (NEST). There is no specific rater training for teachers involved as raters of the NEST, but they are provided with extensive assessment guidelines from the Agency. Despite these guidelines, teachers commonly construct their own note-taking documents to use as scoring templates when assessing the NEST (Byman Frisén et al., 2021). Although scoring templates are recognized as mediators between observed performances and the score awarded (McNamara, 1996), few studies have examined the role(s) they play in the assessment process. The aim of this study is to contribute to a clearer understanding of raters' scoring processes when assessing L2 English speaking, as these emerge from raters' reports of their note-taking practices in the assessment situation. The research questions address how teachers take notes in the assessment situation, in what ways they draw upon their notes when deciding on a score, and their reasons for creating and using a note-taking document of their own.
The theoretical framework for the study is the Anthropological Theory of Didactics (ATD; Chevallard, 2007) and its notion of praxeologies, which according to the theory need to be taken into account to examine 'true' knowledge (Chevallard, 2007, p. 133). Praxeologies consist of praxis and logos. Praxis comprises a type of task together with the technique used to carry out that task, whereas logos comprises the rationale for using that particular technique for that particular task (the technology of the technique), as well as the theory justifying the technology. Viewing note-taking documents as the technique used to carry out the task of assessing speaking in L2 English, the ATD framework is used in this study to analyze how teachers use this technique and to analyze the logos behind it, i.e., the discourse on why and how note-taking is beneficial for carrying out the task.
Method
Data consist of interviews (N = 13) with teachers of English in Sweden, all women, who had acted as raters of the speaking part of the NEST in grade 6 and/or grade 9. Data were collected in two steps: a first step in which five interviews were conducted, and a second step consisting of eight interviews. In the first step, interviews were conducted in connection with a previous project (Byman Frisén et al., 2021). After all five interviews in the first step had been completed, new questions arose about teachers' practices when using their note-taking documents for assessment and scoring of the NEST, resulting in revisions of the interview guide that allowed for more in-depth questions to interviewees in the second step. New interviewees were recruited from professional networks of teachers in year 6 and/or year 9. A semi-structured interview guide (Kvale, 1997) was used in both steps, and all interviews were conducted individually (face-to-face or via a web-based program for online meetings). Interviewees came from both urban and more rural areas across Sweden. Twelve of the teachers had long experience of teaching English, between 11 and 25 years, whereas one teacher had taught English for 5 years. Since none of the teachers assessed the NEST every year, years of teaching experience and the number of times acting as a rater of the NEST differed; teachers reported having acted as raters of the NEST between 4 and 17 times. Several of the participating teachers worked in schools with both year 6 and year 9 students and thus had experience of teaching and assessing English for both groups. However, as most teachers were employed either as teachers for years 4–6 or for years 7–9, they predominantly assessed either the NEST for year 6 or the NEST for year 9. Interviews from both steps of data collection were audio-recorded and transcribed orthographically. Data were analyzed using qualitative thematic analysis (Braun & Clarke, 2006), for which the software program NVivo 12 was used. The analysis was guided by the research questions as well as by the theoretical framework, the Anthropological Theory of Didactics (Chevallard, 2007).
Expected Outcomes
Preliminary results show that teachers used their note-taking documents in a two-step process. In the first step, notes were taken to capture observations of students' performances. In the second step, teachers drew on these notes to decide on a score. The process was not linear but was reported to move back and forth between the two steps. In addition, the numerous aspects requiring simultaneous attention during the test situation created a need for pre-printed criteria, so that raters could attend to these alone. Nonetheless, all of the interviewed teachers noted down additional comments while listening to students. The complexity of the task at hand was mirrored in teachers' talk about their assessment practices. Firstly, assessment of the NEST was carefully planned and prepared for, both in terms of preparing students for the assessment situation and in terms of preparing oneself for the role of rater by scrutinizing the assessment guidelines. Secondly, in the second step of note-taking, teachers reported discussing scoring decisions with colleagues or deliberating on their own. For the most part, each rating criterion was then considered and negotiated before a score decision was reached. Thirdly, teachers reported creating documents designed for quick note-taking, using symbolic systems of their own. This practice indicates a need for instant recording of one's observations of speaking competences. In addition, although the outcome of the test was a summative score, both the creation and the use of note-taking documents indicated formative assessment practices. Thus, accountability was part of the discourse behind the use of the technique. Moreover, the study shows that teachers who are involved in assessment of the NEST acquire in-depth knowledge of the test construct, which might contribute to their classroom teaching of speaking skills.
References
Alderson, J., & Bachman, L. (2004). Series editors' preface to Assessing Speaking. In J. Alderson & L. Bachman (Eds.), Assessing speaking (pp. ix–xi). Cambridge University Press.
Borger, L. (2019). Assessing interactional skills in a paired speaking test: Raters' interpretation of the construct. Apples – Journal of Applied Language Studies, 13, 151–174.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
Byman Frisén, L., Sundqvist, P., & Sandlund, E. (2021). Policy in practice: Teachers' conceptualizations of L2 English oral proficiency as operationalized in high-stakes test assessment. Languages, 6(4), 204. https://doi.org/10.3390/languages6040204
Bøhn, H. (2015). Assessing spoken EFL without a common rating scale. SAGE Open, 5, 1–12.
Chevallard, Y. (2007). Readjusting didactics to a changing epistemology. European Educational Research Journal, 6(2), 131–134.
European Commission, EACEA, & Eurydice. (2015). Languages in secondary education: An overview of national tests in Europe – 2014/15. https://op.europa.eu/en/publication-detail/-/publication/62ac43c3-dac4-11e5-8fea-01aa75ed71a1/language-en
Kvale, S. (1997). Den kvalitativa forskningsintervjun [The qualitative research interview]. Studentlitteratur.
McNamara, T. (1996). Measuring second language performance. Longman.