ERG SES E 02, Language and Education
In the European context, the use of national tests in languages has increased during the last two decades, and in a majority of countries these tests are high-stakes for students (European Commission, EACEA, & Eurydice, 2015). In Sweden, all students in grades 6 (ages 11–12) and 9 (ages 15–16) take a national test in English as a foreign/second language (L2), administered by the Swedish National Agency for Education. A main objective of the test is to contribute to equity in assessment nationwide (Skolverket, 2010). It is a high-stakes, summative proficiency test that encompasses oral proficiency, receptive skills, and written proficiency.
Since 1998, oral proficiency, in the form of the test construct oral production and interaction, has been part of the national test of L2 English. Students are divided into pairs or small groups and instructed to respond to topics from the test material as well as to discuss these with their peers. In general, students’ own teacher administers and assesses this part of the national test. The Agency for Education instructs teachers to assess students’ oral production and interaction in English holistically, taking all aspects of students’ oral production and interaction into account. The instructions list some aspects in support of the holistic assessment; these encompass linguistic qualities (e.g., grammatical structures and vocabulary), the ability to produce content (e.g., giving examples and providing different perspectives), and the ability to interact (e.g., adaptation to recipient and situation). Explicit instructions, such as which grammatical structures or vocabulary students should master or what constitutes adaptation to recipient, are lacking, as are instructions on how to weigh these aspects or relate them to the holistic assessment.
To facilitate assessment of oral production and interaction, many teachers construct their own scoring rubrics, which they use for note-taking while students take the test. If equity in assessment is to be attained, high inter-rater reliability between examiners is essential (Stemler, 2004). Thus, examiners need to agree on what constitutes oral production and interaction and on how to assess this test construct (Sandlund & Sundqvist, forthcoming 2018a). An important question is therefore whether teachers conceptualize this ability in a similar way in their own scoring rubrics, and whether teachers’ conceptualizations resemble how the construct is defined in the policy documents from the National Agency for Education.
The aim of the present study is to examine teachers’ views on oral production and interaction in English as reflected in the creation of their own scoring rubrics. The following research questions guided the study:
1) How do teachers construct their interpretations of the instructions from the Agency for Education as regards the test construct in focus?
2) In what way(s) are these interpretations similar to or different from each other?
3) In what way(s) are teachers’ interpretations similar to or different from the syllabus and the current criteria?
4) To what extent is equity in assessment possibly affected by these interpretations?
The theoretical framework for the study is the Anthropological Theory of Didactics (ATD) (Chevallard, 2007), with its concept of didactic transpositions, that is, the transformations that content to be taught and learnt undergoes. According to ATD, there is a dialectic relationship between institutions and the people within them, in which content is co-determined at a hierarchy of levels (Achiam & Marandino, 2014). In this study, the ATD framework is used to examine the didactic transpositions of the assessment of oral production and interaction, from how it is expressed in the policy documents to how it is interpreted when operationalized by teachers in their own scoring rubrics.
Data were retrieved between February and October 2017 and consist of scoring rubrics made by teachers for assessment of the national test in grades 6 and/or 9. In addition to rubrics, data consist of a web questionnaire and interviews. Rubrics as well as questionnaires were collected through two closed groups on Facebook by posting a call for participation in the study. The first group is for teachers of English in grades 4–6 and had 4,406 members when data retrieval started in February 2017; the second group is for teachers of English in grades 6–9 (4,394 members in February 2017). Members who responded positively to participation in the study were contacted and asked to submit their scoring rubrics. When contacted, participants were also given the link to the questionnaire. We received 24 scoring rubrics, 18 of which were unique (i.e., not identical to any other). To this collection, four scoring rubrics from a previous project (www.kau.se/testing-talk) were added, yielding a total of 22 unique scoring rubrics. Five interviewees were selected based on this material, and all five interviews were conducted face-to-face. Nineteen of the Facebook group members answered the questionnaire. Data were analyzed both quantitatively (questionnaire) and qualitatively (scoring rubrics and interviews). In the qualitative analysis of the scoring rubrics, we were guided by a number of terms related to the assessment of oral proficiency (e.g., vocabulary, language correctness, and strategies) (see Hasselgren, 1997), as well as the terms construct, criterion, and sub-criterion (see Bøhn, 2015), to examine how these were used and structured by teachers and how they were interpreted and transposed (see Achiam & Marandino, 2014; Chevallard, 2007). The analysis of the questionnaire and interviews focused on shedding light on themes that emerged from the content analysis of the scoring rubrics.
Results from the rubrics, questionnaires, and interviews were compared, and comparisons were also made with the National Agency for Education’s instructions for the assessment of oral production and interaction.
Preliminary results indicate discrepancies in how teachers arrange the criteria to be assessed in their scoring rubrics. Some rubrics list two main criteria as the basis for assessment (content and language), whereas others list as many as ten (adaptation to recipient, argumentation, clarity, fluency, grammatical structures, pronunciation/intonation, strategies, structure, variation, and vocabulary). Consequently, discrepancies are also found between the arrangement of assessment criteria in the rubrics and the arrangement used by the National Agency for Education, although several teachers arrange the criteria in close resemblance to the Agency’s instructions. Interestingly, several rubrics also list explicit examples of what constitutes a criterion or sub-criterion, such as specific vocabulary or specific grammatical structures. One part of the test construct, oral interaction, seems to give rise to many explicit examples, which might indicate that oral interaction is particularly difficult to capture and assess. Explicit examples are often structured in accordance with the grading system, most likely to make it easier to decide which grade an individual student’s oral production and interaction corresponds to. Preliminary results therefore indicate that scoring rubrics are used to exemplify the instructions from the Agency for Education, which are more general in character. Furthermore, explicit examples vary between scoring rubrics, and although content analysis of the data is ongoing, different interpretations seem to emanate from the data, which, as a consequence, will affect equity in assessment.
Achiam, M., & Marandino, M. (2014). A framework for understanding the conditions of science representation and dissemination in museums. Museum Management and Curatorship, 29(1), 66–82.
Bøhn, H. (2015). Assessing spoken EFL without a common rating scale. SAGE Open, 5(4). doi:10.1177/2158244015621956
Chevallard, Y. (2007). Readjusting didactics to a changing epistemology. European Educational Research Journal, 6(2), 131–134.
European Commission, EACEA, & Eurydice. (2015). Languages in secondary education: An overview of national tests in Europe – 2014/15. Retrieved from http://eacea.ec.europa.eu/education/eurydice/documents/facts_and_figures/187EN.pdf
Hasselgren, A. (1997). Oral test subskill scores: What they tell us about raters and pupils. In A. Huhta, V. Kohonen, L. Kurki-Suonio, & S. Luoma (Eds.), Current developments and alternatives in language assessment – Proceedings of LTRC 96 (pp. 241–246). Jyväskylä: University of Jyväskylä and University of Tampere.
Sandlund, E., & Sundqvist, P. (forthcoming 2018a). Doing versus assessing interactional competence: Contrasting L2 test interaction and teachers’ collaborative grading of a paired speaking test. In R. Salaberry & S. Kunitz (Eds.), Teaching and testing L2 interactional competence: Bridging theory and practice. Oxon and New York, NY: Routledge.
Sandlund, E., & Sundqvist, P. (forthcoming 2018b). Muntlig färdighet i engelska [Oral proficiency in English]. Lund: Studentlitteratur.
Skolverket. (2010). The Swedish National Agency for Education – a presentation. Stockholm: Skolverket.
Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research and Evaluation, 9(4). Retrieved from http://pareonline.net/getvn.asp?v=9&n=4
Sundqvist, P., & Sandlund, E. (forthcoming 2020). Testing talk: Ways to assess second language oral proficiency. London: Bloomsbury Academic.