Session Information
31 SES 01, Reading and Spelling - Comparative Approaches
Paper Session
Contribution
There is a growing awareness that changes in the way we assess educational progress have major implications for the development of teaching and learning. Research points to the fact that standardized tests such as national tests should be seen as part of the curriculum rather than as external means to evaluate curriculum effects in terms of student learning (Forsberg & Lundahl, 2010). This is mainly due to wash-back effects from high-stakes tests on policies and practices in school. In order to strengthen educational equity and increase assessment objectivity, standardized testing has for a number of years been, and is still being, intensified in the Scandinavian educational systems, as well as in many other countries (Dobson, Eggen & Smith, 2009; Egelund, 2008; Wikström, 2009). This is well known. What different tests actually measure, however, is less well known.
Reading comprehension tests are often assumed to measure the same, or at least similar, constructs. Yet reading is not a single but a multidimensional form of processing (Duke, 2005), which means that variations in reading material, response format and item construction may emphasize one side of the construct at the cost of another (Keenan, Betjemann & Olson, 2008; Leslie & Caldwell, 2009; Nation & Snowling, 1997). The educational systems of Denmark, Norway and Sweden share a number of traits, and in the recent decade, the development of national test instruments, especially for reading, has been highly influenced by international surveys of student achievement. In this study, national tests of reading comprehension at the end of compulsory school in the three Scandinavian countries are compared in order to reveal the range of commonality and difference in the three test domains.
The analysis presented aims specifically at determining:
- The variation of test domains and construct definitions in the national reading frameworks in Denmark, Norway and Sweden.
- The variation of reading material, item construction and scoring guidelines within national reading tests in Denmark, Norway and Sweden.
Much research has been devoted to identifying the factors that best explain variance in comprehension, as well as to exploring the intercorrelations among items and subdivisions of the construct (Cutting & Scarborough, 2006; Davis, 1968; Keenan et al., 2008; Spearritt, 1972). While some aspects of the construct may be conceptually distinguishable, such as the reading processes that are frequently marked as separate skill factors in reading comprehension tests, they are not necessarily psychometrically distinct from each other (van Steensel, Oostdam & van Gelderen, 2012). At the same time, comparative analyses of dimensionality in existing frameworks indicate that item properties, textual features, and sometimes scoring rubrics are important factors when determining the scope of cognitive processing to be tested (Cutting & Scarborough, 2006; Francis, Fletcher, Catts & Tomblin, 2005).
Research in the field has also examined the impact of item format and the range of achievement variation related to changes in testing framework and methodology. Although several studies have indicated that the lion's share of the construct may be targeted equally well by multiple-choice (MC) and constructed-response (CR) formats, there are persistent beliefs about item differences that often rest on tradition and custom rather than on a thorough body of empirically based knowledge. For this reason, it may be particularly useful to compare the methods for reading comprehension assessment in neighboring countries.
Method
Expected Outcomes
References
Cutting, L. E., & Scarborough, H. S. (2006). Prediction of reading comprehension: Relative contributions of word recognition, language proficiency, and other cognitive skills can depend on how comprehension is measured. Scientific Studies of Reading, 10(3), 277–299.
Davis, F. B. (1968). Research in comprehension in reading. Reading Research Quarterly, 3, 499–545.
Dobson, S., Eggen, A. B., & Smith, K. (Eds.). (2009). Vurdering, prinsipper og praksis. Oslo: Gyldendal Akademisk.
Duke, N. K. (2005). Comprehension of what for what: Comprehension as a nonunitary construct. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment (pp. 93–104). Mahwah, NJ: Erlbaum.
Egelund, N. (2008). The value of international comparative studies of achievement – a Danish perspective. Assessment in Education: Principles, Policy & Practice, 15(3), 245–251.
Forsberg, E., & Lundahl, C. (2010). Kunskapsbedömningar som styrmedia. Utbildning & Demokrati, 15(3), 7–29.
Francis, D. J., Fletcher, J. M., Catts, H. W., & Tomblin, J. B. (2005). Dimensions affecting the assessment of reading comprehension. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension and assessment (pp. 369–394). Mahwah, NJ: Erlbaum.
Haladyna, T. M., & Rodriguez, M. C. (2013). Developing and validating test items. New York: Routledge.
Keenan, J. M., Betjemann, R. S., & Olson, R. K. (2008). Reading comprehension tests vary in the skills they assess: Differential dependence on decoding and oral comprehension. Scientific Studies of Reading, 12(3), 281–300.
Khalifa, H., & Weir, C. (2009). Examining reading: Research and practice in assessing second language learning. New York: Cambridge University Press.
Leslie, L., & Caldwell, J. (2009). Formal and informal measures of reading comprehension. In S. E. Israel & G. G. Duffy (Eds.), Handbook of research on reading comprehension (pp. 403–427). New York, NY: Routledge.
Nation, K., & Snowling, M. (1997). Assessing reading difficulties: The validity and utility of current measures of reading skill. British Journal of Educational Psychology, 67, 359–370.
Spearritt, D. (1972). Identification of sub-skills of reading comprehension by maximum likelihood factor analysis. Reading Research Quarterly, 8, 92–111.
van Steensel, R., Oostdam, R., & van Gelderen, A. (2012). Assessing reading comprehension in adolescent low achievers: Subskills identification and task specificity. Language Testing, 30(1), 3–21.
Wikström, C. (2009). National curriculum assessment in England – a Swedish perspective. Educational Research, 51(2), 255–258.