50 Years’ Legacy of Formative and Summative Evaluation and Assessment: A Critical Theoretical Review of Education Policy and Research

Author(s):

Sverre Tveit (presenting / submitting)

Conference:

ECER 2017

Network:

09. Assessment, Evaluation, Testing and Measurement

Format:

Paper

Session Information

09 SES 06 C, Discussing Social Impact in Education Research and Assessment Related Education Policy and Research

Paper Session

Time:

2017-08-23

15:30-17:00

Room:

W5.17

Chair:

Julia Gerick

Contribution

The objective of this paper is to critically examine the conceptual understanding underpinning the formative/summative distinction since the concepts were coined by Michael Scriven (1967) in relation to curriculum program evaluation 50 years ago. The distinction has since been elaborated and its meaning expanded within the areas of both educational evaluation and educational assessment (Cizek, 2010, p. 5). While acknowledging that the concepts originated in the area of (system-level) educational evaluation, this paper focuses on the use of the distinction in relation to processes of determining individuals’ attainment.

Scriven’s 1967 paper was a philosophical account related to studies of the effectiveness of schools’ curriculum programs. He proposed the concept of formative evaluation to describe the role in the “on-going improvement of the curriculum” and summative evaluation to “enable administrators [to make decisions with respect to] the finished curriculum” (p. 41). Scriven’s definition of “formative” was later extended by Bloom, Hastings and Madaus (1971), who proposed that “formative tests” should be administered after the completion of appropriate learning units “to determine the degree of mastery of a given learning task and to pinpoint the part of the task not mastered” (p. 61). For students who mastered the task, formative tests were to “reinforce the learning and assure him that his present mode of learning and approach to study is adequate”, while for students who lacked mastery of the unit “the formative test should reveal the particular points of difficulty” (p. 54). According to Cizek (2010, p. 5), this definition is foundational for how formative assessment is understood in the United States.

Black and Wiliam’s (1998) research review on formative assessment received tremendous attention globally and has commonly been used in Europe to call for a shift from summative to formative assessment in policy as well as in the practices of educational assessment (Kirton, Hallam, Peffers, Robertson and Stobart, 2007). To further clarify the intentions of formative assessment, the Assessment Reform Group (ARG) in the United Kingdom introduced a set of principles known as Assessment for Learning (AFL), aiming to further emphasise the need for assessment practices to support, rather than undermine, learning and instruction (ARG, 1999).

There is, however, no agreed distinction between the concepts of formative assessment and AFL, nor agreement on how these concepts relate to summative assessment and Assessment of Learning, respectively. This causes confusion about what we are talking about when we use the concepts formative and summative.

Based on the analysis of five different conceptual understandings of formative and summative assessment (outlined below), the paper critically discusses the way formative assessment has been emphasised in Assessment for Learning programs worldwide. As Bennett (2011) points out, one cannot be sure of these programs’ effects unless adequate definitions of the meaning of formative assessment and AFL are applied. In recent years the theoretical definitions of formative assessment, and nation states’ implementation of AFL programs, have been criticised (Baird et al., 2014; Bennett, 2011; Hopfenbeck, Tolo, Florez & Masri, 2013; Hopfenbeck, Petour & Tolo, 2015; Newton, 2007; Jonsson, Lundahl & Holmgren, 2015; Taras, 2007, 2009). Baird et al. (2014) conclude that conceptual problems make it difficult to identify effects of AFL policies.

The paper contends that these conceptual problems cause difficulties for both policymaking and research deliberations. Firstly, shallow conceptual understanding in policymakers’ and researchers’ deliberations may cause misunderstandings with respect to the premises of assessment reform in other countries. Secondly, it may be difficult, if not outright impossible, to interpret the effects of intervention studies that address formative assessment and/or Assessment for Learning, due to the vastly different conceptual understandings underpinning research participants’ and researchers’ terminology.

Method

The mode of inquiry is a review of research literature that has been foundational to educational research and policy deliberations over the past 50 years. The articles were identified through critical examination of literature on classroom assessment and assessment policy. These were classified into five different types of distinctions between formative assessment (FA) and summative assessment (SA):

1: SA and FA definitions distinguishing between timing and purposes: Bloom et al. (1971), Sadler (1989) and Shepard (2005) followed Scriven, who distinguished between timing or purpose. Bloom’s theory emphasised the need for formative assessment to be more specific than summative assessment of ‘larger outcomes’. Shepard’s distinction centres on timing and on improving teaching or learning, while Sadler’s definition gives more emphasis to the need for more targeted feedback approaches.

2: FA definitions that omit (explicit) SA definitions (Black and Wiliam): For Black and Wiliam (1998), formative assessment concerns teachers’ and students’ roles in the assessment. A notable change can be traced from the initial 1998 definition to the 2009 definition. In their more developed theory, Black and Wiliam (2009) identify different stages in the formative assessment process (p. 9). Their conceptualisation of formative assessment was, however, not explicitly related to summative assessment.

3: FA definitions which explicitly distinguish between two types of SA: In a book from 2004, however, Black, Harrison, Lee, Marshall and Wiliam (2004) also address accountability, competence ranking and certification, and thus distinguish between individual-level and group-level uses of summative assessment. This difference was also recognised by Looney (2005) and Popham (2011).

4: SA defined as foundational to FA: Taras (2005) alleges that Sadler’s (1989) much-cited formative assessment theory and Scriven’s (1967) original paper implicitly perceive the goal (or the standard) as intrinsic to formative assessment. According to Taras (2005, 2007), summative and formative assessment logically lead into each other as one continuous process.

5: SA defined as a judgment and not a purpose: Newton (2007) observes that “when referring to the alleged summative purpose (…) researchers tend simply to use the term as a catch-all expression for categorizing any of a variety of different purposes which are predicated on the use of individual summative assessment judgements” (p. 156). He argues that the term “summative” evokes the nature of the assessment judgment: summing up. What these judgments are used for is rarely addressed in texts that discuss the prospects of formative assessment.

Expected Outcomes

The paper identifies conceptual problems in many research papers on formative assessment. Taras (2007) contends that “the true relationship between summative and formative assessment is never made clear” by Black and Wiliam (1998) and that they fail to specify the role of criteria and standards, despite defining these as an integral part of formative assessment (Taras, 2007, p. 370). Following Taras’ lead, one may ask whether there is any educational situation that is not based on a summative judgment. However, this interpretation fails to acknowledge the many different kinds and uses of SA. Newton (2007) concludes that the distinction between formative and summative is not grounded in the use to which assessment judgments are put, ‘simply because there is no meaningful distinction to be drawn’. He contends that while ‘the rhetoric appears to distinguish between two conceptually distinct types of use to which results can be put; in fact, it simply foregrounds one particular type, the formative use’ (Newton, 2007, p. 157). Newton (2007) therefore suggests that we use the term summative in relation to a judgment and not a purpose. Based on the review of research papers on formative and summative assessment, the paper suggests that it may be useful to avoid the formative and summative distinction entirely, and instead focus on theoretical aspects of Scriven’s (1967) original research paper that have received less attention. Drawing on the reviewed distinctions, along with the work of Black (1998) and Stobart (2008), the paper makes new use of Scriven’s (1967) distinction between roles and goals of evaluation and proposes novel terminology that describes three roles of educational assessment: assessments used to certify learning, assessments used to govern learning and instruction, and assessments used to support learning and instruction.

References

Baird, J.-A., Hopfenbeck, T. N., Newton, P. E., Stobart, G., & Steen-Utheim, A. T. (2014). Assessment and Learning (No. 13/4697) (pp. 1–174). Norwegian Knowledge Centre for Education.
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2004). Working inside the black box: assessment for learning in the classroom.
Black, P., & Wiliam, D. (2005). Lessons from around the world: how policies, politics and cultures constrain and afford assessment practices. Curriculum Journal, 16(2), 249–261. http://doi.org/10.1080/09585170500136218
Black, P., & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5(1).
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. New York: David McKay Company.
Jonsson, A., Lundahl, C., & Holmgren, A. (2015). Evaluating a large-scale implementation of Assessment for Learning in Sweden. Assessment in Education: Principles, Policy & Practice, 22(1), 104–121. http://doi.org/10.1080/0969594X.2014.970612
Kirton, A., Hallam, S., Peffers, J., Robertson, P., & Stobart, G. (2007). Revolution, evolution or a Trojan horse? Piloting assessment for learning in some Scottish primary schools. British Educational Research Journal, 33(4), 605–627. http://doi.org/10.1080/01411920701434136
Newton, P. E. (2007). Clarifying the purposes of educational assessment. Assessment in Education: Principles, Policy & Practice, 14(2), 149–170. http://doi.org/10.1080/09695940701478321
Sadler, D. R. (1987). Specifying and Promulgating Achievement Standards.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
Scriven, M. (1967). The methodology of evaluation. In Stake, R. E. Curriculum evaluation. Chicago: Rand McNally. American Educational Research Association (monograph series on evaluation, no. 1).
Scriven, M. (1986). Evaluation as a paradigm for educational research. In R. House (Ed.), New directions in educational evaluation (pp. 53–67). New York: Falmer Press.
Scriven, M. (1990). Beyond formative and summative evaluation. In M. McLaughlin & D. Phillips (Eds.), NSSE Yearbook, Evaluation & Education.
Stobart, G. (2008). Testing Time. The uses and abuses of assessment. London: Routledge.
Taras, M. (2005). Assessment – Summative and Formative – Some Theoretical Reflections. British Journal of Educational Studies, 53(4), 466–478. http://doi.org/10.1111/j.1467-8527.2005.00307.x
Taras, M. (2007). Assessment for learning: understanding theory to improve practice. Journal of Further and Higher Education, 31(4), 363–371. http://doi.org/10.1080/03098770701625746
Taras, M. (2009). Summative assessment: the missing link for formative assessment. Journal of Further and Higher Education, 33(1), 57–69. http://doi.org/10.1080/03098770802638671

Author Information

Sverre Tveit (presenting / submitting)

University of Agder, Norway
