Session Information
09 SES 06 A, Relating Assessment Policies and Performance Interpretations to School and Student Variables
Paper Session
Contribution
How validly can meaning be attached to performance band indicators (grades, proficiency levels, and the like)?
------------------------------------------
The inherent function of performance banding (or grading or labelling) is to identify individuals or groups of individuals who have produced higher-meriting performances than others in some defined sense on some given assessment. Banding information might be used for one or more specific high-stakes purposes: in particular, for selection of individuals for further courses of study and, ultimately, for certification; and for aggregation to produce performance distributions for use in school accountability or system monitoring. For these reasons, performance banding is one of the most challenging and one of the most contentious areas of assessment activity, rendering the choice of banding strategy, i.e. ‘standard setting’ (Cizek and Bunch 2007; Hambleton and Pitoniak 2006), and the choice of banding labels (Burt and Stapleton 2010) critical elements in any high-stakes assessment system.
Banding outcomes for individuals can be letter grades (e.g. grade B in the English A-level), numeric grades (e.g. ‘4’ in the International Baccalaureate), or verbal labels of some sort (e.g. ‘bien’ in the French Baccalauréat, ‘proficient’ in US statewide assessments, ‘proficiency Level 5’ in PISA), determined on the basis of marks, or scaled scores, achieved on tests and examinations, or by human judgement of performance on tasks in the classroom, laboratory or workplace.
Banding strategies and techniques take many forms: the number of performance bands associated with different examinations and qualifications varies, and the labels attached to bands differ widely around the world. A primary influence on banding practice is the fundamental nature of the assessment in question. If the assessment application is norm-referenced, then a simple conversion from marks to performance bands will serve, the mark scale being used to divide the candidature into percentiles or some other preferred proportional grouping. Even in this context, banding becomes a little more complicated when marks are weighted and combined across a number of different assessment components, such as written tests and performance assessments in one particular subject (as in the British single-subject A-levels), across marks achieved in assessments in a number of different subjects (as in the French Baccalauréat), or across scaled scores achieved on mixed-subject tests (as in the international survey programmes).
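Purely by way of illustration (this sketch is not drawn from the paper, and all weights, band labels and marks in it are hypothetical), a norm-referenced conversion of this kind reduces to two steps: combine weighted component marks into a total, then cut the ranked candidature into equal proportional groups, each carrying a band label. A minimal Python sketch:

```python
# Illustrative sketch only: norm-referenced banding of examination marks.
# All weights, band labels and marks below are hypothetical.

def weighted_total(component_marks, weights):
    """Combine marks from several assessment components into one total."""
    return sum(m * w for m, w in zip(component_marks, weights))

def band_by_percentile(totals, labels):
    """Assign each candidate a band label by cutting the ranked
    candidature into len(labels) equal proportional groups."""
    n, k = len(totals), len(labels)
    # Rank order of each candidate's total mark (0 = lowest).
    order = sorted(range(n), key=lambda i: totals[i])
    bands = [None] * n
    for rank, i in enumerate(order):
        # Which of the k proportional groups this rank falls into;
        # ties are broken arbitrarily, as in any fixed-quota scheme.
        bands[i] = labels[min(rank * k // n, k - 1)]
    return bands

# Two components per candidate (e.g. a written test and a performance
# assessment), combined with hypothetical weights of 0.7 and 0.3.
candidates = [(62, 71), (48, 55), (80, 90), (55, 40), (70, 65)]
totals = [weighted_total(c, (0.7, 0.3)) for c in candidates]
print(band_by_percentile(totals, ["E", "D", "C", "B", "A"]))
# -> ['C', 'E', 'A', 'D', 'B']
```

Note that this is a fixed-quota (‘grading on the curve’) scheme: a candidate’s band depends only on position in the mark distribution, which is precisely the property that the criterion-referenced approaches discussed next seek to avoid.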
Criterion-referenced assessment is more challenging, since it requires that attainment marks, or observed achievements (including practical performances), be judged as indicating qualitative differences in merit in terms of the subject skills, knowledge and understanding being assessed. Grade descriptions and grade-related descriptors are often used here as alternatives to boundary marks. Hybrid systems, employing aspects of both approaches, also exist.
But what methods are used to identify grade boundary marks in a distribution of examination marks, whether for a single subject or across several subjects? How are appropriate proficiency bands identified on the basis of the scaled scores that result from multiple matrix sampling applications in large-scale attainment surveys? How valid are the resulting reifications of the grades, labels or levels? And how consistently can written ‘performance descriptors’ be applied by teachers and others when judging the merits of examination candidates’ performances? In other words, how much ‘absolute meaning’ about performance achievement can actually be carried in proficiency bands, whatever their nature?
The aim of the research to be described in the presentation was to explore the range of approaches to establishing ‘meaningful’ performance bands in high-stakes assessments, such as national/state assessments and national qualifications, noting the rationales for each approach as well as any anomalies that have arisen in practice, and evaluating each approach in terms of its ultimate value in providing meaningful outcomes for university selectors, employers and policy makers.
Method
Expected Outcomes
References
Acquah, D.K. (2013) An analysis of the GCE A* grade. The Curriculum Journal, 24(4), 529-552.
Burt, W.M. and Stapleton, L.M. (2010) Connotative meanings of student performance labels used in standard setting. Educational Measurement: Issues and Practice, 29(4), 28-38.
Cizek, G.J. and Bunch, M.B. (2007) Standard Setting. Thousand Oaks, CA: Sage Publications.
Eurydice (2009) National testing of pupils in Europe: Objectives, organisation and use of results. (http://www.eurydice.org)
Greaney, V. and Kellaghan, T. (2008) Assessing National Achievement Levels in Education. Volume 1. The World Bank.
Hambleton, R.K. and Pitoniak, M.J. (2006) Setting performance standards. In R.L. Brennan (ed.), Educational Measurement, pp. 433-470. Westport, CT: Praeger Publishers.
Lissitz, R.W. and Wei, H. (2008) Consistency of standard setting in an augmented state testing system. Educational Measurement: Issues and Practice, 27(2), 46-55.
Ramstedt, K. (2005) National assessment and grading in the Swedish school system. Oslo: Swedish National Agency for Education.