Session Information
09 SES 06 A, Relating Assessment Policies and Performance Interpretations to School and Student Variables
Paper Session
Contribution
How validly can meaning be attached to performance band indicators (grades, proficiency levels, and the like)?
------------------------------------------
The inherent function of performance banding (or grading or labelling) is to identify individuals or groups of individuals who have produced higher-meriting performances than others in some defined sense on some given assessment. Banding information might be used for one or more specific high-stakes purposes: in particular, for selection of individuals for further courses of study and, ultimately, for certification; and for aggregation to produce performance distributions for use in school accountability or system monitoring. For these reasons, performance banding is one of the most challenging and one of the most contentious areas of assessment activity, rendering the choice of banding strategy, i.e. ‘standard setting’ (Cizek and Bunch 2007; Hambleton and Pitoniak 2006), and the choice of banding labels (Burt and Stapleton 2010) critical elements in any high-stakes assessment system.
Banding outcomes for individuals can be letter grades (e.g. grade B in the English A-level), numeric grades (e.g. ‘4’ in the International Baccalaureate), or verbal labels of some sort (e.g. ‘bien’ in the French Baccalauréat, ‘proficient’ in US statewide assessments, ‘proficiency Level 5’ in PISA), determined on the basis of marks, or scaled scores, achieved on tests and examinations, or by human judgement of performance on tasks in the classroom, laboratory or workplace.
Banding strategies and techniques take many forms: the number of performance bands associated with different examinations and qualifications varies, and the labels attached to bands differ widely around the world. A primary influence on banding practice is the fundamental nature of the assessment in question. If the assessment application is norm-referenced, then a simple conversion from marks to performance bands will serve, the mark scale being used to divide the candidature into percentiles or some other preferred proportional grouping. Even in this context, banding becomes a little more complicated when marks are weighted and combined across a number of different assessment components, such as written tests and performance assessments in one particular subject (as in the British single-subject A-levels), across marks achieved in assessments in a number of different subjects (as in the French Baccalauréat), or across scaled scores achieved on mixed-subject tests (as in the international survey programmes).
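Purely by way of illustration (this sketch is not drawn from the paper, and all weights, band labels and marks in it are hypothetical), a norm-referenced conversion of this kind reduces to two steps: combine weighted component marks into a total, then cut the ranked candidature into equal proportional groups, each carrying a band label. A minimal Python sketch:

```python
# Illustrative sketch only: norm-referenced banding of examination marks.
# All weights, band labels and marks below are hypothetical.

def weighted_total(component_marks, weights):
    """Combine marks from several assessment components into one total."""
    return sum(m * w for m, w in zip(component_marks, weights))

def band_by_percentile(totals, labels):
    """Assign each candidate a band label by cutting the ranked
    candidature into len(labels) equal proportional groups."""
    n, k = len(totals), len(labels)
    # Rank order of each candidate's total mark (0 = lowest).
    order = sorted(range(n), key=lambda i: totals[i])
    bands = [None] * n
    for rank, i in enumerate(order):
        # Which of the k proportional groups this rank falls into;
        # ties are broken arbitrarily, as in any fixed-quota scheme.
        bands[i] = labels[min(rank * k // n, k - 1)]
    return bands

# Two components per candidate (e.g. a written test and a performance
# assessment), combined with hypothetical weights of 0.7 and 0.3.
candidates = [(62, 71), (48, 55), (80, 90), (55, 40), (70, 65)]
totals = [weighted_total(c, (0.7, 0.3)) for c in candidates]
print(band_by_percentile(totals, ["E", "D", "C", "B", "A"]))
# -> ['C', 'E', 'A', 'D', 'B']
```

Note that this is a fixed-quota (‘grading on the curve’) scheme: a candidate’s band depends only on position in the mark distribution, which is precisely the property that the criterion-referenced approaches discussed next seek to avoid.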
Criterion-referenced assessment is more challenging, since it requires that attainment marks, or observed achievements (including practical performances), be judged as indicating qualitative differences in merit in terms of the subject skills, knowledge and understanding being assessed. Grade descriptions and grade-related descriptors are often used here as alternatives to boundary marks. Hybrid systems, employing aspects of both approaches, also exist.
But what methods are used to identify grade boundary marks in a distribution of examination marks, whether for a single subject or across several subjects? How are appropriate proficiency bands identified on the basis of the scaled scores that result from multiple matrix sampling applications in large-scale attainment surveys? How valid are the resulting reifications of the grades, labels or levels? And how consistently can written ‘performance descriptors’ be applied by teachers and others when judging the merits of examination candidates’ performances? In other words, how much ‘absolute meaning’ about performance achievement can actually be carried in proficiency bands, whatever their nature?
The aim of the research to be described in the presentation was to explore the range of approaches to establishing ‘meaningful’ performance bands in high-stakes assessments, such as national/state assessments and national qualifications, noting the rationales for each approach as well as any anomalies that have arisen in practice, and evaluating each approach in terms of its ultimate value in providing meaningful outcomes for university selectors, employers and policy makers.
Method
Expected Outcomes
References
Acquah, D.K. (2013) An analysis of the GCE A* grade. The Curriculum Journal, 24(4), 529-552.
Burt, W.M. and Stapleton, L.M. (2010) Connotative meanings of student performance labels used in standard setting. Educational Measurement: Issues and Practice, 29(4), 28-38.
Cizek, G.J. and Bunch, M.B. (2007) Standard Setting. Thousand Oaks, CA: Sage Publications.
Eurydice (2009) National testing of pupils in Europe: Objectives, organisation and use of results. (http://www.eurydice.org)
Greaney, V. and Kellaghan, T. (2008) Assessing National Achievement Levels in Education. Volume 1. The World Bank.
Hambleton, R.K. and Pitoniak, M.J. (2006) Setting performance standards. In R.L. Brennan (ed.), Educational Measurement, pp. 433-470. Westport, CT: Praeger Publishers.
Lissitz, R.W. and Wei, H. (2008) Consistency of standard setting in an augmented state testing system. Educational Measurement: Issues and Practice, 27(2), 46-55.
Ramstedt, K. (2005) National assessment and grading in the Swedish school system. Oslo: Swedish National Agency for Education.