Description: The use of pupil results from testing is widespread. With such high stakes placed on pupil examinations, it is desirable to have an objective standard for marking against which all pupils are assessed (see Moss 1994). However, in disciplines such as English, where pupils are generally required to write longer answers or essays, markers must make subjective judgements about how well a question has been answered. Under such conditions, maintaining consistency between markers may be difficult: different markers may prefer different styles of writing or attach greater weight to different elements of a pupil's answer. The aim of this paper is to examine the extent to which such variability occurs.
A new methodology, based on recent developments in statistical modelling, is proposed to answer this question.
Methodology: Reading and writing examination papers from 49 students were each graded by 9 separate markers. The reading examination consisted of 35 questions in both closed- and open-response styles. For the writing test, each student had to supply two pieces of writing on specified subjects; these were assessed on 6 criteria, such as composition, punctuation and text organisation.
A particular aim of the research was to assess the extent to which variability in examination scores depended upon three separate influences, described below.
" Pupil ability is the most obvious source of variation. More able pupils will tend to get higher marks than less able pupils. If all markers were to consistently agree on what mark to give then this would account for 100% of the variability in pupil scores.
" Marker leniency is defined as the extent to which a marker consistently awards higher marks for a question than other markers.
" Marker variance is defined as the extent to which markers may be attracted to different elements of pupils responses. In our conceptualisation this is defined as being separate from leniency in that this does not imply that certain markers will consistently award higher or lower marks. It is recognition of the fact that for one pupil a particular marker may advocate a higher number of marks than other markers but may advocate a lower number of marks than other markers for the next pupil.
Using recently developed statistical techniques in the field of multilevel modelling (see Hill and Goldstein 1998), it is possible to disentangle these three sources of variation and calculate the percentage of variation between scores that is accounted for by each. Cross-classified multilevel models were used to identify the extent to which the different elements of each test, as well as total test scores, were affected by each of the three influences.
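As a hedged illustration of this kind of analysis (not the paper's actual code or data), the sketch below fits a cross-classified model in Python with statsmodels on simulated data shaped like the study: 49 pupils by 9 markers. Variable names and variance magnitudes are invented for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data in the shape of the study: 49 pupils x 9 markers.
rng = np.random.default_rng(0)
n_pupils, n_markers = 49, 9
ability = rng.normal(0.0, 2.0, n_pupils)    # pupil ability (invented SD)
leniency = rng.normal(0.0, 0.5, n_markers)  # marker leniency (invented SD)
rows = [{"pupil": i, "marker": j,
         "score": 20 + ability[i] + leniency[j] + rng.normal(0.0, 1.0)}
        for i in range(n_pupils) for j in range(n_markers)]
df = pd.DataFrame(rows)

# Crossed random effects in statsmodels: treat the whole dataset as a
# single group and declare pupil and marker as variance components.
df["one"] = 1
model = smf.mixedlm(
    "score ~ 1", df, groups="one", re_formula="0",
    vc_formula={"pupil": "0 + C(pupil)", "marker": "0 + C(marker)"},
)
fit = model.fit()

# Variance partition: share of total variance attributable to each source.
vc = dict(zip(fit.model.exog_vc.names, fit.vcomp))
total = vc["pupil"] + vc["marker"] + fit.scale
print(f"pupil ability:              {100 * vc['pupil'] / total:5.1f}%")
print(f"marker leniency:            {100 * vc['marker'] / total:5.1f}%")
print(f"marker variance (residual): {100 * fit.scale / total:5.1f}%")
```

Because pupils and markers are fully crossed rather than nested, the whole dataset is modelled as a single group; the residual variance then plays the role of the pupil-by-marker interaction, i.e. marker variance as defined above.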
Conclusions: Whilst levels of bias between markers tended to be quite low, the analysis clearly shows the importance of exploring marker consistency in test design. Even closed-response questions, which would be expected to show complete consistency, can display higher-than-expected levels of variation between markers.
In the case of writing, the analysis shows the difficulty of achieving objective measurement of writing ability. Assessments of handwriting displayed particularly large amounts of marker variance.