Session Information
Session 5A, Formative assessment, peer assessment, external assessment
Papers
Time:
2003-09-18
17:00-00:00
Room:
Chair:
Sarah Howie
Contribution
Setting standards, especially in national tests, is a process that demands the greatest attention and, ideally, a range of methods, both statistical and judgemental, to ensure that the best evidence is available to support the decisions made. One of the judgemental approaches taken in the past has been based on that introduced by Angoff, but this method has some perceived shortcomings. A new approach, termed the 'Bookmark', has begun to be used in preference to the Angoff approach, mainly in the USA. The main difference between the two approaches is that in the Angoff method judges have to assign a probability of a borderline pupil getting each item right, whereas in the Bookmark method items are presented to judges in ascending order of difficulty and they have only to identify the most difficult item that a borderline pupil would pass with a given probability. The Bookmark approach has the advantage that the judges have to make fewer, more tightly focused decisions than those required by Angoff. However, the success and reliability of the method depend on the judges' accuracy in agreeing the criteria for the standard to be set, and on their ability to judge test items against these criteria. The Bookmark method was originally developed by CTB/McGraw-Hill and is now used in many states of the USA.
The usual stages in the procedure are:
- using item response theory, the difficulty of each item on the test is estimated;
- a booklet is prepared, displaying each item from the test on a different page, beginning with the easiest and ending with the hardest;
- judges place a bookmark after the last item which they think students at the required level should be able to do;
- the estimate of difficulty for the item below the bookmark is taken to represent the ability of a student at the target level;
- a statistical procedure is employed to convert each judge's bookmark location into a cut-score recommendation;
- final cut-score recommendations are based on the median of panel members' cut scores.
In the USA, Bookmark judges are generally asked to decide on what the standard for mastery should be, which means that descriptions of the performance criteria for mastery are developed after the cut score has been set. In the UK, a more common approach is that features of performance at different levels are defined by the curriculum. The Bookmark approach therefore needs to be adapted to ensure that judges have a common understanding of the performance profile of pupils at the borderline of levels. Judges also need to be able to apply these criteria consistently. In addition, the statistical model used for analysis needs to be able to accommodate the varied types of open and closed response items often used in UK tests. This paper will describe the results of a trial using this approach for standard setting in an English reading test for 11-year-old pupils. The theoretical background and the decisions made on the statistical approach used to identify item difficulty and set cut scores will be briefly described. The conclusions reached about the type of training needed for the judges and the procedures to be used for a Bookmark session will be outlined, and suggestions will be made about how the method can be adapted for use in other contexts.
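As an illustration only, the usual stages of the Bookmark procedure listed in this abstract can be sketched in a few lines of Python. The sketch rests on assumptions that are not stated in the abstract: a Rasch (one-parameter) model with dichotomous items, a response probability of 0.67 for the "given probability", the reading that the item immediately before the bookmark is the one whose difficulty is used, and conversion to a raw cut score through the expected test score. All function names and the example data are hypothetical, and the sketch does not cover the open-response items that the paper says a UK model must accommodate.

```python
import math
from statistics import median


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def bookmark_theta(ordered_difficulties, bookmark, rp=0.67):
    """Ability implied by one judge's bookmark.

    `bookmark` is the number of items placed before the bookmark, so the
    item used is the last one the judge thinks a borderline pupil can do.
    The result is the ability at which that item is passed with
    probability `rp` (0.67 is an assumed convention, not from the source).
    """
    if not 1 <= bookmark <= len(ordered_difficulties):
        raise ValueError("bookmark must lie between 1 and the number of items")
    if not 0.0 < rp < 1.0:
        raise ValueError("rp must be strictly between 0 and 1")
    item_difficulty = ordered_difficulties[bookmark - 1]
    return item_difficulty + math.log(rp / (1.0 - rp))


def expected_raw_score(theta, difficulties):
    """Expected number of items correct at ability `theta`."""
    return sum(rasch_probability(theta, b) for b in difficulties)


def panel_cut_score(difficulties, bookmarks, rp=0.67):
    """Return (median cut score, per-judge cut scores) for a panel."""
    ordered = sorted(difficulties)  # booklet order: easiest item first
    judge_scores = [
        expected_raw_score(bookmark_theta(ordered, k, rp), ordered)
        for k in bookmarks
    ]
    return median(judge_scores), judge_scores


if __name__ == "__main__":
    # Hypothetical item difficulties (logits) and bookmarks from three judges.
    item_difficulties = [-1.2, -0.4, 0.1, 0.8, 1.5]
    judge_bookmarks = [2, 3, 3]
    cut, per_judge = panel_cut_score(item_difficulties, judge_bookmarks)
    print(f"per-judge cut scores: {[round(s, 2) for s in per_judge]}")
    print(f"panel cut score (median): {cut:.2f}")
```

Taking the median rather than the mean follows the final stage described above and keeps a single outlying judge from shifting the recommended cut score.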