Session Information
Contribution
When tests are used for making decisions, the quality of measurement becomes very important. Whenever testing is not a one-off event but takes place regularly, as with public examinations, improvements are always welcome and often necessary to ensure the highest quality of measurement before decisions are made. In an effort to increase the validity and content coverage of tests, items are becoming increasingly complex as they try to simulate real-life problems. If tests are composed of complex items that require scoring by human raters, errors increase because of the raters' subjectivity. The magnitude of error due to the rating process is usually estimated through the correlation between two independent ratings of the same tests, the so-called objectivity index. The objectivity index is widely used and easy to understand, but it is of limited use when we are planning how to improve a test or wish to know where to start with improvements. The author suggests another statistic, the standard error of rating process (SERP), which stems from generalizability theory and represents the square root of the error variance due to the subjectivity of raters. This statistic provides additional information and should be used alongside the objectivity index. Since generalizability theory often involves substantial computational effort, the author also suggests an alternative computational approach for the special case of two raters per subject, which is faster and easier to compute and should therefore facilitate the use of this statistic. As an indicator of rating process quality, the standard error of rating process can help us when deciding between priorities for improvement.
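The abstract does not give the author's formula, so the following is only a minimal sketch of one standard way to estimate a rating-error standard deviation when each script has exactly two independent ratings. It assumes the two raters are interchangeable, so the pooled within-script variance, sum of squared differences divided by 2n, is taken as the error variance of a single rating. The function names and the sample scores are hypothetical and serve only as illustration.

```python
import math


def standard_error_of_rating(first, second):
    """Square root of the pooled within-script variance of two ratings.

    Assumption: raters are interchangeable, so for each script the
    variance of its two ratings is (a - b)**2 / 2; averaging over all
    scripts gives sum(d**2) / (2 * n).
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two equally long, non-empty lists of ratings")
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq_diff / (2 * n))


def objectivity_index(first, second):
    """Pearson correlation between the two sets of ratings."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / math.sqrt(var_1 * var_2)


# Hypothetical scores: the second rater is always two points more lenient.
rater_1 = [10, 12, 15, 18, 20]
rater_2 = [12, 14, 17, 20, 22]

print(objectivity_index(rater_1, rater_2))
print(standard_error_of_rating(rater_1, rater_2))
```

In this made-up example the correlation is perfect, yet the rating error is about 1.41 score points, because a constant difference in severity between raters does not affect the correlation. This is one way of seeing why a statistic expressed in score units can add information to the objectivity index.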
Should we focus our effort on the rating process and improve the marking scheme, scoring instructions, or raters' training, or should we spend our resources on improving things like the representativeness of test content, the standardization of the test-taking procedure, or the familiarization of candidates with the test procedure and content? SERP can be compared with the standard error of measurement (SEM), which usually provides information on the overall error associated with test results. If we can demonstrate that errors due to the rating process are independent of other errors of measurement, the ratio (error variance due to rating process) / (total error variance) can be calculated. If time permits, the use and interpretation of the standard error of rating process will be illustrated with examples from the Slovene Matura exams, the public examinations at the end of secondary education. Their function is twofold: they serve both to complete secondary education and as entrance exams, and can therefore be regarded as 'high-stakes' tests.
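The ratio mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: it assumes rating errors are independent of the other error sources, and that SERP and SEM are expressed on the same score scale and refer to the same reported score. The function name and the numeric values are hypothetical.

```python
def rating_share_of_error(serp, sem):
    """Share of total error variance attributable to the rating process.

    Assumption: rating error is independent of other measurement error,
    so SERP**2 is one additive component of SEM**2.
    """
    if sem <= 0:
        raise ValueError("SEM must be positive")
    if serp < 0 or serp > sem:
        raise ValueError("SERP must lie between 0 and SEM")
    return serp ** 2 / sem ** 2


# Hypothetical values in score points, not taken from any real exam.
share = rating_share_of_error(serp=1.5, sem=3.0)
print(share)  # 0.25
```

With these made-up values, a quarter of the error variance would come from rating, which would suggest that most of the room for improvement lies outside the rating process.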