Session Information
09 SES 09 A, Investigating Challenges in High-stakes Exams and Assessment Policies
Paper Session
Contribution
In educational testing, students sometimes have to choose among several items or topics and answer only the selected few. This practice is called examinee choice. It is assumed to improve student motivation and performance, since students are expected to choose the items that suit them best and therefore show their strongest performance. On the other hand, it leads to different students taking different combinations of items, which raises questions about the equivalence of test results.
This topic is not widely explored, mainly because choice is rarely practised in high-stakes testing. Bridgeman, Morgan and Wang (1996) succinctly point out that the choice of essay topic should be left to the examinee only when the objective of testing is the ability to organize facts and shape solid arguments about a topic the examinee is familiar with, and not when knowledge of the topic itself is of interest. Burton (1993) similarly points out that choice should be offered only when we want to measure the ability to choose. Fitzpatrick and Yen (1995), on the other hand, advocate the use of choice in testing to increase the authenticity of assessment. Gordon (1992) goes even further and claims that choice is essential for fairness in testing. Wiggins (1993) similarly promotes choice as a way to increase the motivation of students, since they can demonstrate their strengths. Examinee choice seems more democratic and appears to shift some of the control to the examinee. It also raises several problems, including the equivalence of the test between students, fairness, validity and reliability.
The Slovenian general Matura consists of five independent subject examinations (mother tongue, mathematics, first foreign language and two subjects that students choose from a broad list of available subjects). Biology, Physics, History of Art and, to some extent, Mathematics are examples of subject examinations in the general Matura that include choice. Since the Matura is a high-stakes examination whose results are used for admission to university, we should ensure that the exams are indeed fair. In the context of testing, fairness implies that the items available to choose from are equivalent.
The items under review are mostly constructed-response items marked by human raters and are usually worth multiple points. As items can differ in many characteristics (content, format, difficulty, discrimination), we start with the most basic psychometric characteristic: the difficulty of the item. The research question concerns the equivalence of the items among which students choose within a test, examined across different subjects. The null hypothesis is stated as "there are no statistically significant differences in the difficulties of the items students have to choose from".
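One way such a null hypothesis could be checked for a pair of alternative items is a Wald-type z-test on the difference between their estimated difficulties. The sketch below is only an illustration: the abstract does not state which test will be used, the function name and the numeric values are hypothetical, and the formula assumes the two estimates are independent, which is a simplification when both items are calibrated in the same model.

```python
import math

def wald_difficulty_test(b1, se1, b2, se2):
    """Two-sided Wald z-test for equal difficulty of two items.

    b1, b2   -- estimated item difficulties (logits)
    se1, se2 -- their standard errors
    Assumes the two estimates are independent (a simplification).
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical estimates for two items offered as alternatives.
z, p = wald_difficulty_test(b1=0.40, se1=0.08, b2=0.10, se2=0.09)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest that the two alternatives are not equally difficult; with more than two alternatives, some correction for multiple comparisons would be needed.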
Method
Data will include several subject examinations from the general Matura in Slovenia for the years 2018-2020 (Biology, Physics, History of Art, Mathematics). Using data from the National Examinations Centre, we will analyse results for whole cohorts in the selected subjects, which range in size from 400 to 1500 students. Data will be analysed using item response theory (IRT) procedures. As we are primarily interested in the difficulty parameter only, 1PL or Rasch models will be used (Bond & Fox, 2007). The statistical environment R and the package TAM will be used for most analyses. Recent research on parameter estimation for choice items (Wang et al., 2012; Liu & Wang, 2017) shows differences between estimation algorithms, identifying conditional maximum likelihood estimation (CMLE) as the gold standard and joint maximum likelihood estimation (JMLE) as the worst performing. We will use the marginal maximum likelihood (MML) algorithm, as it is readily available and Liu and Wang (2017) reported that it performed well in an empirical situation.
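To make the role of the difficulty parameter concrete, the following minimal sketch shows the dichotomous Rasch response probability and an examinee's log-likelihood in which items that were not chosen are simply skipped. It is an illustration under stated assumptions, not the planned analysis: the Matura items carry multiple points and would need a polytomous extension, the function names and values are hypothetical, and treating unchosen items as ignorable missing data is itself an assumption.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model.

    theta -- examinee ability (logits)
    b     -- item difficulty (logits)
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, difficulties, responses):
    """Log-likelihood of one examinee's responses.

    responses holds 1 (correct), 0 (incorrect) or None (item not chosen).
    Unchosen items are skipped, i.e. treated as ignorable missing data.
    """
    total = 0.0
    for b, x in zip(difficulties, responses):
        if x is None:
            continue  # item not chosen: contributes nothing
        p = rasch_prob(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Hypothetical example: three items, the second one was not chosen.
difficulties = [-0.5, 0.0, 0.8]
responses = [1, None, 0]
print(log_likelihood(0.2, difficulties, responses))
```

When ability equals difficulty, the probability of success is 0.5; a more difficult item (larger b) lowers that probability for every examinee.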
Expected Outcomes
Since the items are not pretested, they will most likely differ in difficulty. We will explore the size of the differences and draw conclusions depending on the findings. The authors will discuss the implications for practice. As Salecl (2011) and Schwartz (2009) point out, choice is not always helpful. They discuss the issue in the context of consumerism, but the same principle can be applied to testing situations. When tests offer choice, students must choose. We then measure knowledge and the ability to choose at the same time, through the same items. What happens if a student lacks the ability to choose well? The student chooses suboptimally, selecting items on which their score will not be the highest. Choice is welcome when it does not interfere with measurement objectives. In other instances, it is a hindrance that can raise serious questions of reliability and validity.
References
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed., p. 352). Lawrence Erlbaum.
Bridgeman, B., Morgan, R., & Wang, M. (1996). Choice among essay topics: Impact on performance and validity (ETS Research Report 96-4). Princeton, NJ: Educational Testing Service.
Fitzpatrick, A. R., & Yen, W. M. (1995). The psychometric characteristics of choice items. Journal of Educational Measurement, 32(3), 243–259.
Gordon, E. W. (1992). Implications of diversity in human characteristics for authentic assessment (CSE Technical Report 341). Los Angeles: National Center for Research on
Liu, C., & Wang, W. (2017). Parameter Estimation in Rasch Models for Examinee-Selected Items. Journal of Educational Measurement, 54(4), 518–549.
Salecl, R. (2011). Choice. London: Profile Books.
Schwartz, B. (2009). The Paradox of Choice: Why More Is Less. Pymble: Harper Collins.
Wang, W., Jin, K., Qiu, X., & Wang, L. (2012). Item Response Models for Examinee-Selected Items. Journal of Educational Measurement, 49(4), 419–445.