In educational testing, students sometimes have to choose among several items or topics and answer only the selected few. This practice is called examinee choice. It is thought to improve student motivation and performance, since students are expected to choose the items that suit them best and therefore show their strongest performance. On the other hand, it leads to different students taking different combinations of items, which raises questions about the equivalence of test results.
This topic is not widely explored, mainly because choice is rarely practiced in high-stakes testing. Bridgeman, Morgan and Wang (1996) point out that the choice of essay topic should be left to the examinee only when the objective of testing is the ability to organize facts, shape solid arguments, etc. about a topic the examinee is familiar with, and not when the actual knowledge of the topic is of interest. Burton (1993) similarly points out that choice should be offered only when we want to measure the ability to choose. Fitzpatrick and Yen (1995), on the other hand, advocate the use of choice in testing to increase the authenticity of assessment. Gordon (1992) goes even further and claims that choice is essential for fairness in testing. Wiggins (1993) similarly promotes choice as a way to increase student motivation, since students can demonstrate their strengths. Examinee choice seems more democratic and appears to shift some control to the examinee. It also raises several problems: the equivalence of the test across students, fairness, validity and reliability.
The Slovenian general Matura consists of five independent subject examinations (mother tongue, mathematics, first foreign language and two subjects that students choose from a broad list of available subjects). Biology, Physics, History of Art and, to some extent, Mathematics are examples of subject examinations in the Slovenian general Matura that include choice. Since the Matura is a high-stakes examination whose results are used for admission to university, we should ensure that the exams are indeed fair. In the context of testing, fairness implies that the items available to choose from are equivalent.
The items under review are mostly constructed-response items marked by human raters, and they are usually worth multiple points. As items can differ in many characteristics (content, format, difficulty, discrimination), we start with the most basic psychometric characteristic, the difficulty of the item. The research question concerns the equivalence of the items among which students choose in tests in different subjects. The null hypothesis is stated as "there are no statistically significant differences in the difficulties of the items students have to choose from".
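To make the notion of difficulty for a multi-point item concrete, the following is a minimal sketch of the classical difficulty index, computed as the mean score divided by the maximum attainable points. The function name `difficulty_index` and all scores are hypothetical and serve only as illustration; this is not necessarily the procedure used in the analysis, and it does not test the null hypothesis.

```python
from statistics import mean


def difficulty_index(scores, max_points):
    """Classical difficulty of a multi-point item: mean score / maximum points.

    Values near 1 indicate an easy item, values near 0 a hard one.
    """
    if not scores:
        raise ValueError("no scores available for this item")
    return mean(scores) / max_points


# Hypothetical (made-up) scores of students who chose item A or item B,
# each item worth at most 5 points.
item_a_scores = [4, 5, 3, 5, 4]
item_b_scores = [2, 3, 4, 1, 2, 3]

p_a = difficulty_index(item_a_scores, 5)
p_b = difficulty_index(item_b_scores, 5)
print(f"item A: {p_a:.2f}, item B: {p_b:.2f}")
```

Because each item is answered only by the students who chose it, a difference between such indices may reflect differences between the groups of students as well as differences between the items.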