09 SES 10 C, Methodological Issues in Tests and Assessments
This research builds on investigations conducted in 2015 and is grounded in the intersecting theoretical frameworks of assessment validity (Messick, 1989) and Item Response Theory (IRT) as applied to the analysis and reporting of student achievement tests. The area of interest is the psychometric impact of guessing in large-scale assessments, particularly in cases where guessing is an acknowledged, and sometimes encouraged, student response strategy for multiple-choice tests, yet is not accounted for in the assessment analysis techniques.
Many of the major large-scale assessments cited below use Item Response Theory as the underlying theoretical and conceptual framework for estimating student achievement. For instance, PISA and NAPLAN use the Rasch model (Rasch, 1960), in which the probability of a student responding correctly to the cognitive demands of any particular test question is a function of the difficulty of the question and the ability of the student with respect to the characteristic (or trait) being assessed. In contrast, TIMSS and PIRLS apply variants of the Item Response Theory model that attempt to account for specific characteristics of the item-student interaction, namely the discrimination of the items that comprise the test and, in some cases, guessing.
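In standard notation (assumed here, not quoted from the cited sources), the Rasch model expresses this probability for person n and item i as:

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

where \(\theta_n\) is the person's ability and \(b_i\) the item's difficulty, both located on the same logit scale.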
In large-scale assessments that involve multiple-choice items analysed using Item Response Theory, guessing is either left unaccounted for (Rasch, 1960) or treated as a property of the item calibration model (Birnbaum, 1968; Hambleton et al., 1985, 1991). This paper is underpinned by a multi-faceted approach that considers the issue of guessing first in simulated data and then in case study data collected in the field. The purpose of the field study is to identify the extent to which the case study data replicate the theoretical outcomes generated from the simulated data. The case study data will also be used to identify patterns of responses, and characteristics of analyses, that may assist in detecting guessing in a student's response pattern, and hence inform the development of a mechanism to validly account for potential misinformation or statistical error in reporting student performances affected by guessing.
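The contrast between the two treatments of guessing can be sketched as follows. In the Birnbaum-style three-parameter logistic (3PL) model, a pseudo-guessing parameter c raises the floor of the Rasch probability; the function name and numerical values below are illustrative assumptions, not taken from the cited sources.

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """3PL probability of a correct response.

    theta: person ability; b: item difficulty;
    a: item discrimination; c: pseudo-guessing floor.
    With a = 1 and c = 0 this reduces to the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# For an item far above the person's ability, the Rasch
# probability approaches 0, while the 3PL probability floors
# at c (e.g. 0.25 for a four-option multiple-choice item).
print(irt_prob(-3.0, 3.0))           # near 0
print(irt_prob(-3.0, 3.0, c=0.25))   # near 0.25
```

The key point for this research is that the 3PL treats guessing as a fixed property of the item, not of the individual student-item encounter.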
Given the lack of clarity regarding how, and to what extent, guessing is accounted for in these various models, this research will further investigate the impact of guessing on the estimation of item difficulty and on the consequent estimation of student ability.
The research has three major data sources to investigate the problem:
- Simulated Guttman-like data in which guessed items are defined and specifically identified, so that the item and person parameters can be calibrated both with and without accounting for the defined guessing, through independent analyses of the raw and conditioned data sets;
- Simulated Rasch-like data in which guessed items can be 'identified' by comparing the relative item location to the person ability estimate (Andrich et al., 2011), after which item parameters and person ability estimates are re-estimated using a modified data set that accounts for the 'identified' guessing;
- Data collected by fieldwork in which students engage with two instruments. These instruments are designed to be appropriate to the curriculum content and grade level. The first is presented in a foreign language, and students are encouraged to guess from the contexts of the items. The second instrument comprises exactly the same items delivered in English, so that students can respond in a familiar environment. It is assumed that, because the second instrument is targeted to the sample and presented in a familiar context and language, the outcomes of this interaction will provide a "true estimate" of student ability in the domain of interest.
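The logic of the second simulated data source can be sketched as a minimal illustration. The function names, the 2-logit gap, and the 0.25 guessing rate below are assumptions for the sketch, not values taken from Andrich et al. (2011).

```python
import math
import random

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate(theta, difficulties, rng, guess_gap=2.0, guess_p=0.25):
    """Simulate dichotomous responses for one person.

    Items whose difficulty exceeds ability by more than guess_gap
    logits are answered by pure random guessing (success
    probability guess_p); all others follow the Rasch model.
    """
    responses = []
    for b in difficulties:
        p = guess_p if (b - theta > guess_gap) else rasch_p(theta, b)
        responses.append(1 if rng.random() < p else 0)
    return responses

def flag_guesses(theta_hat, difficulties, responses, gap=2.0):
    """Flag correct responses to items located well above the
    person's ability estimate as possible guesses."""
    return [i for i, (b, x) in enumerate(zip(difficulties, responses))
            if x == 1 and b - theta_hat > gap]

rng = random.Random(1)
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
responses = simulate(0.0, difficulties, rng)
print(flag_guesses(0.0, difficulties, responses))
```

Flagged responses could then be removed or rescored before item and person parameters are re-estimated, mirroring the 'identified guessing' step described above.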
References
1. Andrich, D., Marais, I., & Humphry, S. (2011). Using a theorem by Andersen and the dichotomous Rasch model to assess the presence of random guessing in multiple choice items. Journal of Educational and Behavioural Statistics, 37, 417.
2. Frary, A. B., Cross, L. H., & Lowry, S. R. (1977). Random guessing, correction for guessing and reliability of multiple-choice test scores. The Journal of Experimental Education, 46(1), 11–15.
3. Lau, P. N. K., Lau, S. H., Hong, K. S., & Usop, H. (2011). Guessing, partial knowledge, and misconceptions in multiple-choice tests. Educational Technology & Society, 14(4), 99–110.
4. Paek, I. (2015). An investigation of the impact of guessing on coefficient α and reliability. Applied Psychological Measurement, 39(4), 264–277.
5. Waller, M. I. (1974). Removing the effects of random guessing from latent trait ability estimates. Educational Testing Service, Princeton, NJ. ETS-RB-74-32.
6. Zand Scholten, A. (2011). The Guttman-Rasch paradox in item response theory. University of Amsterdam; downloaded from UvA-DARE: http://hdl.handle.net/11245/2.86877