Explanatory Item Response Modelling Of An Abstract Reasoning Assessment: A Case For Modern Test Design.

Author(s):

Fredrik Helland(presenting / submitting)Johan Braeken

Conference:

ECER 2016

Network:

09. Assessment, Evaluation, Testing and Measurement

Format:

Paper

Session Information

09 SES 10 C, Methodological Issues in Tests and Assessments

Paper Session

Time:

2016-08-25

15:30-17:00

Room:

NM-F107

Chair:

Jan-Eric Gustafsson

Contribution

General description

Reasoning tests are popular components of the assessment toolbox for selection and admission into higher education and job employment (Leighton, 2004; Stanovich, Sá, & West, 2004). Abstract reasoning tests tap into a core reasoning ability (Carpenter, Just, & Shell, 1990) with tasks that require the examinee to generate and apply rules (Wüstenberg, Greiff, & Funke, 2012), but require neither explicit prior contents-specific knowledge of the examinee nor specific language skills (Raven, 2000). Traditionally, test construction and assembly have been the product of creative item writing processes and post-hoc psychometric evaluations, without explicit consideration of cognitive theory (Hunt, Frost, & Lunneborg, 1973). Yet, abstract reasoning provides a case that in principle is ideally suitable for modern test design (e.g. Embretson, 1998; Mislevy, Almond, & Lukas, 2003), combining cognitive theory with a more systematic approach to construction and assembly of test items.

Objective and Research Questions. This study is part of a larger project aimed at reverse engineering an existing abstract reasoning test from a modern test design perspective to setup a virtual item bank that does not store individual items, but instead uses automatic item generation rules based on cognitive complexity (see e.g., Gierl & Haladyna, 2013). The objective of the current study represents one step towards such a virtual item bank with research questions focusing on (i) identifying the cognitive relevant item features (i.e. “radicals”) that impact the behaviour of the test and of the participants and (ii) identifying the merely “cosmetic” irrelevant item features (i.e., incidentals).

The test. The abstract reasoning test is composed of testlets consisting of items related to the same problem situation from which a set of rules need to be derived that are necessary to solve the individual items. Each testlet is structured around a problem set consisting of a varying number of rows each consisting of a specified input stimulus configuration, an activated set of action buttons and a resulting output stimulus configuration. This problem set allows the examinee to derive the transformations that will happen to the input when a specific action button is activated. This rule knowledge is necessary to solve the connected items. An item consists of a single row with a specified input stimulus configuration, the activated set of action buttons for that item, and four alternative output stimulus configuration possibilities of which the examinee has to decide on the correct one.

Theoretical framework. A rational task analysis of the abstract reasoning test proposes an artificial intelligent algorithm (see Newell & Simon, 1972) that consists of 4 core steps. (1) Inventorisation: all the characteristics of input stimulus configurations and output stimulus configurations of the problem set are registered; (2) Matching: an input/output dissimilarity matrix is computed; (3) Rule finding: computationally this would be similar to solving a system of equations or a more greedy version using elimination; (4) Rule application. The test has some characteristics built in by design that can be directly connected to the artificial intelligent algorithm and the related (i) cognitive load of the stimulus material and (ii) cognitive complexity of the rules that need to be derived. Examples of the former characteristics can be as simple as the number of symbols in the input stimulus configuration, examples of the latter characteristics can be whether or not the transformation caused by a specific action button can be derived on its own (i.e., independent of the other action buttons in the problem set). Some theoretically irrelevant item features can also be defined such as the type of symbols used in a stimulus configuration (e.g., triangle or circle).

Method

In a first stage to supplement our theoretical framework, a cognitive laboratory procedure is run based on the existing abstract reasoning test to further investigate and define important cognitive mechanisms as well as potentially relevant/irrelevant structural item properties. The cognitive lab (n = 6 participants) consists of a think-aloud study, a retrospective interview (see Ericsson & Simon, 1993) and a usability questionnaire (the NASA task load index, see Hart & Staveland, 1988).Both quantitative duration data (i.e. time spent on task) and qualitative data (i.e. externalisations made by the participants during the think-aloud and interview) are collected. In a second stage we will be modelling the item response data of n = 6689 Belgian nationals that took the abstraction reasoning test consisting of 10 testlets of 4 items each. The baseline model is a descriptive item response model with crossed-random effects for person and items with an additional local dependence structure to account for the testlet design of the test. The relations between the hypothesized incidental / radical item properties and test behaviour will then be further investigated by means of explanatory item response modelling (De Boeck & Wilson, 2004). Hence, item parameters of interest (e.g., item difficulty, lower asymptote, LID parameter) will be regressed on item level predictors. The models will be estimated using the STAN probabilistic language for Bayesian statistical inference (Stan Development Team, 2015) in the open statistical software R (R Core Team, 2014).

Expected Outcomes

Cognitive lab. The cognitive lab procedure is expected to yield relevant information on the test taking behaviour and experiences of the examinees. This will supplement the proposed theoretical framework and expand or modify a priori defined incidental and radical item features that on their turn will inform the subsequent statistical modelling. Explanatory item response modelling. The explanatory item response-modelling is expected to assess the quality and practical relevance of the theoretical framework by validating the proposed incidentals and radicals. To which extent are the theoretically motivated relevant item features acting as “radicals” (i.e., impacting test behaviour), and are the theoretically irrelevant item features indeed merely “incidentals”? Relevance. This study will lead to further studies where the explanatory item response model will be cross-validated on new test data to finalize the proof-of-concept for new abstract reasoning tests. A proof-of-concept would require that changes to irrelevant “cosmetic” features (incidentals) of the test do not impact the psychometric behaviour of the test nor participants’ test performance, whereas manipulating cognitive relevant item features (radicals) does change both the psychometric test behaviour and participants’ performance on the test, and this in the direction and extent predicted by the derived test framework. Given the massive costs of creating and maintaining item banks for sustained use, both in small and large scale enterprises, a modern test design framework can be considered a major push forward for a broad application of this abstract reasoning test.

References

Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: a theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97(3), 404–431. http://doi.org/10.1037/0033-295X.97.3.404 De Boeck, P., & Wilson, M. (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer. Embretson, S. E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3(3), 380–396. http://doi.org/10.1037/1082-989X.3.3.380 Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: verbal reports as data (Revised). Cambridge: The MIT press. Gierl, M. J., & Haladyna, T. M. (2013). Automatic item generation: an introduction. In M. J. Gierl & T. M. Haladyna (Eds.), Automatic item generation: theory and practice. New York: Routledge. Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock & N. Meshkati (Eds.), Human Mental Workload. Amsterdam: North Holland Press. Hunt, E., Frost, N., & Lunneborg, C. (1973). Individual Differences in Cognition: A New Approach to Intelligence. Psychology of Learning and Motivation - Advances in Research and Theory, 7(C), 87–122. http://doi.org/10.1016/S0079-7421(08)60066-3 Leighton, J. P. (2004). The Assessment of Logical Reasoning. In J. P. Leighton & R. J. Sternberg (Eds.), The Nature of Reasoning (pp. 291–312). Cambridge: Cambridge University Press. Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A Brief Introduction to Evidence-centered Design, (July). Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, N.J: Prentice-Hall. R Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.r-project.org/ Raven, J. (2000). Psychometrics, cognitive ability, and occupational performance. Review of Psychology, 7(1-2), 51–74. Retrieved from http://mjesec.ffzg.hr/revija.psi/vol 07 no 1-2 2000/Raven_2000-7-1-2.pdf Stan Development Team. (2015). Stan Modeling Language Users Guide and Reference Manual, Version 2.8.0. Retrieved from http://mc-stan.org/ Stanovich, K. E., Sá, W. C., & West, R. F. (2004). Individual Differences in Reasoning. In J. P. Leighton & R. J. Sternberg (Eds.), The Nature of Reasoning (pp. 375–409). Cambridge: Cambridge University Press. Wüstenberg, S., Greiff, S., & Funke, J. (2012). Complex problem solving - More than reasoning? Intelligence, 40, 1–14. http://doi.org/10.1016/j.intell.2011.11.003

Author Information

Fredrik Helland (presenting / submitting)

University of Oslo

CEMO: Centre for Educational Measurement

Oslo

Johan Braeken

CEMO: Centre for Educational Measurement, Faculty of Educational Sciences, University of Oslo, Norway

Update Modus of this Database

The current conference programme can be browsed in the conference management system (conftool) and, closer to the conference, in the conference app.
This database will be updated with the conference data after ECER.

Search the ECER Programme

Search for keywords and phrases in "Text Search"
Restrict in which part of the abstracts to search in "Where to search"
Search for authors and in the respective field.
For planning your conference attendance, please use the conference app, which will be issued some weeks before the conference and the conference agenda provided in conftool.
If you are a session chair, best look up your chairing duties in the conference system (Conftool) or the app.