Session Information
09 SES 02 A, Assessing Students’ 21st Century Skills
Paper Session
Contribution
The need for digital literacy (DL) has never been as acute as during the COVID-19 pandemic. At the peak of the crisis, 1.6 billion students around the world, about 85% of the world's school population, studied at home or did not study at all. Home schooling and remote teaching as a consequence of COVID-19 have become a global phenomenon in recent years. As a result, students, teachers, and parents have had to use digital tools, and the acquisition of DL skills has become a priority.
To answer the call, we developed our own DL instrument that can be used for both formative and summative assessment.
Digital literacy (Mislevy, 2018) is a difficult construct to measure, as it includes not only the ability to use digital technologies, but also security, ethics, and the ability to evaluate large amounts of information. In addition, digital literacy is associated with critical thinking, communication, and collaboration, which imposes additional difficulties for its measurement. Therefore, digital literacy cannot be measured by traditional computer literacy tests, which primarily measure technical computer skills.
We define DL as the ability to safely use digital technologies for searching, analyzing, creating, and managing information, and for communication and collaboration, in order to solve problems in a digital environment and meet personal, educational, and professional needs. DL is a complex latent construct. Our framework contains five key sub-constructs: 1) technical literacy, 2) information literacy, 3) computational literacy, 4) digital communication, and 5) digital security, as well as 15 sub-elements; a formal sketch of this structure follows the definitions below.
- Technical literacy (TL) is defined as a set of general knowledge and skills related to digital devices/applications/services/tools, regardless of the platform or interface, that one has to apply to solve a specific task. It includes skills related to hardware, software, and networks.
- Information literacy (IL) comprises the basic competencies of information processing (search, analysis, creation, and management) necessary for working with information and solving problems in a digital environment.
- Computational literacy (CL) is the ability to understand, reformulate, and generate information in order to develop, implement, and optimize algorithms for solving a problem.
- Digital communication (DC) is defined as the skills needed to interact with and transmit information in a digital environment in compliance with the norms and rules of network etiquette.
- Digital security (DS) is defined as a set of skills for safe work in a digital environment.
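To make the intended latent structure explicit, a minimal measurement-model sketch is given below. The notation is illustrative and not taken from the assessment framework itself: it assumes each observed indicator loads only on the sub-construct it was written for, with the five sub-construct factors allowed to correlate and ordered indicators linked to continuous latent responses through thresholds.

```latex
% Illustrative notation (assumed, not from the source abstract).
% y*_{ij}: latent response underlying ordered indicator j of sub-construct i
\begin{align}
  y^{*}_{ij} &= \lambda_{ij}\,\eta_{i} + \varepsilon_{ij},
    \qquad \eta_{i} \in \{\mathrm{TL},\,\mathrm{IL},\,\mathrm{CL},\,\mathrm{DC},\,\mathrm{DS}\},\\
  \operatorname{Cov}(\eta_{i},\eta_{k}) &= \phi_{ik},
    \qquad \operatorname{Cov}(\varepsilon_{ij},\varepsilon_{kl}) = 0 \ \text{in the initial model},\\
  y_{ij} &= c \quad \text{iff} \quad \tau_{ij,c} < y^{*}_{ij} \le \tau_{ij,c+1}.
\end{align}
```

Here the thresholds tau cut each latent response into the observed ordered categories, which is what motivates the choice of an estimator for ordered data in the Method section.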
We propose to assess DL using an instrument consisting of scenario-based performance tasks that simulate features of students’ real-life situations, following the Evidence-Centered Design (ECD) methodology (Mislevy, Almond, & Lukas, 2003; Mislevy, Behrens, DiCerbo, & Levy, 2012). The task model describes continuous actions unfolding over time, just as they do in real life (de Klerk et al., 2016; Razzouk, 2011), and provides a framework for constructing an assessment environment that requires students to demonstrate critical thinking skills. The assessment is considered low-stakes and is intended to inform students, teachers, and policy-makers about digital literacy development.
In 2021 we completed task development, design work, and programming of all interactive elements for a total of 16 tasks. The pilot version of the assessment included 4 of these tasks with over 70 hidden performance indicators. The objective of this study was to model the complex construct of DL and to assess the psychometric properties of this new instrument.
Method
We expected the factor structure to be highly complex, because the assessment consists of scenario-based tasks. Each scenario simulates a different set of digital tools (such as a text editor, browser, or chat window), and all indicators within a task are connected by a single theme (for example, buying tickets online). Anything that makes performance on a given indicator similar, but is not related to the main psychometric factor for that indicator, will create noise in the model. This noise may be treated as local dependency between indicators or as additional latent constructs that the instrument measures, which is essentially the same for our purposes. Additionally, DL is conceptualized as a complex construct, itself consisting of 5 sub-constructs.

For modeling we applied confirmatory factor analysis (CFA), because it lends itself well to improving a model when there is uncertainty about the true factor structure of the data. We chose the mean- and variance-adjusted weighted least squares (WLSMV) method for parameter estimation, since it is appropriate for ordered response data (Muthén & Asparouhov, 2002). We used the following critical values to assess model fit: RMSEA ≤ 0.06; SRMR ≤ 0.08; CFI > 0.95; TLI > 0.95 (Yu & Muthén, 2002).

Based on the instrument’s theoretical framework, we developed the initial structural model. This model included all 5 sub-constructs as correlated latent factors, and since we did not know in advance which task features would be significant enough to elicit local dependency, all indicators were related exclusively to the sub-constructs for which they were developed. After assessing the initial model, we computed modification indices to find significant correlations between indicator residuals. We worked together with subject matter experts to make sure that all model modifications were theoretically interpretable and did not contradict the assessment framework. In cases where indicators were correlated in groups of 3 or more, we created orthogonal latent factors to model the construct-irrelevant skills, attitudes, or knowledge that created the local dependency. The sample for this pilot study consisted of 627 6th-grade students from several city schools.
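As an illustration of this workflow, the sketch below shows how a correlated five-factor CFA of this kind might be specified and its global fit inspected in Python with the semopy package. This is a hedged sketch, not the study's actual code: the indicator names and data file are hypothetical, the abstract does not name the software used, and semopy's default estimator only approximates the WLSMV estimation reported here.

```python
# Illustrative sketch of a correlated five-factor CFA resembling the initial DL model.
# Indicator names (tl1 ... ds3) and the data file are hypothetical; the study used
# WLSMV estimation, which this default semopy fit does not reproduce exactly.
import pandas as pd
import semopy

# Each indicator loads only on the sub-construct it was written for; latent
# sub-constructs are free to correlate. Theoretically justified residual
# covariances (e.g. "tl1 ~~ il2") could later be added after inspecting
# modification indices, as described above.
MODEL_DESC = """
TL =~ tl1 + tl2 + tl3
IL =~ il1 + il2 + il3
CL =~ cl1 + cl2 + cl3
DC =~ dc1 + dc2 + dc3
DS =~ ds1 + ds2 + ds3
"""

data = pd.read_csv("dl_pilot_indicators.csv")  # hypothetical file of scored indicators

model = semopy.Model(MODEL_DESC)
model.fit(data)

# Global fit statistics (chi2, CFI, TLI, RMSEA, ...); compare against the cutoffs
# used in the study: RMSEA <= 0.06, SRMR <= 0.08, CFI > 0.95, TLI > 0.95.
print(semopy.calc_stats(model).T)
```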
Expected Outcomes
The initial CFA model had moderate correlations between the 5 sub-construct factors, but poor fit to the data (RMSEA=0.023; SRMR=0.097; CFI=0.789; TLI=0.781). Guided by modification indices, we iteratively improved the model until acceptable fit was reached. Involving subject matter experts at this stage allowed us to identify and interpret instances of local dependency, as we expected, but also of functional dependency (one of each pair of such indicators had to be removed). We also identified a problem in one of our tasks, leading to the removal of a full task and of one sub-construct (CL).

As we modeled more instances of local dependency, correlations between the sub-constructs of DL started approaching 1. It became clear that the moderate correlations between sub-constructs in the initial model were due to the many construct-irrelevant dimensions introduced by the specific instruments and themes present in each task. In reality, the sub-constructs were much more closely related than we expected. In the final model we combined the 3 closest sub-constructs of DL (IL, TL, and DC) into one factor of general DL. DS remained as a separate sub-construct (correlation of 0.77 with the general factor), and CL was fully removed, leaving us with a 2-factor model for DL. The general DL factor contained 43 indicators, while the DS factor contained 12. The fit of the final model was acceptable (RMSEA=0.011; SRMR=0.08; CFI=0.965; TLI=0.963). Unfortunately, only the general DL factor had high enough reliability to be reported as a valid measure of skill. These results demonstrate that, like correlations, contrasts can also be spurious, as we discovered in the case of the DL sub-constructs. However, the work we did on identifying structures of local dependency within tasks led to task improvements and a deeper understanding of how DL manifests in performance, and we did arrive at one reliable total score for DL.
References
de Klerk, S., Eggen, T. J., & Veldkamp, B. P. (2016). A methodology for applying students' interactive task performance scores from a multimedia-based performance assessment in a Bayesian Network. Computers in Human Behavior, 60, 264-279.
Mislevy, R. J. (2018). Sociocognitive foundations of educational measurement. New York: Routledge.
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Research Report Series, 2003(1), i-29.
Mislevy, R. J., Behrens, J. T., DiCerbo, K. E., & Levy, R. (2012). Design and discovery in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining, 4(1), 11-48.
Muthén, B., & Asparouhov, T. (2002). Latent variable analysis with categorical outcomes: Multiple-group and growth modeling in Mplus. Mplus Web Notes, 4(5), 1-22.
Razzouk, R. (2011). Using Evidence-Centered Design for developing valid assessment of 21st century skills. Advancing education for 21st century success. Bellevue, WA: Edvation.com.
Yu, C.-Y., & Muthén, B. (2002). Evaluation of model fit indices for latent variable models with categorical and continuous outcomes. Annual meeting of the American Educational Research Association, New Orleans, LA.