Session Information
09 SES 14 B, Psychometric Approaches to Fair and Valid Achievement Assessment
Paper Session
Contribution
Assessment is recognized as a fundamental principle that underpins the curriculum in numerous educational systems globally. It plays a critical role in teacher professionalism and the quality of teaching. Regardless of grade level or subject area, assessment skills are essential for educators, who rely on assessment data to make important instructional and evaluative decisions about their students. Numerous studies have indicated that effective assessment can enhance teaching and learning by promoting student engagement, motivation, and achievement (Brookhart, 2011; Kessels et al., 2024; Ozan & Kıncal, 2018; Popham, 2009). Therefore, a thorough exploration of the assessment knowledge and skills that teachers are expected to possess has become increasingly important.
The notion of assessment literacy (AL), coined by Stiggins (1991), serves as an umbrella term encompassing a broad range of educational assessment practices. It is essential for teachers to possess assessment literacy, as it enables them to use information about student learning for effective teaching and for addressing students’ learning needs (Pastore & Andrade, 2019). Thus, measuring and promoting teachers’ assessment literacy has been a significant focus over the last two decades (DeLuca et al., 2015). Although the construct of assessment literacy has been discussed extensively in the literature (DeLuca et al., 2015; Deneen & Brown, 2016; Herppich et al., 2017; Popham, 2009), an inclusive assessment framework for this construct is still lacking. The current study aims to address this gap by extending a comprehensive AL framework into an assessment framework for AL that can serve as a foundation for future research and for the development of innovative instruments that meet contemporary assessment requirements. A conceptual framework named teacher assessment literacy in practice (TALiP), developed by Xu and Brown (2016) through a thorough review and synthesis of 100 studies on teacher AL, forms the foundation of this study. In contrast to other frameworks, TALiP considers a broader range of factors that influence teachers’ assessment practices and recognizes AL as a situated, dynamic, and evolving system. It highlights the reciprocal relationships among its components, emphasizing how changes in one component can impact the others.
Secondly, the current study emphasizes the need for a new measurement tool that aligns with current advancements in educational assessment and improvements in teacher competencies. Most AL instruments have been developed primarily on the basis of the 1990 Standards for Teacher Competence in the Educational Assessment of Students (AFT et al., 1990). However, Brookhart (2011) acknowledged that while the 1990 Standards were beneficial for supporting teacher learning and valuable assessment practices, they do not fully encompass the range of assessment activities or the knowledge required of today’s teachers. Moreover, there have been numerous innovative developments in assessment, such as the increased use of e-portfolios, the encouragement of metacognitive skills, and the development of adaptive testing models. The current study aims to develop a measurement instrument covering this extended scope of AL. Furthermore, the instrument is a multistage test (MST), which allows for individualized assessment and more precise measurement (Yan et al., 2016). Thus, the following research questions were investigated in the current study:
● What are the scope and dimensions of assessment literacy that align with contemporary assessment practices?
● To what extent does the developed multistage test demonstrate reliability evidence for measuring assessment literacy?
Method
Participants
The participants in the assessment framework phase included 19 experts from various stakeholder groups, such as academicians and assessment specialists, who volunteered to contribute to the study. The test development phase involved preservice teachers (PSTs) from 11 programs, selected based on their completion of at least one assessment course.
Instrument
Firstly, a draft of the framework was created based on the knowledge base of the TALiP framework (Xu & Brown, 2016), which encompasses the core components of AL drawn from an extensive review of numerous studies. Experts’ opinions on the draft were gathered individually through semi-structured online interviews. The suggestions derived from these interviews were synthesized with literature findings, resulting in an assessment framework for AL. Based on the final version of the framework, an item pool of 105 multiple-choice items was created. All items underwent pilot testing, which included a debriefing session with 10 PSTs. A Multiple Matrix Sampling (MMS) design was employed to regulate item exposure, minimizing the burden on students and enhancing the efficiency of the assessment process (Shoemaker, 1973). In line with MMS, seven booklets were created, each containing 10 items in the non-rotating section and 10 items serving as rotating anchors. The booklets, formatted for paper-and-pencil testing (PPT), were administered to 1223 PSTs from 11 different universities in Turkey. Based on the item analysis of this field test, an MST with a 1-3-3 panel design was developed on the Concerto platform. The MST comprises three stages and seven modules. Each module includes one item from each of the eight dimensions, incorporating items that meet the requisite psychometric properties. The routing module was constructed with items of medium difficulty. When selecting items for the subsequent modules, care was taken to include items within each subdimension that covered a range of difficulty levels (easy, medium, and difficult) based on their respective b parameters. Module selection was guided by Maximum Fisher Information (MFI). The expected a posteriori (EAP) method was favored for ability estimation due to its advantages over alternative methods, including reduced computational demands (Embretson & Reise, 2000). The MST was administered to 106 PSTs.
Data Analysis
Item Response Theory (IRT) served as the foundation for the analysis of test results from both the PPT and MST formats, using R Studio. The one-parameter logistic (1PL) model was used for item analysis. Reliability was assessed via the standard error of measurement (SEM).
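To illustrate the routing and scoring logic described above, the following minimal R sketch (not the authors’ code; all item parameters, module compositions, and response patterns are hypothetical) shows how 1PL item information can be aggregated into module information, how MFI selects the next module, and how EAP estimation with a normal prior yields a provisional ability estimate and its standard error.

```r
# Hypothetical b parameters for three second-stage modules (8 items each,
# mirroring one item per dimension); these values are illustrative only.
modules <- list(
  easy   = c(-1.8, -1.4, -1.1, -0.9, -0.7, -1.2, -1.5, -1.0),
  medium = c(-0.4, -0.2,  0.0,  0.1,  0.3, -0.1,  0.2,  0.4),
  hard   = c( 0.8,  1.0,  1.2,  1.5,  1.8,  0.9,  1.3,  1.6)
)

# 1PL response probability and item information
p_1pl     <- function(theta, b) 1 / (1 + exp(-(theta - b)))
item_info <- function(theta, b) { p <- p_1pl(theta, b); p * (1 - p) }

# Module information is the sum of item information; MFI routing picks the
# module with the largest information at the provisional theta estimate
module_info <- function(theta, b_vec) sum(item_info(theta, b_vec))
route_mfi <- function(theta, modules) {
  info <- sapply(modules, function(b_vec) module_info(theta, b_vec))
  names(which.max(info))
}

# EAP estimation with a standard normal prior over a quadrature grid
eap <- function(responses, b_vec, grid = seq(-4, 4, length.out = 81)) {
  prior <- dnorm(grid)
  lik <- sapply(grid, function(th) {
    p <- p_1pl(th, b_vec)
    prod(p^responses * (1 - p)^(1 - responses))
  })
  post <- lik * prior
  post <- post / sum(post)
  theta_hat <- sum(grid * post)
  se <- sqrt(sum((grid - theta_hat)^2 * post))  # posterior SD as SE(theta)
  c(theta = theta_hat, se = se)
}

# Example: hypothetical responses to a medium-difficulty routing module
routing_b <- seq(-0.9, 0.9, length.out = 8)
responses <- c(1, 1, 0, 1, 0, 1, 1, 0)
est <- eap(responses, routing_b)
est                               # provisional theta and its SE
route_mfi(est[["theta"]], modules)  # second-stage module chosen by MFI
```

In a sketch of this kind, the person-level standard error returned by the EAP step is the quantity that can be summarized across examinees and stages, in the same spirit as the mean SEM values reported below.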
Expected Outcomes
For the first research question, revisions were made in light of the expert reviews. The dimensions of the framework were expanded to encompass competencies beyond knowledge alone, and the scope of the subdimensions was enlarged accordingly. Two new dimensions that are not included in the TALiP framework were proposed, namely “Competency in Technology-Based Assessment” and “Competency in Developing Assessment Tools.” Furthermore, the framework places a heightened emphasis on the appropriateness of assessment practices in relation to targeted cognitive levels and learning objectives, computerized adaptive tests, and the use of artificial intelligence. To clarify the scope of the dimensions, several illustrative examples were incorporated into specific subdimensions. For example, some experts indicated that the phrase “Knowing/applying assessment ethics” may not be sufficiently precise for all stakeholders; consequently, the statement “Avoiding sharing test results, providing fair test conditions for every student, and not using assessment as a means of punishment, etc.” was integrated into the framework.
For the second research question, the results of the PPT were analyzed first. The b parameters in the PPT ranged from -2.64 to 2.57, indicating a broad range of item difficulty on which to build the MST. For the MST, the standard error exhibited an inverse relationship with the information function, consistent with the principles of IRT. The mean standard error decreased as the test stages progressed (Mean SE1 = 0.43, Mean SE2 = 0.32, Mean SE3 = 0.27). This trend suggests that the reliability of the test results improved, allowing for a more precise measurement of ability levels. The average SEM was 0.27, and the reliability coefficient derived from the mean SEM was 0.93. This finding indicates that the scores obtained through the MST demonstrate a very high level of consistency.
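As a brief illustration, assuming ability estimates are on a standard (variance = 1) IRT metric, the reported coefficient can be reproduced from the mean SEM:

```r
# IRT-based reliability approximated from the mean SEM, assuming Var(theta) = 1
mean_sem    <- 0.27
reliability <- 1 - mean_sem^2   # 1 - 0.27^2 = 0.927, i.e. the reported ~0.93
reliability
```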
References
American Federation of Teachers, National Council on Measurement in Education, & National Education Association (AFT et al.). (1990). Standards for teacher competence in educational assessment of students. Washington: National Council on Measurement in Education.
Brookhart, S. M. (2011). Educational assessment knowledge and skills for teachers. Educational Measurement: Issues and Practice, 30(1), 3–12. https://doi.org/10.1111/j.1745-3992.2010.00195.x
DeLuca, C., LaPointe-McEwan, D., & Luhanga, U. (2015). Teacher assessment literacy: A review of international standards and measures. Educational Assessment, Evaluation and Accountability, 28(3), 251–272. https://doi.org/10.1007/s11092-015-9233-6
Deneen, C. C., & Brown, G. T. (2016). The impact of conceptions of assessment on assessment literacy in a teacher education program. Cogent Education, 3(1), 1225380. https://doi.org/10.1080/2331186x.2016.1225380
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.
Herppich, S., Praetorius, A., Förster, N., Glogger-Frey, I., Karst, K., Leutner, D., Behrmann, L., Böhmer, M., Ufer, S., Klug, J., Hetmanek, A., Ohle, A., Böhmer, I., Karing, C., Kaiser, J., & Südkamp, A. (2017). Teachers' assessment competence: Integrating knowledge-, process-, and product-oriented approaches into a competence-oriented conceptual model. Teaching and Teacher Education. https://doi.org/10.1016/j.tate.2017.12.001
Kessels, G., Xu, K., Dirkx, K., & Martens, R. (2024). Flexible assessments as a tool to improve student motivation: An explorative study on student motivation for flexible assessments. Frontiers in Education, 9. https://doi.org/10.3389/feduc.2024.1290977
Ozan, C., & Kıncal, R. Y. (2018). The effects of formative assessment on academic achievement, attitudes toward the lesson, and self-regulation skills. Educational Sciences: Theory & Practice. https://doi.org/10.12738/estp.2018.1.0216
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory Into Practice, 48(1), 4–11. https://doi.org/10.1080/00405840802577536
Schumacker, R. E., & Lomax, R. G. (2004). A beginner's guide to structural equation modeling. New Jersey: Lawrence Erlbaum Associates.
Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72, 534–539.
Xu, Y., & Brown, G. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162. https://doi.org/10.1016/j.tate.2016.05.010
Yan, D., Von Davier, A. A., & Lewis, C. (2016). Computerized multistage testing. Chapman and Hall/CRC. https://doi.org/10.1201/b16858