Session Information
09 ONLINE 29 B, Trials of New Assessment Methods in Post-secondary Education
Paper Session
Contribution
The need for high-stakes examinations to be proctored remotely grew rapidly from the onset of the COVID-19 pandemic, which brought with it the closure of traditional testing sites. This disruption led to many credentialling examinations, scheduled to take place in testing centres throughout 2020, being moved online, a shift from a standardised to a non-standardised administration of tests. Live remote proctoring is associated with a range of benefits, including increased accessibility for international test candidates and those located in rural areas. Remote proctoring enables candidates to take a test, using their own device (with some limitations), from a location of their own choosing (most often at home). However, there remains a lack of empirical evidence to support the comparability of outcomes from examinations that are proctored in different ways. Previous research in this area has been characterised as “overly simplistic”, focusing on comparing pass rates only (ATP, 2021, p. 19). Testing organisations are also aware that limited empirical evidence is available to support decision making on several important issues, and this is especially apparent in high-stakes credentialling and professional licensure contexts.
The existing literature on remote proctoring can be split into three related categories. The first is research examining the use of remote proctoring as a safeguard against online cheating. The second concerns candidate usability and the user experience of different remote proctoring programmes. The third, and the one that most closely matches the focus of the present study, is the limited body of research that examines the psychometric equivalence of proctoring modes.
When considering this literature, it is evident that empirical research in all three areas largely focuses on comparing completely un-proctored to proctored examinations. There is very little research that specifically compares different proctoring methods, for example, in-person versus live remote proctoring (Weiner and Hurtz, 2017). Furthermore, research in this area often draws on data from higher education examinations. This focus makes it difficult to generalise the results and shows that there is a need for further empirical research concentrating on high-stakes professional licensure and credentialling examinations. While research comparing across proctoring methods remains scarce, particularly in the field of credentialling, two studies are noteworthy. The first examines graduate student performance in an online economics course (Wuthisatian, 2020), while the second utilises data from three US licensing examinations (Weiner and Hurtz, 2017). However, these two studies report contradictory evidence. Wuthisatian (2020) found that students taking an exam under in-person proctoring scored statistically significantly higher than those taking the exam via remote online proctoring. By contrast, Weiner and Hurtz (2017) found that candidate scores and test psychometric properties were equivalent across in-person and live remote proctoring modes. In light of these contradictory results, and given the high-stakes nature of licensure and credentialling testing, there is a need to expand research in this area and provide further empirical evidence.
In response to the lack of research in this area, this paper examines not only the equivalence of candidate outcomes but also the comparability of test psychometric properties from credentialling examinations proctored using live remote proctoring (LRP) technology or in traditional testing centres (test centre proctoring, or TCP).
The objective of this research was to examine if candidate outcomes and test psychometric properties are equivalent across high-stakes professional licensure examinations taken under TCP and via LRP. Specifically, this research endeavoured to answer the following two research questions:
RQ1: Are outcomes for candidates equivalent across TCP and LRP modes?
RQ2: Are the psychometric properties of tests equivalent across TCP and LRP modes?
Method
This research drew on data from 11 different professional licensing examinations in the field of insurance, administered between May and December 2020 across four US states. Each examination comprised multiple-choice single-response items, and test length ranged from 90 to 150 items. Data were available for n = 14,097 test candidates, with large sample sizes also available for each individual examination. Sample sizes across the two modes of administration were equally satisfactory, ranging from 484 to 1,070 TCP candidates and from 213 to 831 LRP candidates.
The examinations were administered via computer either at an authorised testing centre, supervised by on-site proctors, or remotely using live remote proctoring software and supervised by a live proctor. In addition to the physical presence of proctors in testing centres, candidates were aware that their test sessions were also being audio and video recorded. Candidates taking an examination via live remote proctoring were permitted to complete the test on either a laptop or a desktop device; tablets and cell phones were not allowed. Prior to the LRP examination, each candidate was required to verify that their webcam, internet access and bandwidth capabilities met or exceeded the minimum specifications required for remote administration. On the day of testing, candidates had to verify their identity and use their webcam to show the proctor a 360-degree view of the room in which they were taking the test. During the examination, candidates were aware that they could not leave the view of the proctor and that all browsers were locked down. All LRP sessions were also audio and video recorded.
To answer Research Question 1, candidate test scores (average percent correct) were compared across proctoring modes using mean comparisons and independent samples t-tests, with associated Hedges' g effect sizes also calculated. In addition, the proportion of candidates passing was examined across testing modes, with associated Cramér's phi effect sizes calculated. For Research Question 2, five psychometric properties of key interest to accrediting agencies were examined: reliability (KR-20), decision consistency (Subkoviak's coefficient of agreement), test item difficulty (P+), test item discrimination (point biserial) and time taken to complete (in minutes).
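To make the RQ1 comparisons concrete, the sketch below shows how an independent samples t-test, Hedges' g and Cramér's phi might be computed in Python with NumPy and SciPy. This is a minimal illustration, not the study's actual analysis code; the scores and pass/fail counts are randomly generated placeholders.

```python
# A minimal sketch of the RQ1 comparisons, assuming two arrays of
# percent-correct scores and pass/fail counts per proctoring mode.
# All data below are illustrative, not the study's data.
import numpy as np
from scipy import stats

def hedges_g(a, b):
    """Standardised mean difference (Cohen's d) with Hedges'
    small-sample correction factor J."""
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(a, ddof=1) +
                         (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(a) - np.mean(b)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)   # small-sample correction
    return d * j

# Hypothetical percent-correct scores for each proctoring mode.
rng = np.random.default_rng(0)
tcp_scores = rng.normal(82.96, 8, 500)
lrp_scores = rng.normal(83.21, 8, 400)

t, p = stats.ttest_ind(tcp_scores, lrp_scores)   # independent samples t-test
print(f"t = {t:.2f}, p = {p:.3f}, g = {hedges_g(tcp_scores, lrp_scores):.2f}")

# Pass/fail proportions compared via a 2x2 table; for a 2x2 table,
# Cramér's phi reduces to sqrt(chi2 / n).
table = np.array([[265, 235],    # TCP: pass, fail (illustrative counts)
                  [224, 176]])   # LRP: pass, fail
chi2, p_chi, dof, _ = stats.chi2_contingency(table, correction=False)
phi = np.sqrt(chi2 / table.sum())
print(f"chi2 = {chi2:.2f}, p = {p_chi:.3f}, phi = {phi:.2f}")
```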
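Similarly, the following sketch illustrates two of the RQ2 indices, KR-20 reliability and point-biserial item discrimination, computed from a simulated 0/1 item-response matrix. The matrix, the corrected item-total formulation and the parameter choices are illustrative assumptions only.

```python
# A minimal sketch of two RQ2 indices, computed from a 0/1
# item-response matrix (rows = candidates, columns = items).
# The matrix here is randomly generated for illustration only.
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 reliability for dichotomous items."""
    k = responses.shape[1]                         # number of items
    p = responses.mean(axis=0)                     # item difficulty (P+)
    q = 1 - p
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

def point_biserial(responses):
    """Item discrimination: correlation of each item with the total
    score excluding that item (corrected item-total correlation)."""
    totals = responses.sum(axis=1)
    return np.array([np.corrcoef(responses[:, i],
                                 totals - responses[:, i])[0, 1]
                     for i in range(responses.shape[1])])

# 500 hypothetical candidates answering 90 items, with the probability
# of a correct response set near 0.69 (the average P+ reported below).
rng = np.random.default_rng(1)
ability = rng.normal(0, 1, (500, 1))
items = (ability + rng.normal(0, 1, (500, 90)) > -0.7).astype(int)

print(f"KR-20 = {kr20(items):.2f}")
print(f"mean P+ = {items.mean():.2f}, "
      f"mean discrimination = {point_biserial(items).mean():.2f}")
```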
Expected Outcomes
This research provides strong evidence of comparability for high-stakes professional licensing examinations administered across TCP and LRP modes. In addressing RQ1, the results provide no evidence that candidate outcomes differ across the two modes of administration. Considering the 11 examinations in combination, average scores were only slightly lower for TCP candidates (82.96) than for LRP candidates (83.21), and the average effect size was small (Hedges' g = 0.19). The average proportion of TCP candidates passing (0.53) also did not differ greatly from that of LRP candidates (0.56); again, the average effect size was small (phi = 0.08). In addressing RQ2, the psychometric properties of TCP and LRP examinations were found to be comparable. Average reliability was almost identical across modes, at 0.90 in TCP administrations and 0.91 in LRP, while average decision consistency was 0.84 in TCP sessions and 0.85 in LRP. Average item difficulty was identical at 0.69. The average discrimination index was lower in TCP administrations (0.39) than in LRP (0.45); however, both values are considered satisfactory. Finally, this study found that TCP candidates took more time on average (almost 5 minutes) to complete the examination than LRP candidates, although the mean effect size was once again small (Hedges' g = 0.25). While some statistically significant differences were found for individual examinations, the differences were small and no pattern was observed in favour of either testing mode. Taken together, the results of this research provide empirical evidence of similarities between the candidate outcomes and test psychometric properties of examinations that are proctored in test centres and via live remote proctoring. These results will be of interest to testing organisations and providers who are concerned that proctoring method may lead to differences in test outcomes.
References
Alessio, H. M., Malay, N., Maurer, K., Bailer, A. J. and Rubin, B. (2017). Examining the Effect of Proctoring on Online Test Scores. Online Learning, 21(1).
ATP (Association of Test Publishers). (2021). Test centres are dead, long live test centres. Webinar audio transcript retrieved from: https://cdn.atphub.org/wp-content/uploads/2021/05/25144003/Audio-Transcript-1.pdf
Karim, M. N., Kaminsky, S. E. and Behrend, T. S. (2014). Cheating, Reactions, and Performance in Remotely Proctored Testing: An Exploratory Experimental Study. Journal of Business and Psychology, 29(4), 555–572.
Reisenwitz, T. H. (2020). Examining the Necessity of Proctoring Online Exams. Journal of Higher Education Theory and Practice, 20(1), 118–125.
Weiner, J. A. and Hurtz, G. M. (2017). A Comparative Study of Online Remote Proctored versus Onsite Proctored High-Stakes Exams. Journal of Applied Testing Technology, 18(1), 13–20.
Wuthisatian, R. (2020). Student exam performance in different proctored environments: Evidence from an online economics course. International Review of Economics Education, 35, 100196. https://doi.org/10.1016/j.iree.2020.100196