Session Information
12 SES 12 A, Literature, Reviews and AI
Paper Session
Contribution
Natural Language Processing (NLP) has gained increasing prominence in qualitative research due to its ability to process large volumes of text efficiently and systematically (Turobov et al., 2024; Zhang et al., 2023). The advent of advanced AI models, particularly Large Language Models (LLMs) such as ChatGPT, has demonstrated significant potential in enhancing qualitative analysis by improving efficiency in tasks such as text labeling and clustering (Jalali & Akhavan, 2024). These models possess the capability to synthesize vast amounts of textual data, identify emerging themes, and support qualitative researchers in broadening and deepening their analyses through automated processes (Kantor, 2024). Empirical studies have demonstrated that LLMs can partially replicate key aspects of inductive thematic analysis, effectively extracting primary themes from semi-structured interviews and other qualitative data sources (Paoli, 2023). However, existing research highlights several inherent challenges, including the potential loss of nuanced interpretations, variability in model performance across different analytical approaches—both deductive and inductive—and the broader implications for code diversity and consistency (Kasperiuniene & Mazeikiene, 2024; Siiman et al., 2023).
Optimizing the performance of LLMs in qualitative research thus necessitates meticulous prompt engineering, as well as the careful integration of domain-specific expertise, to ensure meaningful and contextually appropriate outputs (Zhang et al., 2023). For this reason, researchers are strongly advised to maintain a comprehensive and critical understanding of their data, rigorously cross-validating AI-generated results to guarantee both the reliability and validity of their investigation (Rostam et al., 2024). In light of these considerations, this study explores and critically evaluates the potential of integrating AI models, particularly ChatGPT, into qualitative research methodologies. The educational significance of this research is underscored by its focus on AI, a transformative technology within education, and by its methodological foundation in content analysis, a widely recognized and established approach in social science research (e.g., Krippendorff, 2018), including the field of education. Based on the outlined challenges and opportunities, the study formulates the following research questions:
- To what extent do inductive codes generated by researchers align with those produced by AI?
- How do similarity measures between human and AI coding vary in inductive content analysis?
- What specific content areas exhibit extreme (exceptionally high or low) similarity in AI-human coding comparisons?
Project no. 146998 has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under the OTKA-FK funding scheme.
Method
The overarching aim of this research is to examine the integration of AI in qualitative educational analysis, building on previous studies in the field (Turobov et al., 2024; Zhang et al., 2023). The study utilized national education policy documents from Central European countries as primary sources, with a particular focus on UNESCO's (2019) recommendations on AI in education. These recommendations encompass multidisciplinary approaches, policy integration, teacher roles, curriculum transformation, AI tool integration, adaptive learning, AI-powered assessment, AI's societal impact, and equal opportunity. These thematic areas served as deductive codes in the analysis.

Each national education policy document represented a sampling unit, while subchapters within the documents formed the coding units of analysis. A hybrid coding approach, combining deductive and inductive analysis (Authors, 2020; 2023), was employed to identify key patterns within the policy texts. The primary objective was to compare coding outcomes generated by human experts with those derived from AI-based coding tools, thereby assessing AI's efficacy in qualitative data analysis.

To establish a baseline for comparison, manual coding was conducted by independent field experts. Two weeks after the initial coding, two other experts reviewed the dataset to ensure the accuracy and reliability of the human coding. Coders analysed the entire text, aligning their findings with the deductive categories. Overlapping codes were permitted when multiple deductive units applied. To prevent redundancy, executive summaries and introductions were excluded.

Building upon recent advancements in NLP, particularly the development of LLMs such as ChatGPT, this study employed Natural Language Generation (NLG) systems such as GPT-4o. The AI was provided with pre-identified text segments enriched with contextual background information, research questions, and predefined deductive codes.
The model was instructed to propose subordinate inductive codes where applicable and to avoid duplicating codes across deductive units. AI-generated coding was subsequently reviewed by another coder, who identified correspondences between human- and AI-generated codes based on semantic similarity. Text comparison software leveraging semantic folding (Cortical.io, n.d.) was employed to support efficient natural language processing and semantic analysis.
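The pairing step can be illustrated with a minimal sketch. The study itself used a semantic-folding tool (Cortical.io), which is not reproduced here; as a hypothetical stand-in, the sketch below pairs each human code label with its most similar AI code label using simple string similarity from Python's standard library. The code labels are invented for illustration.

```python
# Illustrative stand-in for the semantic comparison step: pair each
# human-generated code with the most similar AI-generated code.
# (The study used a semantic-folding tool; difflib string similarity
# is used here only as a simple, self-contained proxy.)
from difflib import SequenceMatcher

def best_matches(human_codes, ai_codes):
    """For each human code, return (human, best AI match, ratio in [0, 1])."""
    pairs = []
    for h in human_codes:
        scored = [(SequenceMatcher(None, h.lower(), a.lower()).ratio(), a)
                  for a in ai_codes]
        score, best = max(scored)  # highest similarity wins
        pairs.append((h, best, round(score, 2)))
    return pairs

# Invented example labels, not taken from the study's coding scheme:
human = ["adaptive learning platforms", "equal access to AI tools"]
ai = ["AI-based adaptive learning", "equitable access to AI technologies"]
for h, a, s in best_matches(human, ai):
    print(f"{h!r} <-> {a!r} ({s})")
```

In practice, a similarity threshold would be needed to decide when a human and an AI code genuinely correspond rather than merely being each other's nearest neighbours.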
Expected Outcomes
Results indicate that the semantic similarity between human-generated and AI-generated inductive content analysis (aggregated across coding units) ranged from 47% to 55%, with individual values spanning 26% to 64%. The standard deviation ranged from 5.46 to 7.71, suggesting that neither the average similarity nor the variance substantially differed across the analysed documents. The semantic proximity between human and AI coding thus appears to be only marginally influenced by the individual performing the labelling or by the specific national AI policy under examination.

Additionally, we analysed the content areas corresponding to the lowest and highest semantic similarity scores, representing the greatest divergence and closest alignment between human and AI coding, respectively. With one exception, these cases were exclusively associated with the first deductive code, Planning AI in Education Policies. When the scope is expanded to include not only the single lowest and highest scores but also very low and very high similarity values, this deductive code remains influential, while other theory-driven categories gradually gain prominence. This suggests that although the first deductive code plays an important role in determining semantic similarity, variations in similarity scores are also shaped by additional theory-driven categories as divergence or alignment intensifies.

These findings suggest that human coders, following predefined instructions, joint preparation, and expert review, were able to create a coding scheme that aligns substantially with AI-generated patterns. Notably, human coders and AI each independently identified meaningful inductive content that did not appear in the other's results. This indicates that a combined human-AI approach could offer significant advantages in exploratory research, as each contributes unique strengths in uncovering latent content.
Furthermore, this hybrid approach may mitigate the limitations associated with inductive analysis, which is often more challenging to quantify compared to deductive content analysis (Sántha, 2012).
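The per-document aggregation behind the reported figures can be sketched as follows. The unit-level similarity scores below are invented for illustration (the study reported unit-level values of 26% to 64% and per-document standard deviations of 5.46 to 7.71).

```python
# Sketch of aggregating unit-level similarity scores per document.
# The scores below are invented for illustration only.
from statistics import mean, stdev

unit_scores = [26, 47, 51, 55, 60, 64]  # percent similarity per coding unit

doc_mean = mean(unit_scores)
doc_sd = stdev(unit_scores)  # sample standard deviation
print(f"mean similarity: {doc_mean:.1f}%  sd: {doc_sd:.2f}")
```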
References
De Paoli, S. (2023). Can large language models emulate an inductive thematic analysis of semi-structured interviews? An exploration and provocation on the limits of the approach and the model. arXiv. https://doi.org/10.48550/ARXIV.2305.13014
Jalali, M. S., & Akhavan, A. (2024). Integrating AI language models in qualitative research: Replicating interview data analysis with ChatGPT. SSRN. https://doi.org/10.2139/ssrn.4714998
Kantor, J. (2024). Best practices for implementing ChatGPT, large language models, and artificial intelligence in qualitative and survey-based research. JAAD International, 14, 22–23. https://doi.org/10.1016/j.jdin.2023.10.001
Kasperiuniene, J., & Mazeikiene, N. (2024). Artificial intelligence in qualitative research: Methodological turn. Proceedings of the 8th World Conference on Qualitative Research.
Krippendorff, K. (2018). Content analysis: An introduction to its methodology. Sage Publications.
Rostam, Z. R. K., Szénási, S., & Kertész, G. (2024). Achieving peak performance for large language models: A systematic review. arXiv. https://arxiv.org/abs/2409.04833v1
Sántha, K. (2012). Numerikus problémák a kvalitatív megbízhatósági mutatók meghatározásánál [Numerical problems in determining qualitative reliability indicators]. Iskolakultúra, 22(3), 64–73. https://www.iskolakultura.hu/index.php/iskolakultura/article/view/21248/21038
Siiman, L. A., Rannastu-Avalos, M., Pöysä-Tarhonen, J., Häkkinen, P., & Pedaste, M. (2023). Opportunities and challenges for AI-assisted qualitative data analysis: An example from collaborative problem-solving discourse data. In Y.-M. Huang & T. Rocha (Eds.), Innovative technologies and learning. ICITL 2023. Lecture Notes in Computer Science, vol. 14099. Springer, Cham. https://doi.org/10.1007/978-3-031-40113-8_9
Turobov, A., Coyle, D., & Harding, V. (2024). Using ChatGPT for thematic analysis. arXiv preprint arXiv:2405.08828.
UNESCO. (2019). Beijing Consensus on Artificial Intelligence and Education. Outcome document of the International Conference on Artificial Intelligence and Education 'Planning education in the AI era: Lead the leap.' UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000368303
Webber, F. D. (2015). Semantic folding theory and its application in semantic fingerprinting. arXiv. abs/1511.08855
Zhang, H., Wu, C., Xie, J., Lyu, Y., Cai, J., & Carroll, J. M. (2023). Redefining qualitative analysis in the AI era: Utilizing ChatGPT for efficient thematic analysis. arXiv preprint arXiv:2309.10771.