The ‘Big Data’ era has dramatically increased the availability of documents written and stored digitally. Although these materials are a rich object of analysis, they remain underused in education research. In this paper, I will show that text mining, a set of computerized methods for extracting and quantifying information from large textual databases, offers promising prospects for understanding inequalities of access to higher education. More specifically, I will highlight how this methodology can complement results from more traditional research designs.
Using a corpus of 16,000 college admission essays written by French applicants to pre-medical studies at a Parisian university in 2020, I will argue that text mining makes it possible to reveal differentiated socializations and to help explain unequal educational and professional outcomes by gender, social class, academic achievement and school environment. Despite the growing popularity of “holistic” admissions processes in higher education, I will thus argue that qualitative application materials such as personal statements are not exempt from the biases produced by social inequalities.
First, I will underline the strategic differences in how students introduce themselves: their writing style, the motivations and qualities they put forward as making a good medical student, and the narratives they build about their personality and past experiences. I will explain that these are “traces” of unequal access to information and guidance about higher education, and of variations in socializations and representations. Second, I will analyze disparities in how students project themselves into the future: how precisely they define their career plans and which health specializations they expect to pursue. In particular, I will highlight that tastes and preferences for academic and career fields are already largely determined before enrollment in postsecondary education.
Since this research is part of a larger mixed-methods project on pre-medical studies, I will conclude by discussing the contributions, complementarities and limits of text mining relative to the more traditional materials I used (surveys, ethnographic observations, interviews) to study students’ higher education choices. In particular, I will discuss the importance of interdisciplinarity in educational research and how ‘Big Data’ technologies might open new research pathways in the future.