04 SES 06 C, Education Evaluation, Absenteeism And The Right To Inclusive Education & SDGs
Randomised control trials (RCTs) have often been presented as the ‘gold standard’ of educational evaluation (Goldacre 2013), providing what is often seen as the most robust evidence available to educational practitioners and policy makers (see for example the UK Education Endowment Foundation toolkit). Although evaluation research has a role in educational research, a side effect of this ‘what works’ culture is that experimental research often does not explicitly discuss that RCT findings are particular to specific contexts (e.g. area, school and children’s characteristics) and mediational factors (such as teachers’ and pupils’ self-efficacy, relationships or motivation). These factors interact with each other in such complex ways that programmes found to be successful in some contexts might prove less so in others. This issue has been discussed more openly (Vaughn et al. 2016) and becomes even more apparent when there is no difference between the treatment and control groups (null findings), and researchers have to explore the possible reasons behind their inconclusive findings. This paper echoes some of the arguments and concerns expressed in the literature about the complexity associated with educational evaluation (indicatively Goodman et al. 2018; Hammersley 2015; Thomas 2016), based on the findings of the recent trial of the Integrated Group Reading (IGR) programme, an inclusive targeted early reading intervention.
The IGR programme is a tier 2 intervention targeting Year 2 and 3 pupils (6-8 years of age) who are delayed in reading. It is taught by class teachers to small groups of struggling readers within the small-group organisation of lessons that already exists in English primary schools. It is part of a classroom-wide model in which all pupils are in groups that receive teacher attention over the course of a week, supported by a teaching assistant. The programme was designed on the assumption that this arrangement would not disadvantage the other children in the classroom, despite the investment of teacher time in a few struggling readers, since it was built on the existing organisation.
The IGR programme uses a range of 52 specially written reading books with simple illustrations and accompanying story-specific games, developed with the narrative requirements of later-learning readers in mind and deliberately short so that one story can be completed in each lesson. IGR was trialled by the Graduate School of Education of the University of Exeter with Year 2 and 3 pupils in 34 English schools in five varied local authority areas across two years (2015-2017), with funding from the Nuffield Foundation.
The project involved an RCT with a process evaluation and found that children in schools using IGR made the same degree of progress in reading accuracy, comprehension and attitudes as similarly struggling children in control schools. This was consistent across both phases of implementation. However, there was great variation in the way teachers implemented the programme and in the recorded gains (Norwich et al. 2018). This latter finding, together with the recognition that the IGR programme was a complex intervention implemented in real classrooms, drew attention to the context in which IGR was implemented. The crucial question was whether the null RCT findings meant that IGR was not an effective programme.
Drawing on the findings of the IGR trial, the process evaluation and the teacher case studies, the paper discusses current perceptions and approaches in educational evaluation; and uses this example of an RCT to illuminate questions about the design and associated assumptions of complex teaching and evaluation approaches.
The IGR programme was trialled by the Graduate School of Education of the University of Exeter with Year 2 and 3 pupils in 34 English schools in five varied local authority areas across two years (2015-2017). The project was funded by the Nuffield Foundation. The programme and evaluation teams were both based in Exeter but operated separately. IGR was run for 28 weeks (i.e. 7 months) in both phase 1 and phase 2. The study explored the immediate and long-term effects (the latter defined as 6 months after the programme implementation) of the IGR programme on reading accuracy and comprehension, reading attitude and overall attitude to school. In addition, the context of the programme implementation was explored, as well as how reading was taught in the control schools. The programme evaluation had a mixed methodological design, involving a clustered randomised control trial (clusters at the school level), with the comparison group in control schools on a waiting list to use the intervention (in phase 2), and a process evaluation of implementation and of teachers’ and pupils’ IGR experiences. The process evaluation involved in-depth school-level case studies and a two-weekly log to monitor the fidelity of implementation. For the process evaluation, 14 schools (8 in phase 1 and 6 in phase 2), covering a mixed range of rural and sub/urban settings and each acting as a different case, were visited across the four local authority areas. In each school, one or more teacher-led IGR sessions were observed and one or more teachers were interviewed. As part of the process evaluation, we also conducted a number of teacher case studies. We expected that high fidelity of IGR teaching would be associated with greater reading gains for the IGR groups; however, this was not confirmed in all cases. The selection of cases was based on a combination of fidelity and mean IGR group reading gain scores.
Using the fidelity and reading scores, teachers were selected to represent different combinations, so that we could explore the reasons behind match cases (high programme fidelity/high reading scores; low fidelity/low scores) and mismatch cases (high fidelity/low scores; low fidelity/high scores).
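The match/mismatch selection logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the median split used to decide what counts as ‘high’ or ‘low’, the `TeacherCase` structure, the score scales and the example values are all hypothetical and are not taken from the IGR evaluation, which does not specify its cut-offs here.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TeacherCase:
    teacher_id: str   # hypothetical identifier
    fidelity: float   # hypothetical fidelity score derived from the implementation log
    mean_gain: float  # hypothetical mean reading gain of the teacher's IGR group


def classify_cases(cases):
    """Label each teacher case as a match or mismatch.

    Assumption: 'high' means at or above the sample median; the
    original study may have used different thresholds.
    """
    fidelity_cut = median(c.fidelity for c in cases)
    gain_cut = median(c.mean_gain for c in cases)
    labels = {}
    for c in cases:
        high_fidelity = c.fidelity >= fidelity_cut
        high_gain = c.mean_gain >= gain_cut
        # A match is high/high or low/low; anything else is a mismatch.
        kind = "match" if high_fidelity == high_gain else "mismatch"
        labels[c.teacher_id] = (
            kind,
            "high fidelity" if high_fidelity else "low fidelity",
            "high gain" if high_gain else "low gain",
        )
    return labels


# Invented example data, one teacher per quadrant.
cases = [
    TeacherCase("T1", 0.9, 8.0),
    TeacherCase("T2", 0.8, 2.0),
    TeacherCase("T3", 0.4, 7.5),
    TeacherCase("T4", 0.3, 1.0),
]
labels = classify_cases(cases)
for teacher_id, label in labels.items():
    print(teacher_id, label)
```

A median split is only one possible choice; with a small number of cases, the cut-offs would more plausibly be set by judgement so that each of the four combinations is represented.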
The analysis suggests that IGR is not a simple intervention that can be applied irrespective of its teaching context. Its introduction as a programme involved a complex web of interactions, resulting in what has been called a complex intervention (Moore et al. 2015), and a variety of local factors can be seen to have affected programme implementation and outcomes. The paper concludes that, with regard to programme evaluation, it might be too simple to seek to answer whether a programme works or not while being silent about the circumstances. As Pawson and Tilley (2004) note, it is better to ask the question: ‘for whom, in what circumstances, in what respects, and how?’ (p. 2). This might not provide the certainty that some researchers aspire to or that policy makers would prefer, but it does capture the complexity associated with programme evaluation. It can also give an insight into the factors that make a programme more or less successful, and give directions for revisions and further development. This acknowledgement about the nature of programme evaluation can further be related to the place of research in the development of teaching approaches. A distinction can be made between a traditional research and development (R&D) model, in which knowledge is established (a summative type of knowledge) and then applied to practice (EEF 2016), and a development and research (D&R) model, in which an innovative teaching approach is developed and then evaluated formatively (Bentley and Gillinson 2007). There is no reason why policy makers could not come to favour both D&R and R&D approaches to improving teaching and learning, once researchers come to see the RCT as one choice, and not always the first, amongst experimental approaches.
Bentley, T. & Gillinson, S. (2007). A D&R system for education. London: Innovation Unit.
Education Endowment Foundation (EEF) (2016). The EEF at 5. London: EEF. https://educationendowmentfoundation.org.uk/public/files/Publications/5th_Anniversary_Brochure_Final.pdf
Goldacre, B. (2013). Building evidence into education. London: Department for Education.
Goodman, L. A., Epstein, D. & Sullivan, M. (2018). Beyond the RCT: Integrating rigor and relevance to evaluate the outcomes of domestic violence programs. American Journal of Evaluation, 39 (1), 58–70.
Hammersley, M. (2015). Against ‘gold standards’ in research: On the problem of assessment criteria. Paper presented at Was heißt hier eigentlich ‘Evidenz’? Frühjahrstagung 2015 des AK Methoden in der Evaluation, Gesellschaft für Evaluation (DeGEval), Fakultät für Sozialwissenschaften, Hochschule für Technik und Wirtschaft des Saarlandes, Saarbrücken, Germany, May. Available online at: http://www.degeval.de/fileadmin/users/Arbeitskreise/AK_Methoden/Hammersley_Saarbrucken.pdf
Moore, G. F., Audrey, S., Barker, M., Bond, L., Bonell, C., Hardeman, W. et al. (2015). Process evaluation of complex interventions: Medical Research Council guidance. BMJ, 350, h1258.
Norwich, B., Koutsouris, G. & Bessudnov, A. (2018). An innovative classroom reading intervention for Year 2 and 3 pupils who are struggling to learn to read: Evaluating the Integrated Group Reading (IGR) programme – Project Report. Available online at: http://www.integratedgroupreading.co.uk/evaluation-project/
Pawson, R. & Tilley, N. (2004). Realist evaluation. Available online at: http://www.communitymatters.com.au/RE_chapter.pdf
Thomas, G. (2016). After the gold rush: Questioning the ‘gold standard’ and reappraising the status of experiment and randomized controlled trials in education. Harvard Educational Review, 86 (3), 390–411.
Vaughn, S., Solis, M., Miciak, J., Taylor, W. P. & Fletcher, J. M. (2016). Effects from a randomized control trial comparing researcher and school-implemented treatments with fourth graders with significant reading difficulties. Journal of Research on Educational Effectiveness, 9 (suppl. 1), 23–44.