Session Information
09 SES 16 B, Exploring Methodological Advances in Educational Research and Assessment
Paper Session
Contribution
Causal inference is a crucial topic in empirical education research, as well as in other social sciences (Murnane & Willett, 2011). In particular, addressing endogeneity (selection bias caused by unobserved confounders) is arguably the most important issue. However, existing methods in applied research have significant limitations in terms of applicability and policy implications.
First, although causal inference methods such as panel fixed-effects models, difference-in-differences, regression discontinuity designs, and instrumental variable regression are well developed, the data to which they can be applied are limited. Omitted variable bias is a common problem in observational data, and the methods that address it have demanding data requirements. Large-scale surveys used in educational research, such as PISA and TALIS, provide valuable information, but these causal inference methods can rarely, if ever, be applied to them.
Second, even where these methods can be used, many applied studies assume a linear model or a dichotomous treatment variable. If the true relationship between the outcome variable and the treatment variable is nonlinear, the policy implications of analyses based on existing methods are limited or misleading. This is especially relevant in education, where many treatment variables are continuous or multi-valued and have nonlinear effects. Class size, school size, years of teacher experience, and teachers’ working hours are typical examples (Jerrim & Sims, 2021; Kraft & Papay, 2014). As the large body of empirical research and discussion on the educational production function has shown, findings on the nonlinear effects of class size and years of teacher experience have direct implications for the financial resources available to implement educational policy.
How, then, can these recurring challenges be addressed?
It is necessary to develop a realistic identification strategy that can address endogeneity and nonlinearity. In this paper, I extend a model-based approach that uses identification via conditional heteroskedasticity (Klein & Vella, 2010) to address the above limitations on causal inference in education research.
Methods using conditional heteroskedasticity are rarely used in applied research, but have been discussed in the theoretical literature. This approach models the structure of the error terms of the equations and thus differs from the usual design-based identification strategies, but it has the significant advantage of requiring relatively realistic side information for identification. Additionally, it can easily be combined with various types of existing regression models, providing more options for empirical research using observational data. I extend the linear model with this identification strategy to a semiparametric model (partial linear model) within a Bayesian framework and demonstrate the effectiveness of the proposed model using simulated and real data.
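As a rough orientation, the linear setup that this kind of identification builds on can be written as follows. This is a sketch in assumed notation (the symbols y_1, y_2, x, S_u, S_v, and ρ do not appear in the abstract itself), and it shows only the linear triangular case, not the semiparametric Bayesian extension proposed here. It assumes unit-variance standardized errors with constant conditional correlation and a conditional mean of u* that is linear in v*.

```latex
\begin{aligned}
y_{1i} &= x_i'\beta + \gamma\, y_{2i} + u_i, \\
y_{2i} &= x_i'\pi + v_i, \\
u_i &= S_u(x_i)\,u_i^{*}, \qquad v_i = S_v(x_i)\,v_i^{*}, \qquad
\operatorname{corr}(u_i^{*}, v_i^{*} \mid x_i) = \rho, \\
\mathbb{E}[u_i \mid x_i, v_i] &= \rho\,\frac{S_u(x_i)}{S_v(x_i)}\,v_i .
\end{aligned}
```

Under these assumptions, the last line acts as a control function for the endogenous treatment y_2. Because v is itself a linear combination of x and y_2, the term is separately identified only if the ratio S_u(x)/S_v(x) varies with x, which is why heteroskedasticity can stand in for an exclusion restriction.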
Method
We propose a model that extends the control function approach discussed in Klein & Vella (2010) to a semiparametric regression model within a Bayesian framework. After discussing the model and its estimation using MCMC methods, we evaluate its performance using simulated data. The simulation considers both cases where the effects of endogenous treatment variables are linear and cases where they are nonlinear. In addition to these simulated data, we also demonstrate the usefulness of the model in an application to real data. Using data from the Teaching and Learning International Survey (TALIS) 2018, an international survey on teachers’ working environments, we apply the proposed model to analyze the impact of teachers' long working hours on well-being, job satisfaction, and efficacy. Although TALIS provides useful information for policy regarding teachers, it is difficult to apply the usual identification strategies of causal inference to it. Empirical research on teachers' subjective well-being and working environment has been conducted in several academic disciplines, including psychology, education, and epidemiology, but existing studies have serious weaknesses in terms of causal inference. Specifically, workload is assumed to be one of the important factors when job satisfaction, sense of efficacy, and other well-being indices are used as outcome variables, but the possibility that workload is an endogenous variable correlated with unobserved confounding factors has rarely been considered. As to nonlinearity, the question of which range of working hours has the greater impact on well-being has direct implications for the regulation of working hours and related issues. In particular, detecting nonlinear effects of working hours (e.g., an impact that increases rapidly above a certain threshold) is very important. Using the proposed model, we analyze the effect of working hours on teachers' well-being, taking into account endogeneity and nonlinear effects.
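The logic of a heteroskedasticity-based control function can be illustrated with a small simulation. The sketch below is not the simulation design or estimator used in this paper: it is a frequentist, linear toy example with hypothetical parameter values, and it plugs in the true scale functions S_u and S_v (an oracle simplification) instead of estimating them. The function names `ols` and `simulate_and_estimate` are introduced here for illustration only.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate_and_estimate(n=20_000, gamma=0.5, rho=0.6, seed=0):
    """Compare naive OLS with an oracle control-function regression.

    All parameter values are hypothetical and chosen for illustration.
    Returns (gamma_ols, gamma_cf, rho_cf).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)

    # Standardized errors with constant correlation rho.
    v_star = rng.normal(size=n)
    u_star = rho * v_star + np.sqrt(1 - rho**2) * rng.normal(size=n)

    # Scale functions: only the treatment equation is heteroskedastic,
    # so the ratio S_u(x) / S_v(x) varies with x.
    s_v = np.exp(0.5 * x)
    s_u = np.ones(n)
    v = s_v * v_star
    u = s_u * u_star

    # Triangular system: y2 is the endogenous treatment, y1 the outcome.
    y2 = 1.0 + x + v
    y1 = 1.0 + x + gamma * y2 + u

    const = np.ones(n)

    # Naive OLS ignores the correlation between y2 and u.
    gamma_ols = ols(np.column_stack([const, x, y2]), y1)[2]

    # First stage: residual of the treatment equation.
    Z = np.column_stack([const, x])
    v_hat = y2 - Z @ ols(Z, y2)

    # Second stage: add the control term (S_u / S_v) * v_hat.
    # Assumption: the true scale functions are known (oracle case).
    control = (s_u / s_v) * v_hat
    coefs = ols(np.column_stack([const, x, y2, control]), y1)
    return gamma_ols, coefs[2], coefs[3]


if __name__ == "__main__":
    g_ols, g_cf, r_cf = simulate_and_estimate()
    print(f"OLS gamma: {g_ols:.3f}")
    print(f"Control-function gamma: {g_cf:.3f}, rho: {r_cf:.3f}")
```

In this toy design the naive OLS coefficient on the treatment is biased upward, while the regression with the control term recovers a value close to the true coefficient. In real applications the scale functions are unknown and must be modeled and estimated, which is the harder part of the problem.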
Expected Outcomes
Our proposed semiparametric model, which uses an identification strategy based on conditional heteroskedasticity, offers several advantages over standard causal inference methods. It can be applied to a wider range of data and can detect nonlinear effects of treatment variables. The results from both simulated and real data demonstrate that it can contribute to research on policy-relevant questions. In particular, the analysis of TALIS data with the proposed model revealed that existing studies underestimate the impact of teachers’ long working hours on well-being and overlook nonlinear effects. Furthermore, the adoption of Bayesian modeling makes the proposed model more flexible. An example is the random effects model (hierarchical model) used in the real data analysis in this paper. In future research, we may consider relaxing various restrictions and extending the model to a heterogeneous treatment effects model, which would allow the treatment effect to vary among individuals. Applying this model to other research topics is also an important avenue for future research.
References
Jerrim, J., & Sims, S. (2021). When is high workload bad for teacher wellbeing? Accounting for the non-linear contribution of specific teaching tasks. Teaching and Teacher Education, 105, 103395.
Klein, R., & Vella, F. (2010). Estimating a class of triangular simultaneous equations models without exclusion restrictions. Journal of Econometrics, 154(2), 154-164.
Kraft, M. A., & Papay, J. P. (2014). Can Professional Environments in Schools Promote Teacher Development? Explaining Heterogeneity in Returns to Teaching Experience. Educational Evaluation and Policy Analysis, 36(4), 476-500.
Murnane, R. J., & Willett, J. B. (2011). Methods Matter: Improving causal inference in educational and social science research. Oxford; New York: Oxford University Press.