School Funding and Test Scores: Do Cross-Sectional and Causal Designs Disagree?
A short review of the evidence
On X, there has recently been a debate surrounding the efficacy of increased school funding, which began with Emil Kirkegaard posting a chart showing a very low correlation between school-level per-student funding and achievement test scores. Many economists responded by saying that cross-sectional studies are bad and that he should have cited causal studies, which, they claimed, contradict the cross-sectional data. However, there is a major issue with this response: the causal estimates they cited found essentially the same results as the cross-sectional data! Here, I summarize the best cross-sectional and causal evidence and show how the two support each other.
Cross-Sectional Studies Using Achievement Test Score Outcomes
The Civil Rights Act, signed by President Johnson in 1964, required that the Office of Education conduct a study of educational inequality. This study, known as the Coleman Report after its lead author, examined the achievement test scores of hundreds of thousands of schoolchildren, between the first and twelfth grades, in thousands of schools across America. Coleman summarized the report’s conclusions as follows:
In assessing the importance of various school factors for achievement, the school factors were grouped into three large clusters…The clusters were teachers’ characteristics, all school-facility and curriculum characteristics excluding teachers, and characteristics of the student environment, obtained by aggregating student characteristics in his grade in school. The general result was that the factors that, under all conditions, accounted for more variance than any others were the characteristics of the student’s peers; those that accounted for the next highest amount of variance were teachers’ characteristics; and, finally, other school characteristics, including per pupil expenditure on instruction in the system, accounted for very little variance at all (Coleman, 1990, p. 73).
When controlling for basic demographics, the per-pupil expenditure (PPE) of a school explained only 2-2.5% of the variance in black students’ test scores and under 1% of the variance in white students’ scores. Taking the square root of the figures in the table below gives correlations of .16, .16, and .15 for blacks in grades 12, 9, and 6, respectively. For whites, the correlations for the same grades were .09, .08, and .06; clearly, the relationship was very weak for both races.
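To make the conversion explicit, here is a minimal Python sketch. For a single predictor, the correlation is the square root of the variance explained; the 2.5% and 1% inputs are the upper ends of the ranges quoted above, not exact values from the report.

```python
import math

# For a single predictor, r = sqrt(R^2): the correlation is the square
# root of the variance explained.
for label, r2 in [("black students (upper bound)", 0.025),
                  ("white students (upper bound)", 0.010)]:
    print(f"{label}: R² = {r2:.3f} -> r = {math.sqrt(r2):.2f}")
# Prints r ≈ 0.16 and r ≈ 0.10, consistent with the grade-12
# correlations of .16 and .09 quoted above.
```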
Fifty years after the Coleman Report was published, Stephen Morgan and Sol Jung (2016) replicated this finding using the Educational Longitudinal Study (ELS), which recruited a representative sample of tenth graders in its initial 2002 wave. This study was the one originally cited by Kirkegaard. Specifically, he posted this graph:
In the ELS, PPE explained 0.2% of the variance in math scores and 0.1% of the variance in reading scores (Table 6), equivalent to correlations of 0.04 and 0.03. There are a few plausible explanations for why these correlations are lower than those in the Coleman Report. First, the ELS analysis included no demographic adjustment; if covariates like sex and age are correlated with the outcome (test scores) but not with PPE, adding them to the regression shrinks the residual variance of the outcome and thereby mechanically inflates the partial correlation of PPE. The second, more boring explanation is simply that between-school funding differences have narrowed since the 1960s, leaving the variable less variation with which to explain test scores. Either way, the correlation between PPE and test scores is clearly small today and was small in the 1960s, even if the exact degree of smallness differs.
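To illustrate the first point, here is a minimal simulation with made-up numbers: a true PPE-score correlation of about 0.04, and a demographic covariate (age) that predicts scores but is unrelated to PPE. Controlling for such a covariate inflates the partial correlation relative to the raw one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Made-up data: PPE is weakly related to scores, while age predicts
# scores but is independent of PPE.
ppe = rng.standard_normal(n)
age = rng.standard_normal(n)
scores = 0.04 * ppe + 1.0 * age + rng.standard_normal(n)

def residualize(y, x):
    """Return y minus its least-squares projection on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return yc - (xc @ yc) / (xc @ xc) * xc

raw_r = np.corrcoef(ppe, scores)[0, 1]
partial_r = np.corrcoef(residualize(ppe, age),
                        residualize(scores, age))[0, 1]

print(f"raw r = {raw_r:.3f}, partial r = {partial_r:.3f}")
# raw r ≈ 0.028, partial r ≈ 0.040: removing a covariate that predicts
# scores but not PPE shrinks the residual variance of scores, which
# inflates PPE's apparent share of it.
```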
Randomized Controlled Trials Using Test Score Outcomes
Randomized controlled trials (RCTs) are a staple of econometrics. They estimate effect sizes by randomly assigning a treatment (e.g., half of a school’s students receive free tutoring) and comparing the treated sample to the controls. Since these interventions typically resemble the changes that additional funding would buy, findings from these studies can tell us something about the causal effect of school funding increases. A recent meta-analysis of RCTs was conducted by Hugues Lortie-Forgues and Matthew Inglis (2019). Their main results looked like this, with an overall median of 0.03 and a weighted mean of 0.04:
These effect sizes represent the SD difference between the treatment and control groups: an effect size of 0.04 means that being treated raises your score by one twenty-fifth of a standard deviation or, in IQ terms, by 0.6 points. This is obviously quite small, but it is also difficult to interpret, because the RCTs must have been heterogeneous in the intensity of their treatments.
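As a sanity check on these magnitudes, here is a minimal simulated RCT (the sample size and effect are hypothetical): a true 0.04 SD treatment effect, recovered as Cohen’s d and converted to IQ points.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # hypothetical students per arm

# Simulated RCT: scores in SD units, true treatment effect of 0.04 SD.
control = rng.standard_normal(n)
treated = rng.standard_normal(n) + 0.04

# Cohen's d: mean difference divided by the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
d = (treated.mean() - control.mean()) / pooled_sd

print(f"Cohen's d ≈ {d:.3f}")                # ≈ 0.04
print(f"IQ-point equivalent: {d * 15:.2f}")  # ≈ 0.6 (IQ scale: SD = 15)
```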
Quasi-Experimental Studies Using Test Score Outcomes
A recent meta-analysis of quasi-experiments was published by Jackson and Mackevicius (2024). The review included studies that analyzed the impact of policy changes, typically using methods similar to the case-study approach described in my recent article about the minimum wage literature. The overall effect size, a little over 0.03, corresponds to the SD increase in test scores for every $1,000 increase in PPE. Since a standardized effect is just the slope multiplied by the SD of the predictor and divided by the SD of the outcome, and the standard deviation of school-level PPE is about $3,750,¹ this represents a correlation of about .11: a very small effect size that is roughly in line with the cross-sectional analyses.
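To spell out that conversion, here is a short sketch; the 0.03 slope is the meta-analytic headline figure, and the $3,750 SD is the rough estimate from the footnote, so both inputs are approximations.

```python
# Standardizing the quasi-experimental effect: the slope is in outcome
# SDs per $1,000, so multiplying by the predictor's SD (in $1,000s)
# yields an effect on the correlation scale.
effect_per_1k = 0.03   # SD gain in test scores per $1,000 of PPE
ppe_sd = 3.75          # estimated SD of PPE, in thousands of dollars

print(f"correlation-scale effect ≈ {effect_per_1k * ppe_sd:.2f}")  # ≈ 0.11
```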
Conclusion
Cross-sectional analyses of school funding and test scores show very low correlations. A meta-analysis of randomized controlled trials found a weighted average effect size of just 0.04, meaning that receiving an intervention predicted only a 0.04 SD increase in achievement test scores. And, finally, a meta-analysis of policy-induced funding increases found an effect size equivalent to a correlation of just .11. All of these findings converge on one conclusion: school funding has very little influence on achievement. This assumes, of course, that gains on test scores reflect anything real; my guess is that they do not, and that the true effect is zero. But the goal of this post is not to prove that. Rather, it is to point out, as Crémieux repeatedly has during this debate, that the cross-sectional and causal effect sizes are roughly equivalent!
References
Coleman, J. (1990). Equality and Achievement in Education. Taylor & Francis.
Jackson, C. & Mackevicius, C. (2024). What Impacts Can We Expect from School Spending Policy? Evidence from Evaluations in the United States. American Economic Journal: Applied Economics, 16(1), 412-446.
Lortie-Forgues, H. & Inglis, M. (2019). Rigorous Large-Scale Educational RCTs Are Often Uninformative: Should We Be Concerned? Educational Researcher, 48(3), 158-166.
Morgan, S. & Jung, S. (2016). Still No Effect of Resources, Even in the New Gilded Age? The Russell Sage Foundation Journal of the Social Sciences, 2(5), 83-116.
Murray, S. & Rueben, K. (2008). Racial Disparities in Education Finance: Going Beyond Equal Revenues. Report for the Urban Institute.
¹ The standard deviation of PPE is surprisingly hard to find. Murray and Rueben (2008) report that the coefficient of variation (the SD divided by the mean) for PPE was between about 0.25 and 0.3 from 1972 to 2002. In 2022, the average public school spent $15,591 per student; because Jackson and Mackevicius used 2018 dollars, this needs to be adjusted down to about $13,500. Multiplying this by 0.25 gives $3,375, and by 0.3, $4,050. I therefore use a convenient “middle” number of $3,750.
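The same arithmetic as a quick sketch (the deflation from 2022 to 2018 dollars is approximate):

```python
# Back-of-the-envelope SD of per-pupil expenditure, per the footnote.
# The 2022 average of $15,591 deflates to roughly $13,500 in 2018 dollars.
mean_ppe_2018 = 13_500

for cv in (0.25, 0.30):  # CV range reported by Murray & Rueben (2008)
    print(f"CV = {cv:.2f}: SD ≈ ${mean_ppe_2018 * cv:,.0f}")
# Prints SD ≈ $3,375 and $4,050; $3,750 is the middle value used above.
```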