“It Pays to Be Ignorant: A Simple Political Economy of Rigorous Program Evaluation”, 2002:
This paper attempts to explain the scarcity of rigorous evaluations of public policy.
I build a positive model to explain the “stylized fact” that there is underinvestment in the creation of reliable empirical knowledge about the impacts of public sector actions.
The model shows how “advocates” of particular issues or solutions—the public action equivalent of entrepreneurs—have incentives to underinvest in knowledge creation, because having credible estimates of the impact of their preferred program may undermine their ability to mobilize political (budgetary) support.
[Keywords: program evaluation, bureaucracy, issue advocates]
This paper was motivated by my 12 years in the World Bank, a large, international, quasi-public bureaucracy whose objective was “development” and whose instrument was providing loans to developing country governments. The organization’s lending activities have spanned the range: from dam construction to family planning to microcredit to steel mills to “social funds” to macroeconomic stabilization to land reform. The World Bank is for the most part staffed by internationally recruited, highly trained, well-meaning, and experienced professionals and is arguably the premier development institution. And yet, nearly all World Bank discussions of policies or project design had the character of “ignorant armies clashing by night”—there was heated debate amongst advocates of various activities, but very rarely any firm evidence presented and considered about the likely impact of the proposed actions…How can this combination of brilliant, well-meaning people and an ignorant organization be a stable equilibrium?
In the United States no one can market a prescription medicine for male pattern baldness without evidence that it is “safe and effective”. The accepted regulatory standard of evidence for safety and effectiveness is a controlled, randomized, double-blind evaluation. Yet the non-profit “market” is flooded with a continual stream of newly proposed programs and interventions. Few public sector actions, even those of tremendous importance, are ever evaluated to the standard required of even the most trivial medicine. To take just one example, in the United States there is a huge and continuing debate over the importance of smaller class sizes for academic performance in primary and secondary education. One side of the debate points to the fact that per-pupil expenditures in public schools have doubled while test scores have changed very little, and to the many studies which find no effect of class size, to argue that it is plausible that hundreds of billions of dollars of educational resources have been misallocated. The other side of the argument suggests that smaller class sizes are associated with stronger performance. The point is not that one side is obviously right and the other obviously wrong—the point is that brilliant, well-meaning people can legitimately disagree on so fundamental a question as the impact of class size on educational quality—yet there is no similar debate on the efficacy of treatments for male pattern baldness.
[As of 2022, the class-size reduction movement (based heavily on correlational research) has lost steam and been abandoned by major proponents such as the Gates Foundation, with research on it showing the usual decline effect towards zero as it became more rigorously evaluated.]
…A search on EconLit turns up 29 references to the Job Training Partnership Act (JTPA). Why? Not because it was ever a particularly large or important federal program—in the 1990s its budget was around US $1.6 billion—but because Title II of the JTPA provided for the largest randomized evaluation of job training, and hence analysts use the data over and over again—even though the program itself was ended.
It is a waste of money to add an evaluation component to nutritional programs—these evaluations never find an impact anyway—we should just move ahead with what we nutritionists know is right.
[Nutritional advocate in a project decision meeting]
It’s amazing how many bad projects get support. Epistemologically, why do you think that is?
[email from a colleague]
…At the core of the model are two assumptions, one about evaluations and one about political support for particular programs. The first assumption is that a randomized evaluation is impossible without the cooperation of the advocates responsible for program implementation, so evaluations can only happen if advocates see them as in their best interest.
The second assumption is that advocates are more altruistic and care more about outcomes in their specific issue than does the public. Given this concern for outcomes, they want to pursue the most effective instrument and, at any given level of the efficacy of the use of resources (outcome gain per dollar), they want a larger budget. If the budget is politically determined, advocates view the problem of evaluation in a dual light. On the positive side, evaluations potentially help improve program efficacy, so they get more bang for the buck. But evaluations have a potential downside if they reduce political support for a larger budget for their program, so they get fewer bucks…Doing a rigorous evaluation has the drawback that it may lower the mean belief about efficacy—sufficiently to erode program support—relative to what could have been achieved by promotional activities. So the question is whether the benefits—essentially avoiding promotional costs—are worth the costs of an evaluation.
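The advocate’s choice can be written as a single comparison (a minimal sketch; the symbols below are mine, not the paper’s notation, and the paper’s own model is richer):

```latex
% A hedged sketch of the tradeoff; all symbols here are illustrative.
% \theta : true program efficacy (outcome gain per dollar)
% B(\mu) : politically determined budget, increasing in the public's
%          mean belief \mu about efficacy
% \mu_P  : the belief level that promotion can sustain, at cost c_P
% c_E    : the cost of a rigorous evaluation, which reveals \theta
\[
\underbrace{\mathbb{E}_{\theta}\bigl[\theta\,B(\theta)\bigr] - c_E}_{\text{evaluate: beliefs converge to the truth}}
\quad\text{vs.}\quad
\underbrace{\mathbb{E}[\theta]\,B(\mu_P) - c_P}_{\text{promote: beliefs held at }\mu_P}
\]
% The advocate evaluates only when the left-hand side is larger. If
% promotion can sustain \mu_P above the efficacy an evaluation would
% likely reveal, ignorance pays.
```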
In this model advocates may choose ignorance over public knowledge of true program efficacy. They are better off if the voting public does not know the “true” benefits, even if it means they too must operate somewhat in the dark…I believe that these situations are common—because the gap between the altruism of advocates and that of the public can be large. It is hard to know how to marshal evidence, but I suspect that evaluations are rare because the middle group has low altruism and few interventions have sufficient efficacy to satisfy it, relative to the lower level of efficacy required to satisfy advocates and core supporters (which may include providers and beneficiaries who benefit directly). In this case (essentially in the classification above—made more likely by uncertainty) there will be many programs operating and promoting themselves and lobbying for middle-group support—but resisting evaluation…If a program can already generate sufficient support to be adequately funded, then knowledge is a danger. No advocate would want to engage in research that potentially undermines support for his/her program. Endless, but less than compelling, controversy is preferred to knowing for sure the answer is “no”.
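This threshold logic can be made concrete with a toy simulation (every number and the budget function below are illustrative assumptions of mine, not the paper’s):

```python
# Toy Monte Carlo sketch of the advocate's choice (illustrative numbers
# only). An advocate with a prior over true efficacy compares the expected
# budget under "promote and stay ignorant" against "evaluate and reveal".
import random

random.seed(0)

T_MIDDLE = 0.6    # efficacy the low-altruism middle group demands before funding
MU_PROMO = 0.7    # belief level that promotion can sustain absent hard evidence
N = 100_000       # Monte Carlo draws from the advocate's prior

def budget(belief: float) -> float:
    """Politically determined budget: the middle group funds fully only
    above its threshold; core supporters fund a small base regardless."""
    return 1.0 if belief >= T_MIDDLE else 0.2

# Advocate's prior over true efficacy: most programs mediocre, a few good.
draws = [random.betavariate(2, 3) for _ in range(N)]  # mean efficacy 0.4

promote_budget = budget(MU_PROMO)                        # belief pinned at MU_PROMO
evaluate_budget = sum(budget(theta) for theta in draws) / N

print(f"expected budget, promote & stay ignorant: {promote_budget:.2f}")  # ~1.00
print(f"expected budget, evaluate & reveal:       {evaluate_budget:.2f}")  # ~0.34
# Under this prior an evaluation usually reveals efficacy below the middle
# group's threshold, so the advocate rationally prefers ignorance.
```

A fully altruistic advocate would also weight these budgets by efficacy per dollar, but the budget comparison alone already shows why an adequately funded program resists evaluation.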
…Second, I do not have a complete sample, but many programs that have had randomized evaluations were in fact eliminated—and it is not clear whether this had anything to do with the evaluation or not. The voucher program in Colombia was eliminated before the randomized evaluation results were even available. The training program evaluated under the JTPA was terminated. The fact that they were eliminated ex post at least suggests these programs were without solid political support and that the evaluation itself was a strategy of weakness.
Third, randomized evaluations are often implemented by those out of the mainstream, groups with much less to lose if the outcome is adverse. For instance, the randomized evaluation of the provision of textbooks in Kenya was carried out by a small NGO—not the government (Kremer et al). The implementation and evaluation of the Colombia voucher program were not carried out by the Ministry of Education (King et al).
Fourth, it is interesting to look at the pressures behind the evaluations that do exist; typically one finds that the proposed intervention either was not supported by the “core supporters” or otherwise faced strong opposition.