“Evaluating the Econometric Evaluations of Training Programs With Experimental Data”, Robert J. LaLonde, 1986-09:

This paper compares the effect on trainee earnings of an employment program that was run as a field experiment where participants were randomly assigned to treatment and control groups with the estimates that would have been produced by an econometrician. This comparison shows that many of the econometric procedures do not replicate the experimentally determined results, and it suggests that researchers should be aware of the potential for specification errors in other nonexperimental evaluations.

…The National Supported Work Demonstration (NSW) was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment. Unlike other federally sponsored employment and training programs, the NSW program assigned qualified applicants to training positions randomly. Those assigned to the treatment group received all the benefits of the NSW program, while those assigned to the control group were left to fend for themselves.

During the mid-1970s, the Manpower Demonstration Research Corporation (MDRC) operated the NSW program in 10 sites across the United States. The MDRC admitted into the program AFDC women, ex-drug addicts, ex-criminal offenders, and high school dropouts of both sexes. For those assigned to the treatment group, the program guaranteed a job for 9–18 months, depending on the target group and site. The treatment group was divided into crews of 3–5 participants who worked together and met frequently with an NSW counselor to discuss grievances and performance. The NSW program paid the treatment group members for their work. The wage schedule offered the trainees lower wage rates than they would have received on a regular job, but allowed their earnings to increase for satisfactory performance and attendance. The trainees could stay on their supported work jobs until their terms in the program expired and they were forced to find regular employment.

…male and female participants frequently performed different sorts of work. The female participants usually worked in service occupations, whereas the male participants tended to work in construction occupations. Consequently, the program costs varied across the sites and target groups. The program cost $9,100 per AFDC participant and ~$6,800 for the other target groups’ trainees (1986 dollars; ~$26,667 and ~$19,927 inflation-adjusted).

The first 2 columns of Table 2 & 3 present the annual earnings of the treatment and control group members. The earnings of the experimental groups were the same in the pre-training year 1975, diverged during the employment program, and converged to some extent after the program ended. The post-training year was 1979 for the AFDC females and 1978 for the males. Columns 2 and 3 in the first row of Table 4 & 5 show that both the unadjusted and regression-adjusted pre-training earnings of the 2 sets of treatment and control group members are essentially identical. Therefore, because of the NSW program’s experimental design, the difference between the post-training earnings of the experimental groups is an unbiased estimator of the training effect, and the other estimators described in columns 5–10(11) are unbiased estimators as well. The estimates in column 4 indicate that the earnings of the AFDC females were $851 (1986 dollars; ~$2,494 inflation-adjusted) higher than they would have been without the NSW program, while the earnings of the male participants were $886 (~$2,596) higher. Moreover, the other columns show that the econometric procedure does not affect these estimates.
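The logic of the experimental estimator can be sketched with a toy simulation (all numbers synthetic, chosen only to echo the ~$850 estimates above, and not LaLonde’s data): under random assignment, the simple difference in post-training mean earnings is an unbiased estimate of the training effect.

```python
import random

random.seed(0)

# Synthetic illustration: assume a true training effect of $850 on a
# $5,000 earnings baseline, with randomly assigned treatment and control.
TRUE_EFFECT = 850.0
control = [random.gauss(5000, 1500) for _ in range(5000)]
treated = [random.gauss(5000 + TRUE_EFFECT, 1500) for _ in range(5000)]

# Experimental estimator: difference in post-training mean earnings.
estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(f"experimental estimate: {estimate:.0f}")  # lands near the true 850
```

Because assignment is random, no regression adjustment is needed for unbiasedness; adjustment only tightens the standard error.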

The researchers who evaluated these federally sponsored programs devised both experimental and nonexperimental procedures to estimate the training effect, because they recognized that the difference between the trainees’ pre-training and post-training earnings was a poor estimate of the training effect. In a dynamic economy, the trainees’ earnings may grow even without an effective program. The goal of these program evaluations is to estimate the earnings of the trainees had they not participated in the program. Researchers using experimental data take the earnings of the control group members to be an estimate of the trainees’ earnings without the program. Without experimental data, researchers estimate the earnings of the trainees by using the regression-adjusted earnings of a comparison group drawn from the population. This adjustment takes into account that the observable characteristics of the trainees and the comparison group members differ, and their unobservable characteristics may differ as well.
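The regression-adjustment idea can be sketched as follows (a minimal toy simulation with synthetic numbers, not LaLonde’s data): when trainees earn less before the program than a population comparison group, the raw difference in means is badly misleading, while adjusting for pre-program earnings can recover the effect, provided only observables differ.

```python
import random

random.seed(1)

# Synthetic sketch: true training effect of $850; trainees start from much
# lower pre-program earnings than the comparison group (selection on an
# observable only, the favorable case for nonexperimental adjustment).
TRUE_EFFECT = 850.0

def post_earnings(pre, trained):
    return 1000 + 0.9 * pre + (TRUE_EFFECT if trained else 0.0) + random.gauss(0, 500)

trainees = [(random.gauss(3000, 800), 1) for _ in range(4000)]
comparison = [(random.gauss(6000, 800), 0) for _ in range(4000)]
data = [(pre, d, post_earnings(pre, d)) for pre, d in trainees + comparison]

def slope_intercept(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return b, my - b * mx

pre = [p for p, _, _ in data]
d = [t for _, t, _ in data]
y = [yy for _, _, yy in data]

# Raw difference in means confounds treatment with pre-program earnings.
raw = (sum(yi for ti, yi in zip(d, y) if ti) / sum(d)
       - sum(yi for ti, yi in zip(d, y) if not ti) / (len(d) - sum(d)))

# Regression adjustment via Frisch-Waugh: partial pre-program earnings out
# of both post-program earnings and the treatment dummy, regress residuals.
by, ay = slope_intercept(pre, y)
bd, ad = slope_intercept(pre, d)
ry = [yi - (ay + by * pi) for yi, pi in zip(y, pre)]
rd = [di - (ad + bd * pi) for di, pi in zip(d, pre)]
adjusted, _ = slope_intercept(rd, ry)
# raw is strongly negative; adjusted lands near the true 850.
```

The paper’s point is precisely that this favorable case is not guaranteed: if unobservables also differ, the adjusted estimate stays biased.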

The first step in a nonexperimental evaluation is to select a comparison group whose earnings can be compared to the earnings of the trainees. Table 2 & 3 present the mean annual earnings of female and male comparison groups drawn from the Panel Study of Income Dynamics (PSID) and Westat’s Matched Current Population Survey—Social Security Administration File (CPS-SSA). These groups are characteristic of 2 types of comparison groups frequently used in the program evaluation literature. The PSID-1 and the CPS-SSA-1 groups are large, stratified random samples from populations of household heads and households, respectively. The other, smaller, comparison groups are composed of individuals whose characteristics are consistent with some of the eligibility criteria used to admit applicants into the NSW program. For example, the PSID-3 and CPS-SSA-4 comparison groups in Table 2 include females from the PSID and the CPS-SSA who received AFDC payments in 1975, and were not employed in the spring of 1976. Table 2 & 3 show that the NSW trainees and controls have earnings histories that are more similar to those of the smaller comparison groups.

Unlike the experimental estimates, the nonexperimental estimates are sensitive both to the composition of the comparison group and to the econometric procedure. For example, many of the estimates in column 9 of Table 4 replicate the experimental results, while other estimates are more than $1,000 (1986 dollars; ~$2,930 inflation-adjusted) larger than the experimental results. More specifically, the results for the female participants (Table 4) tend to be positive and larger than the experimental estimate, while for the male participants (Table 5), the estimates tend to be negative and smaller than the experimental impact. Additionally, the nonexperimental procedures replicate the experimental results more closely when the nonexperimental data include pre-training earnings rather than cross-sectional data alone, or when evaluating female rather than male participants.

Table 5: Earnings Comparisons And Estimated Training Effects For The NSW Male Participants Using Comparison Groups From The PSID And The CPS-SSA.

Before taking some of these estimates too seriously, many econometricians at a minimum would require that their estimators be based on econometric models that are consistent with the pre-training earnings data. Thus, if the regression-adjusted difference between the post-training earnings of the 2 groups is going to be a consistent estimator of the training effect, the regression-adjusted pretraining earnings of the 2 groups should be the same.
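The logic of this specification test can be sketched with a toy simulation (synthetic data, not LaLonde’s): in the pre-training period the true program effect is zero by construction, so if the same regression adjustment applied there leaves a large gap, the model is misspecified and its post-training estimate should not be trusted.

```python
import random

random.seed(2)

# Synthetic sketch: trainees carry an unobserved earnings penalty that
# adjusting on an observable (here, age) cannot remove.
PENALTY = -1500.0

comp = [(a, 2000 + 100 * a + random.gauss(0, 400))
        for a in (random.uniform(20, 40) for _ in range(3000))]
trainees = [(a, 2000 + 100 * a + PENALTY + random.gauss(0, 400))
            for a in (random.uniform(20, 40) for _ in range(3000))]

# Fit pre-training earnings ~ age on the comparison group...
xs = [a for a, _ in comp]
ys = [e for _, e in comp]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)

# ...then compare trainees' actual pre-training earnings to the prediction.
gap = sum(e - (my + b * (a - mx)) for a, e in trainees) / len(trainees)
# The adjusted pre-training gap sits near -1500, far from zero: in a period
# with no training effect, this specification fails the test.
```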

Based on this specification test, econometricians might reject the nonexperimental estimates in columns 4–7 of Table 4 in favor of the ones in columns 8–11. Few econometricians would report the training effect of $870 (1986 dollars; ~$2,549 inflation-adjusted) in column 5, even though this estimate differs from the experimental result by only $19 (~$56). If the cross-sectional estimator properly controlled for differences between the trainees and comparison group members, we would not expect the difference between the regression-adjusted pre-training earnings of the 2 groups to be $1,550 (~$4,542), as reported in column 3. Likewise, econometricians might refrain from reporting the difference-in-differences estimates in columns 6 and 7, even though all these estimates are within 2 standard errors of $3,000 (~$8,791). As noted earlier, this estimator is not consistent with the decline in the trainees’ pre-training earnings.
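Why a pre-training earnings decline undermines difference-in-differences can be sketched with a toy simulation (synthetic numbers, not the paper’s estimates): if trainees’ pre-period earnings are transitorily low (“Ashenfelter’s dip”) and would rebound even without training, the estimator attributes the rebound to the program.

```python
import random

random.seed(3)

# Synthetic sketch: true training effect is zero; trainees are sampled in a
# transitory $1,200 earnings dip that mean-reverts by the post period.
BASE, DIP = 5000.0, -1200.0
mean = lambda xs: sum(xs) / len(xs)

pre_t = [BASE + DIP + random.gauss(0, 300) for _ in range(4000)]   # dip year
post_t = [BASE + random.gauss(0, 300) for _ in range(4000)]        # rebound
pre_c = [BASE + random.gauss(0, 300) for _ in range(4000)]
post_c = [BASE + random.gauss(0, 300) for _ in range(4000)]

# Difference-in-differences: (treated change) minus (comparison change).
did = (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))
# did comes out near +1200: mean reversion masquerading as a training effect.
```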

The 2-step estimates are usually closer than the one-step estimates to the experimental results for the male trainees as well. One estimate, which used the CPS-SSA-1 sample as a comparison group, is within $600 (1986 dollars; ~$1,758 inflation-adjusted) of the experimental result, while the one-step estimate falls short by $1,695 (~$4,967). The estimates of the participation coefficients are negative, although unlike these estimates for the females, they are always statistically-significantly different from zero. This finding is consistent with the example cited earlier in which individuals with high participation unobservables and low earnings unobservables were more likely to be in training. As predicted, the unrestricted estimates are larger than the one-step estimates. However, as with the results for the females, this procedure may leave econometricians with a considerable range ($1,546 in 1986 dollars; ~$4,530 adjusted) of imprecise estimates.
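The selection story here can be illustrated with a toy simulation (synthetic, not LaLonde’s estimates): when people with low earnings unobservables are more likely to enter training, a one-step cross-sectional comparison understates the true effect, which is the bias a 2-step selection correction is meant to remove.

```python
import random

random.seed(4)

# Synthetic sketch: true training effect of $850, but participation is
# negatively selected on the earnings unobservable u.
TRUE_EFFECT = 850.0
people = []
for _ in range(20000):
    u = random.gauss(0, 1000)                            # earnings unobservable
    trained = random.random() < (0.8 if u < 0 else 0.2)  # negative selection
    people.append((trained, 5000 + u + (TRUE_EFFECT if trained else 0.0)))

mean = lambda xs: sum(xs) / len(xs)
naive = (mean([y for t, y in people if t])
         - mean([y for t, y in people if not t]))
# naive lands far below the true 850 (near zero or negative), because
# trainees carry systematically low unobservables.
```

This is only the direction-of-bias intuition; the paper’s actual 2-step estimator models the participation decision explicitly rather than comparing raw means.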

…This study shows that many of the econometric procedures and comparison groups used to evaluate employment and training programs would not have yielded accurate or precise estimates of the impact of the National Supported Work Program.