




Supplementary Materials for

Did Cooperation Among Strangers Decline in the United States?

A Cross-Temporal Meta-Analysis of Social Dilemmas (1956–2017)




Supplementary Materials

Coding of Cooperation

Flowchart of Literature Search and Inclusion

Analysis Based on All Eligible Studies

Analysis of Studies on College Student Samples and Mean Age of 18 to 28 Years

Studies Included in the Meta-analysis

References





Supplementary Materials

Coding of Cooperation

We used the overall cooperation rates reported in social dilemma studies as the measure of cooperation. Specifically, in the continuous Public Goods Dilemma (e.g., Fehr & Fischbacher, 2004) and Prisoner’s Dilemma (e.g., Van Lange & Kuhlman, 1994) in which each player can decide how much of their initial endowments to contribute to a group account or their partner, we calculated cooperation as the percentage of endowment contributed (Pcont = Mean contributions/Endowment). In the dichotomous Prisoner’s Dilemma (e.g., Dawes, 1980) and Public Goods Dilemma (e.g., Shank et al., 2019) in which each participant has two options: cooperation (contributing all endowments to the group account) or defection (contributing nothing), we calculated the proportion of cooperative choices, with higher proportions indicating greater cooperation. If a study only reported the cooperation rates in each subgroup (e.g., communication and no communication, different experimental conditions), then we calculated the overall Mean (M) and pooled Standard Deviation (SD) of cooperation by using the following formulas:
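The formulas referred to above are not reproduced in this copy. As a rough sketch only, the block below shows one common way to combine subgroup statistics, assuming a sample-size-weighted mean and a pooled SD that weights each subgroup variance by its degrees of freedom (n - 1). The function name `combine_subgroups` and the example numbers are hypothetical, and the convention may differ from the one the authors applied.

```python
import math


def combine_subgroups(ns, means, sds):
    """Combine subgroup statistics into one overall mean and pooled SD.

    Assumption (not taken from the source): the overall mean is weighted by
    subgroup size, and the pooled SD weights each subgroup variance by its
    degrees of freedom (n - 1).
    """
    if not ns or not (len(ns) == len(means) == len(sds)):
        raise ValueError("ns, means and sds must be non-empty and equal in length")
    total_n = sum(ns)
    overall_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    pooled_var = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds)) / (total_n - len(ns))
    return overall_mean, math.sqrt(pooled_var)


# Hypothetical study with two conditions (e.g., communication vs. no communication).
mean, sd = combine_subgroups(ns=[40, 60], means=[0.50, 0.70], sds=[0.20, 0.30])
```

Under this convention the pooled SD reflects only within-subgroup variability; a different convention would be needed if between-subgroup differences should also contribute to the overall SD.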




Flowchart of Literature Search and Inclusion

Figure S1

PRISMA Flowchart of Literature Search and Inclusion

[Flowchart steps, from top to bottom]

1. Records from Cooperation Databank (k = 3,026)

2. Records excluded

3. Records after exclusions (k = 618)

4. Data splitting for cooperation within studies that manipulated study characteristics

5. Records after data splitting (k = 793)

6. Records excluded

7. Records included in meta-analysis (k = 660, Nstudies = 511, Nparticipants = 63,342)







Note. a We excluded records if, during the re-checking procedure, the second annotator reported that the cooperation rates were unreliable (e.g., because there was insufficient information to code them). b We excluded records if the sample duplicated another sample (e.g., a main study and its sub-studies). In the Cooperation Databank, if a study measured cooperation in different dilemmas or countries, sub-studies were coded so that the sample and study characteristics could be described separately; see Spadaro et al. (in press) for more details about how studies and sub-studies were coded in the Cooperation Databank. c We excluded records if there was insufficient information to compute the variance of the cooperation estimate.




Analysis Based on All Eligible Studies

Year of Data Collection. Across all studies included in the meta-analysis, the year of data collection ranged from 1956 to 2017 (Mdn = 1999; see Figure S2 for the number of effect sizes in each year of data collection).

Figure S2

Histogram of Number of Effect Sizes in Each Year of Data Collection

Note. k = 660

Extreme Outliers Excluded From the Analysis. We also conducted analyses including seven extreme outliers (see Table S1). The results remained the same (see Table S5).


Table S1

Studies With Extreme Outliers in the Dataset

| Study | N | Dilemma | Cooperation rates | Yi | ZYi |
|---|---|---|---|---|---|
| Tyson (1992) Study 1 | 492 | PD | 0.93 | 2.67 | 3.29 |
| Park (2012) Study 1 Treatment 1 | 94 | PD | 0.94 | 2.70 | 3.34 |
| Bergsieker (2013) Study 3 | 120 | PD | 0.94 | 2.75 | 3.40 |
| Insko et al. (1990) Study 2 | 62 | PD | 0.95 | 2.94 | 3.63 |
| Insko et al. (1994) Study 1 Treatment 1 | 24 | PD | 0.96 | 3.18 | 3.92 |
| Schopler et al. (1991) Study 2 Treatment 1 | 36 | PD | 0.96 | 3.20 | 3.95 |
| Bochet et al. (2006) Study 1 Treatment 2 | 64 | PGD | 0.97 | 3.46 | 4.27 |

Note. PD = Prisoner’s Dilemma; PGD = Public Goods Dilemma; Yi = logit-transformed cooperation rates; ZYi = standardized logit-transformed cooperation rates.

Publication Bias Test. To examine the possibility of publication bias, we first considered the funnel plot where all the cooperation estimates were plotted according to their sample size (see Figure S3). Then, we used a modified Egger’s regression method to test for funnel plot asymmetry (Egger et al., 1997). Specifically, we conducted a multi-level meta-regression of cooperation estimate on sample size (weighted by the inverse of the variance of the cooperation estimate). This modified approach allows us to (a) detect selective reporting in the presence of dependent effect sizes (Rodgers & Pustejovsky, 2021), and (b) improve accuracy in detecting publication bias when using logit transformations of the proportion as outcome measure (Macaskill et al., 2001). Results showed that sample size was not positively associated with cooperation estimates (b = 0.0005, SE = .0003, p = .080). Taken together, these results suggest that there is little publication bias in our data.
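The core of this test, a regression of the cooperation estimate on sample size weighted by inverse variance, can be illustrated with a deliberately simplified sketch. The block below fits a single-level weighted least-squares line in plain Python; it does not reproduce the multi-level structure of the reported model, and the function name `weighted_line_fit` and all data values are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical records: sample size, logit cooperation estimate, sampling variance.
sample_sizes = [24, 36, 62, 94, 120, 492]
estimates = [0.10, -0.20, 0.30, 0.05, -0.10, 0.15]
variances = [0.17, 0.11, 0.065, 0.043, 0.033, 0.008]

# Inverse-variance weights, as described in the text.
weights = [1.0 / v for v in variances]
intercept, slope = weighted_line_fit(sample_sizes, estimates, weights)
```

A slope near zero in such a fit is what one would expect when cooperation estimates do not depend systematically on sample size; standard errors and p-values would require the full multi-level model.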


Figure S3

Funnel Plot Based on All the Cooperation Estimates

Note. k = 660

Correlations Between Study Characteristics. We calculated the correlations between study characteristics based on the original dataset and present the results in Table S2.

Variance Inflation Factor Analysis. We calculated generalized variance inflation factors (GVIF; Fox & Monette, 1992) to assess multicollinearity (i.e., the degree to which each independent variable is explained by the other independent variables) in our full model with all control variables (i.e., dilemma type, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation). Our predictors consisted of continuous variables and categorical variables with two or more levels. Following Fox and Monette (1992), we computed GVIF^(1/(2*Df)) to make GVIFs comparable across variables with different numbers of parameters (e.g., a dummy-coded categorical variable can have multiple coefficients). We then squared these values and applied the rule of thumb for VIF values from previous literature (i.e., multicollinearity is acceptable when the VIF is lower than 10; O’Brien, 2007). The results presented in Table S3 suggest that our model was not affected by multicollinearity (i.e., all squared values were smaller than 10).
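The adjustment that makes GVIFs comparable can be shown as a small worked example. The sketch below takes a GVIF and its degrees of freedom, as reported in Table S3, and returns GVIF^(1/(2*Df)) together with its square; the function name `comparable_gvif` is hypothetical, and the GVIF values themselves are taken as given rather than recomputed from the data.

```python
def comparable_gvif(gvif, df):
    """Return GVIF^(1/(2*Df)) and its square for one model term."""
    adjusted = gvif ** (1.0 / (2 * df))
    return adjusted, adjusted ** 2


# Inputs taken from Table S3: Time (GVIF = 1.51, Df = 1)
# and Repetitions (GVIF = 1.22, Df = 2).
time_adj, time_sq = comparable_gvif(1.51, 1)
rep_adj, rep_sq = comparable_gvif(1.22, 2)

THRESHOLD = 10  # rule of thumb cited in the text (O'Brien, 2007)
acceptable = all(value < THRESHOLD for value in (time_sq, rep_sq))
```

For a term with one degree of freedom the squared value equals the GVIF itself, which is why the second and last columns of Table S3 agree for such terms up to rounding.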

Table S2

Correlations Between Study Characteristics


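| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Time | | | | | | | | | | | | |
| 2. % male | -.25* | | | | | | | | | | | |
| 3. Dilemma type^a | .42* | .01 | | | | | | | | | | |
| 4. Repetitions (mixed) | -.05 | -.04 | .03 | | | | | | | | | |
| 5. Repetitions | -.25* | .24* | -.16* | -.22* | | | | | | | | |
| 6. Group size | .16* | .04 | .59* | -.03 | -.21* | | | | | | | |
| 7. K index | -.15* | .02 | .04 | .09* | -.02 | .05 | | | | | | |
| 8. Communication (mixed) | .05 | -.02 | -.01 | .06 | -.01 | .05 | .02 | | | | | |
| 9. Communication | -.12* | -.22* | -.19* | -.01 | .08 | -.03 | .02 | -.06 | | | | |
| 10. Periods^b | .02 | -.03 | -.07 | .11* | .06 | -.05 | -.08 | .34* | .02 | | | |
| 11. Sanctions (mixed) | .11* | .01 | .17* | .05 | -.04 | .06 | -.04 | -.02 | -.01 | -.02 | | |
| 12. Sanctions | .15* | .14* | .25* | -.03 | -.01 | .12* | -.02 | -.03 | -.09* | -.04 | -.03 | |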

Note. a 0 = Prisoner’s Dilemma, 1 = Public Goods Dilemma; Repetitions: comparison group = one-shot interaction; Communication: comparison group = no communication; b 0 = overall, 1 = first; Sanctions: comparison group = no sanctions. *p < .05.


Table S3

Generalized Variance Inflation Factor (GVIF) Values for Independent Variables in the Full Model

| Variable | GVIF | Df | GVIF^(1/(2*Df)) | (GVIF^(1/(2*Df)))^2 |
|---|---|---|---|---|
| 1. Time | 1.51 | 1 | 1.23 | 1.51 |
| 2. Dilemma type^a | 2.08 | 1 | 1.44 | 2.07 |
| 3. % male | 1.19 | 1 | 1.09 | 1.19 |
| 4. Repetitions^b | 1.22 | 2 | 1.05 | 1.10 |
| 5. Group size | 1.66 | 1 | 1.29 | 1.66 |
| 6. K index | 1.06 | 1 | 1.03 | 1.06 |
| 7. Communication^c | 1.15 | 2 | 1.03 | 1.06 |
| 8. Sanctions^d | 1.11 | 2 | 1.03 | 1.06 |
| 9. Periods^e | 1.09 | 1 | 1.04 | 1.08 |

Note. k = 660. a 0 = Prisoner’s Dilemma, 1 = Public Goods Dilemma; b 0 = one-shot interaction, 0.5 = mixed, 1 = repeated interaction; c 0 = no communication, 0.5 = mixed, 1 = communication; d 0 = no sanctions, 0.5 = mixed, 1 = sanctions; e 0 = overall, 1 = first.




Model Results With Dilemma Type Replaced With Number of Choice Options. The original pre-registration also stated that we coded the number of choice options in the dilemmas. Each social dilemma has at least two choice options: cooperate (contribute all) versus defect (contribute nothing). Some studies instead ask participants to decide how much of an endowment to contribute to a public good. We coded the number of choice options as a continuous variable. In dichotomous social dilemmas, participants have two choice options (cooperation vs. defection). For continuous measures of cooperation, the number of choice options generally equals the maximum amount that participants can contribute plus 1 (i.e., initial endowment + 1, to account for the option of giving nothing). When a study manipulated initial endowments (i.e., endowments were asymmetric within a group), so that the number of choice options varied, but cooperation rates could not be observed separately in each treatment, we coded the median of these numbers (k = 31). In our sample, the number of choice options ranged from 2 to 1001 (M = 22.66, Mode = 2, Mdn = 2). Because of this skewed distribution, we log-transformed the number of choice options in our analyses.
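The coding rule for the number of choice options can be sketched as follows. The block assumes that endowments are counted in whole units and that the log transformation uses the natural logarithm; the base is not stated in the text, and the function names are hypothetical.

```python
import math


def n_choice_options(endowment=None):
    """Number of choice options in a dilemma.

    endowment=None marks a dichotomous dilemma (cooperate vs. defect);
    otherwise the count is the endowment plus one, which covers the
    option of contributing nothing.
    """
    return 2 if endowment is None else endowment + 1


def log_choice_options(endowment=None):
    # Natural log is an assumption; the text only says "log-transformed".
    return math.log(n_choice_options(endowment))


dichotomous = n_choice_options()        # two options
continuous = n_choice_options(1000)     # largest value reported in the sample
```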

The number of choice options was significantly correlated with dilemma type (r = .38, p < .001): participants in the Public Goods Dilemma generally had more choice options than those in the Prisoner’s Dilemma. To reduce multicollinearity, we focused only on dilemma type in our main analysis. Moreover, the number of choice options did not significantly predict cooperation in a separate meta-regression model (b = 0.03, SE = .02, p = .082). When we included the number of choice options in the full model in place of dilemma type, we drew the same conclusions (see Table S4).

Table S4

Meta-Regression Models Without Control Variables (Model 1) and Including Control Variables (Model 2)


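| Variable | Model 1: b | Model 1: SE | Model 1: 95% CI | Model 1: β | Model 2: b | Model 2: SE | Model 2: 95% CI | Model 2: β |
|---|---|---|---|---|---|---|---|---|
| Birth cohort | | | | | | | | |
| Time | 0.006* | .002 | [0.002, 0.009] | .13 | 0.005* | .002 | [0.001, 0.009] | .11 |
| % male | | | | | -0.004 | .14 | [-0.29, 0.28] | -.001 |
| Repetitions^a | | | | | | | | |
| Mixed | | | | | -0.09 | .20 | [-0.48, 0.29] | -.02 |
| Repeated | | | | | -0.13 | .07 | [-0.261, 0.003] | -.08 |
| Number of choices | | | | | 0.02 | .02 | [-0.03, 0.07] | .05 |
| Group size | | | | | -0.02 | .05 | [-0.12, 0.08] | -.02 |
| K index | | | | | 0.44* | .16 | [0.12, 0.76] | .10 |
| Communication^b | | | | | | | | |
| Mixed | | | | | 0.25 | .23 | [-0.21, 0.71] | .05 |
| Communication | | | | | 0.54* | .08 | [0.39, 0.69] | .28 |
| Sanctions^c | | | | | | | | |
| Mixed | | | | | 0.43* | .19 | [0.05, 0.80] | .08 |
| Sanctions | | | | | 0.39* | .12 | [0.15, 0.63] | .12 |
| Periods^d | | | | | 0.03 | .19 | [-0.34, 0.41] | .01 |
| Model statistics | | | | | | | | |
| Qmodel(df) | 10.83 (1)* | | | | 91.53 (12)* | | | |
| Qresidual(df) | 11030.15 (658)* | | | | 10023.36 (647)* | | | |
| R2 | .02 | | | | .11 | | | |
| τ2 (level 2) | .33 | | | | .28 | | | |
| τ2 (level 3) | .09 | | | | .11 | | | |
| I2 (level 2) | 74.69 | | | | 68.32 | | | |
| I2 (level 3) | 20.75 | | | | 26.44 | | | |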

Note. k = 660. a Comparison group = one-shot interaction; b Comparison group = no communication; c Comparison group = no sanctions; d 0 = overall, 1 = first. τ2 = tau-squared, the estimate of total amount of heterogeneity; I2 = percentage of total variability due to heterogeneity; Level 2 and 3 represent within-study and between-study variance, respectively (hereafter the same). *p < .05.

Model Results Including Extreme Outliers. We also conducted the analyses including 513 studies with seven extreme outliers (see Table S1) in the sample (i.e., 667 unique cooperation estimates involving 64,234 participants). These analyses resulted in the same conclusions as the analyses excluding the outliers (see Table S5).




Table S5

Meta-Regression Models Without Control Variables (Model 1) and Including Control Variables (Model 2)



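| Variable | Model 1: b | Model 1: SE | Model 1: 95% CI | Model 1: β | Model 2: b | Model 2: SE | Model 2: 95% CI | Model 2: β |
|---|---|---|---|---|---|---|---|---|
| Birth cohort | | | | | | | | |
| Time | 0.006* | .002 | [0.002, 0.009] | .12 | 0.006* | .002 | [0.001, 0.010] | .11 |
| Dilemma type^a | | | | | 0.03 | .09 | [-0.14, 0.19] | .02 |
| % male | | | | | 0.01 | .14 | [-0.27, 0.29] | .005 |
| Repetitions^b | | | | | | | | |
| Mixed | | | | | -0.11 | .21 | [-0.51, 0.30] | -.02 |
| Repeated | | | | | -0.14* | .07 | [-0.275, -0.004] | -.08 |
| Group size | | | | | -0.03 | .06 | [-0.15, 0.08] | -.03 |
| K index | | | | | 0.37* | .17 | [0.03, 0.71] | .08 |
| Communication^c | | | | | | | | |
| Mixed | | | | | 0.23 | .24 | [-0.24, 0.71] | .04 |
| Communication | | | | | 0.57* | .08 | [0.42, 0.73] | .28 |
| Sanctions^d | | | | | | | | |
| Mixed | | | | | 0.42* | .20 | [0.03, 0.81] | .08 |
| Sanctions | | | | | 0.38* | .13 | [0.14, 0.63] | .11 |
| Periods^e | | | | | 0.01 | .20 | [-0.38, 0.41] | .003 |
| Model statistics | | | | | | | | |
| Qmodel(df) | 10.83 (1)* | | | | 89.29 (12)* | | | |
| Qresidual(df) | 11422.42 (665)* | | | | 10494.43 (654)* | | | |
| R2 | .02 | | | | .11 | | | |
| τ2 (level 2) | .35 | | | | .30 | | | |
| τ2 (level 3) | .11 | | | | .12 | | | |
| I2 (level 2) | 72.80 | | | | 67.93 | | | |
| I2 (level 3) | 22.98 | | | | 27.25 | | | |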

Note. k = 667. a 0 = Prisoner’s Dilemma; 1 = Public Goods Dilemma; b Comparison group = one-shot interaction; c Comparison group = no communication; d Comparison group = no sanctions; e 0 = overall, 1 = first. *p < .05.

Spline Model Results. Besides the quadratic model reported in the main text, we also fitted a spline model to examine whether the pattern of cooperation over time was curvilinear. We used a restricted cubic spline model, which consists of a series of piecewise cubic polynomials (Stone & Koo, 1985), to further test for a non-linear trend of cooperation over time. This model requires choosing the number of ‘knots’, which are the positions where the piecewise cubic polynomials are connected. Here, we chose three and four ‘knots’ for exploratory analyses.

When setting three ‘knots’, we used year of data collection (time) to predict cooperation in the meta-regression model. As shown in Figure S4, cooperation increased from 1968 to 1999 (b = 0.010, SE = .004, p = .031), but did not change significantly between 1999 and 2013 (b = -0.004, SE = .005, p = .343). We also used year of data collection (time) to predict cooperation in the meta-regression model with four ‘knots’. As shown in Figure S5, cooperation did not change significantly from 1966 to 1989 (b = 0.01, SE = .01, p = .223), from 1989 to 2005 (b = -0.003, SE = .02, p = .848), or from 2005 to 2015 (b = -0.01, SE = .08, p = .942).
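For readers unfamiliar with restricted cubic splines, the following sketch builds the spline basis for a given year using a widely used truncated-power formulation (unnormalized). It is an illustration under that assumption, not the code used for the reported models; the function name `rcs_basis` is hypothetical, and the knots are the three locations given for Figure S4.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    Returns [x, s_1(x), ..., s_(k-2)(x)] for k knots. Each s_j is zero
    below the first knot and linear above the last knot, so the fitted
    curve is linear in both tails.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    span = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span)
        basis.append(term)
    return basis


three_knots = [1968, 1999, 2013]          # knot locations reported for Figure S4
row_1985 = rcs_basis(1985, three_knots)   # one linear and one nonlinear column
```

With three knots the basis has two columns and with four knots it has three, so the spline models add one or two nonlinear terms to the linear effect of year.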

According to the above analyses, the change in cooperation over time does not appear to follow a curvilinear trend.


Figure S4

Cubic Spline Model Showing Historical Changes Over Time in the Mean Cooperation Rates in Social Dilemmas (Three Knots)

Note. Knots are located at 1968, 1999, and 2013 for year of data collection (vertical dotted lines). The solid black line represents average model predictions of cooperation estimates. The shaded gray region indicates 95% prediction intervals based on average model predictions. Data points represent study means, and the size of each data point is proportional to the study's (inverse variance) weight. Larger dots correspond to means with a smaller variance.


Figure S5

Cubic Spline Model Showing Historical Changes Over Time in the Mean Cooperation Rates in Social Dilemmas (Four Knots)

Note. Knots are located at 1966, 1989, 2005, and 2015 for year of data collection (vertical dotted lines). The solid black line represents average model predictions of cooperation estimates. The shaded gray region indicates 95% prediction intervals based on average model predictions. Data points represent study means, and the size of each data point is proportional to the study's (inverse variance) weight. Larger dots correspond to means with a smaller variance.


The Relationship Between Cultural Tightness and Cooperation. Cultural tightness-looseness refers to variance in norms, values, and behavior (Uz, 2015). A tight society tends to have many strongly enforced rules and little tolerance for deviance, whereas a loose society has few strongly enforced rules and greater tolerance for deviance (Harrington & Gelfand, 2014). Here we examined whether state-level tightness-looseness within the U.S. is related to cooperation. We did not preregister this hypothesis and analysis. Moreover, a valid and reliable state-level tightness-looseness index was developed only around 2013 (see Harrington & Gelfand, 2014), whereas the studies whose cooperation we predict were conducted between 1956 and 2017.

We coded the U.S. state in which each study was conducted according to the university where the experiments were conducted or, if this information was not reported in the paper, the university with which the authors were affiliated. We did not estimate inter-rater agreement for this variable because it was coded at a later stage. In total, 630 unique samples could be coded and associated with a specific state (N = 60,199). We then retrieved state-level tightness-looseness values (higher scores indicate greater tightness) across the United States from previous research (see Harrington & Gelfand, 2014).

Finally, we fitted a four-level meta-analytic model with a variance component at each level: sampling variance of the extracted cooperation estimates (level 1), variance between cooperation estimates extracted from the same study (level 2), variance between studies (level 3), and variance between states (level 4). Tightness-looseness did not have a statistically significant association with cooperation when it was the only predictor in the meta-regression model (b = 0.004, SE = .003, p = .230). When we included tightness-looseness in the full model comprising all the study characteristics (i.e., year of data collection, dilemma type, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation), the association between tightness-looseness and cooperation remained nonsignificant (b = 0.003, SE = .003, p = .207).

Other Study Characteristics and Their Effects on Cooperation. The Cooperation Databank also contains other sample and study characteristics, including student sample, academic discipline, and game payment. These three variables reached high or adequate levels of inter-rater agreement (Krippendorff’s α ranged from 0.70 to 0.89; agreement rate ranged from 79.10% to 97.30%). Our original pre-registration plan did not state that we would report the effects of these study characteristics on cooperation. Here, we report whether these study characteristics influence cooperation. Because we first coded whether participants were college students and then coded the students’ specific discipline, these two variables have a hierarchical structure; we therefore did not include student sample and discipline in the same model.

In the studies included in the meta-analysis, most of the participants were college students (k = 626, coded 1, N = 58,976), with a minority of studies including non-student samples (k = 34, coded 0, N = 4,366). Cooperation did not differ between the college student samples and the non-student samples (b = 0.07, SE = .13, p = .558). When we included the student sample variable in the full model comprising all the study characteristics (i.e., year of data collection, dilemma type, students, payment, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation), student sample was still not significantly associated with cooperation (b = 0.16, SE = .12, p = .206).

We also coded the specific academic discipline of students: economics (k = 23, coded 0), psychology (k = 220, coded 1), other (k = 122, coded 2), and non-students (k = 34, coded 3). For 261 samples, the academic discipline of students was not reported. Compared with economics students, cooperation did not differ among psychology students (b = -0.12, SE = .13, p = .350), students from other disciplines (b = -0.12, SE = .14, p = .409), or non-student samples (b = -0.17, SE = .16, p = .298). When we included academic discipline in the full model including all the study characteristics (i.e., year of data collection, dilemma type, discipline, payment, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation), academic discipline was still not associated with cooperation (all p-values ≥ .186).

We coded how the payoffs determined by participants’ decisions in the social dilemma were paid. Studies that involved hypothetical values for participants’ outcomes were coded 0 (k = 47, labeled ‘hypothetical’). Studies that paid participants with money were coded 1 (k = 478, labeled ‘monetary’). Studies that paid participants with non-monetary resources (e.g., candies, school supplies) were coded 2 (k = 36, labeled ‘non-monetary’). Mixed forms of payment were coded 3 (k = 11, labeled ‘mixed’). For 88 samples, the specific form of payment was not reported. Compared with participants who received hypothetical payments, cooperation rates were neither higher nor lower among participants who received monetary payments (b = 0.13, SE = .11, p = .232), non-monetary payoffs (b = 0.09, SE = .16, p = .572), or mixed forms of payment (b = -0.04, SE = .24, p = .859). When we included payment in the full model involving all the study characteristics (i.e., year of data collection, dilemma type, students, payment, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation), cooperation still did not vary with the type of payment (all p-values ≥ .180).

In sum, student sample, academic discipline, and game payment did not significantly predict cooperation in our meta-analysis. In addition, when we included student sample (or academic discipline) and payment in the full model including all the study characteristics (i.e., year of data collection, dilemma type, proportion of male participants, repetitions, group size, K index, communication, sanctions and period of cooperation), time continued to have a statistically significant positive relation with cooperation (bs = 0.006 and 0.006, SEs = .002 and .002, p-values = .008 and .004, respectively).

Further Evaluating How Sociocultural Indicators Are Associated With Cooperation After Detrending the Variables. It is possible that the correlations (see Table 4 in the main text) observed between the sociocultural indicators and cooperation are spurious because most of the variables have a strong linear trend over time, rather than each variable having a causal effect on cooperation. One approach to controlling for time trends in the data is to enter year of data collection, the sociocultural indicators, and study characteristics together as predictors of cooperation in a three-level meta-regression model. However, year of data collection is strongly associated with Gini index (r = .98, p < .001), GDP per capita (r = .98, p < .001), social welfare function (r = .73, p < .001), urbanization (r = .98, p < .001), percentage of people living alone (r = .95, p < .001), violent crime rate (r = .44, p < .001), social trust (r = -.76, p < .001), and materialism (r = .66, p < .001). Year of data collection is not associated with unemployment rate (r = .19, p = .15), Party of President (r = -.15, p = .261), or divorce rate (r = .10, p = .461). These correlations might indicate severe multicollinearity in such models, which would result in unreliable estimates. For example, the squared GVIF^(1/(2*Df)) values for several of the sociocultural indicators 10 years prior to cooperation were above (or close to) the acceptable threshold of 10 (O’Brien, 2007): Gini index (8.318), GDP per capita (38.576), urbanization (14.705), and percentage of people living alone (17.972). Therefore, due to severe multicollinearity, the meta-regression models including both the year and the sociocultural indicators are likely to produce unreliable estimates.

An alternative approach to control for time trends in the data is to use time-series analyses. This approach could be used to test whether the associations between the sociocultural indicators and cooperation are due to each of these variables increasing over time. However, there are severe limitations with applying this approach to our data, including that we are unable to (a) account for the different number of cooperation estimates per year, (b) statistically control for between-study heterogeneity, and (c) weight the effect sizes (and so the following analyses in this section are not proper meta-analyses). Nonetheless, we do report the results of these analyses for interested readers.

In order to perform any time-series analyses, we could only include a single estimate of cooperation at each time point. Therefore, we obtained a single pooled cooperation estimate and associated squared standard errors for each year, by running a set of intercept-only meta-regression models for each year. This generated a dataset that includes a total of 62 data points (i.e., from 1956 to 2017).

Because the US President is elected every four years and the Party of President was coded as a dichotomous variable, this variable does not show a time trend like the other time-series variables (e.g., Gini index or percentage of people living alone). For this reason, we did not include the Party of President variable in any of the following analyses.

After pooling the cooperation estimates, we linearly interpolated missing data for certain years (1957, 1961, 1962, 1983, and 1985) in the time series of cooperation and its squared standard errors. Then, for sociocultural indicators that were not available from 1956 onward, we narrowed the cooperation data to match the time frame of the indicator. For example, the Gini index was only available from 1967 (see Table 1 in the main text), so we used 1967 as the starting data point for both the Gini index and cooperation (see Table S6).
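Linear interpolation of the missing years can be sketched in a few lines. The block below fills gaps in a yearly series by interpolating between the nearest observed years on either side; the function name `interpolate_missing` and the example values are hypothetical, and only the idea of filling years such as 1961 and 1962 comes from the text.

```python
def interpolate_missing(series):
    """Fill None gaps in a {year: value} series by linear interpolation.

    Each missing year is interpolated between the nearest observed years
    before and after it. Gaps at the very start or end are left as None.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    filled = dict(series)
    for y in years:
        if series[y] is not None:
            continue
        before = [k for k in known if k < y]
        after = [k for k in known if k > y]
        if not before or not after:
            continue
        lo, hi = before[-1], after[0]
        frac = (y - lo) / (hi - lo)
        filled[y] = series[lo] + frac * (series[hi] - series[lo])
    return filled


# Hypothetical pooled logit estimates with 1961 and 1962 missing.
pooled = {1960: 0.10, 1961: None, 1962: None, 1963: 0.40}
filled = interpolate_missing(pooled)
```

The same routine could be applied to the series of squared standard errors, since the text states that both series were interpolated.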

Many time-series analytic methods carry the assumption of stationarity. That is, the data do not have underlying trends that bias analyses of their change over time or their multivariate relations (Caluori et al., 2020). Here, we used two approaches that detrend the time-series variables to ensure data stationarity.

The First Detrending Approach and Corresponding Analyses. To make the data stationary before performing the following analyses, we detrended our time-series vectors for the sociocultural indicators and cooperation on the basis of year in order to remove autoregressive trends associated with the passage of time. Specifically, we (1) extracted the residuals of a linear regression model with year predicting the sociocultural indicator, and (2) extracted the residuals of a regression model with year predicting cooperation. In step (2), we extracted the residuals using three different methods: (a) a linear regression model (function ‘lm’), (b) a linear regression model in which the squared standard errors are specified through the ‘weights’ argument, and (c) a meta-regression model that includes the squared standard errors, with residuals extracted through the ‘residuals.rma’ function of the metafor R package (Viechtbauer, 2010).
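The residual-extraction step can be illustrated with a plain-Python stand-in for the regression of a series on year. This is a sketch, not the R code used for the analyses: the function name `detrend` is hypothetical, the optional `weights` argument only mimics the idea of a weighted fit, and method (c) is not covered.

```python
def detrend(years, values, weights=None):
    """Residuals from a (weighted) straight-line fit of values on years."""
    if weights is None:
        weights = [1.0] * len(years)  # unweighted fit, as in method (a)
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, years)) / sw
    my = sum(w * y for w, y in zip(weights, values)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, years))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, years, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(years, values)]


# Hypothetical indicator: a linear upward trend plus a small alternating wiggle.
years = list(range(1967, 1977))
indicator = [0.397 + 0.002 * i + (0.001 if i % 2 else -0.001) for i in range(10)]
residuals = detrend(years, indicator)
```

After detrending, the residuals sum to zero and are uncorrelated with year, so any remaining association between two detrended series cannot be attributed to a shared linear time trend.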

We then subjected each time series to augmented Dickey-Fuller unit root tests, which evaluate whether a time series has an underlying trend that renders it nonstationary. A statistically significant result implies that the time series is stationary and suitable for time-series analysis. The test was significant for cooperation, regardless of the method used to extract the residuals of the regression model with year predicting cooperation (all p-values ≤ .003), and for Gini index (p = .026), urbanization (p < .001), unemployment rate (p < .001), and materialism (p = .018); it was marginally significant for social welfare function (p = .071). It was not significant for GDP per capita (p = .968), divorce rate (p = .169), percentage of people living alone (p = .260), violent crime rate (p = .161), or social trust (p = .219). Nonetheless, we report the results of the analyses of these non-stationary time series for interested readers.

After detrending our time series data, we then performed separate models for each of the sociocultural indicators predicting cooperation. Tables S6, S7, and S8 report the correlations of the sociocultural indicators and Americans’ cooperation. Regressions revealed no significant associations between cooperation and any of the sociocultural indicators. That is, changes in cooperation over time were not associated with any changes that occurred among the sociocultural indicators. However, it is important to note that this approach, contrary to the meta-regression method that led to the findings reported in the main text, does not allow for the weighting of the cooperation estimate by its variance.

Table S6

Relation Between the Sociocultural Indicators and Cooperation After the Variables Were Detrended (Method A)

| Sociocultural indicators | β | SE | t | p |
|---|---|---|---|---|
| Gini index (1967-2017) | -.15 | .14 | -1.06 | .294 |
| GDP per capita (1960-2017) | .08 | .13 | 0.57 | .574 |
| Social Welfare Function (1967-2017) | .12 | .14 | 0.82 | .418 |
| Unemployment rate (1956-2017) | -.08 | .13 | -0.66 | .512 |
| Urbanization (1960-2017) | -.08 | .13 | -0.62 | .537 |
| Divorce rate (1960-2017) | .15 | .13 | 1.17 | .246 |
| Percentage of people living alone (1960-2017) | .13 | .13 | 0.98 | .334 |
| Violent crime rate (1958-2017) | .03 | .13 | 0.23 | .822 |
| Materialism (1976-2017) | .07 | .16 | 0.44 | .664 |
| Social trust (1976-2017) | .18 | .16 | 1.17 | .250 |

Note. Method A: Cooperation estimates were detrended through a linear regression model.


Table S7

Relation Between the Sociocultural Indicators and Cooperation After All Variables Were Detrended (Method B)

| Sociocultural indicators | β | SE | t | p |
|---|---|---|---|---|
| Gini index (1967-2017) | -.15 | .14 | -1.05 | .297 |
| GDP per capita (1960-2017) | .08 | .13 | 0.56 | .575 |
| Social Welfare Function (1967-2017) | .12 | .14 | 0.81 | .420 |
| Unemployment rate (1956-2017) | -.08 | .13 | -0.65 | .516 |
| Urbanization (1960-2017) | -.08 | .13 | -0.62 | .539 |
| Divorce rate (1960-2017) | .15 | .13 | 1.17 | .247 |
| Percentage of people living alone (1960-2017) | .13 | .13 | 0.97 | .335 |
| Violent crime rate (1958-2017) | .03 | .13 | 0.23 | .822 |
| Materialism (1976-2017) | .07 | .16 | 0.44 | .666 |
| Social trust (1976-2017) | .18 | .16 | 1.16 | .253 |

Note. Method B: Cooperation estimates were detrended through a linear regression model in which the squared standard errors are specified through the ‘weights’ argument.

Table S8

Relation Between the Sociocultural Indicators and Cooperation After All Variables Were Detrended (Method C)

| Sociocultural indicators | β | SE | t | p |
|---|---|---|---|---|
| Gini index (1967-2017) | -.15 | .14 | -1.06 | .295 |
| GDP per capita (1960-2017) | .08 | .13 | 0.57 | .574 |
| Social Welfare Function (1967-2017) | .12 | .14 | 0.82 | .418 |
| Unemployment rate (1956-2017) | -.08 | .13 | -0.66 | .512 |
| Urbanization (1960-2017) | -.08 | .13 | -0.62 | .538 |
| Divorce rate (1960-2017) | .15 | .13 | 1.17 | .246 |
| Percentage of people living alone (1960-2017) | .13 | .13 | 0.97 | .334 |
| Violent crime rate (1958-2017) | .03 | .13 | 0.23 | .822 |
| Materialism (1976-2017) | .07 | .16 | 0.44 | .665 |
| Social trust (1976-2017) | .18 | .16 | 1.17 | .250 |

Note. Method C: Cooperation estimates were detrended through a meta-regression model and the residuals are extracted through the ‘residuals.rma’ function of the metafor R package.

We next performed Granger tests of predictive causality, which evaluate whether one time-series variable (x) predicts changes in another time-series variable (y), even when controlling for earlier values of y (Caluori et al., 2020; Grossmann & Varnum, 2015). We ran separate models, one for each sociocultural indicator as a predictor of cooperation, at time lags of 1, 5, and 10 years. A statistically significant result at a given lag suggests that the sociocultural indicator at time t predicts cooperation 1, 5, or 10 years in the future, even when the analysis controls for cooperation at time t. The results of the Granger tests are presented in Tables S9, S10, and S11. Most of these associations were non-significant. We found only that percentage of people living alone at time t predicted cooperation 10 years in the future, and that materialism at time t predicted cooperation 5 years in the future when cooperation estimates were detrended using Method C. However, the percentage of people living alone did not satisfy the assumption of stationarity, which could produce inaccurate results.


Table S9

F Statistics From the Granger Test of Predictive Causality at 1-, 5-, and 10-Year Lags (Method A)

| Sociocultural indicators | 1-year | 5-year | 10-year |
| --- | --- | --- | --- |
| Gini index | F(1, 48) = 1.66, p = .205 | F(5, 40) = 1.91, p = .118 | F(10, 30) = 2.09, p = .078 |
| GDP per capita | F(1, 55) = 0.10, p = .759 | F(5, 47) = 0.48, p = .788 | F(10, 37) = 0.62, p = .781 |
| Social Welfare Function | F(1, 48) = 0.65, p = .423 | F(5, 40) = 1.46, p = .227 | F(10, 30) = 1.54, p = .197 |
| Unemployment rate | F(1, 59) = 0.05, p = .827 | F(5, 51) = 1.13, p = .358 | F(10, 41) = 0.71, p = .708 |
| Urbanization | F(1, 55) = 0.34, p = .560 | F(5, 47) = 0.63, p = .676 | F(10, 37) = 0.75, p = .675 |
| Divorce rate | F(1, 55) = 0.03, p = .869 | F(5, 47) = 0.72, p = .614 | F(10, 37) = 0.78, p = .647 |
| % people living alone | F(1, 55) = 0.03, p = .852 | F(5, 47) = 0.55, p = .739 | F(10, 37) = 2.47, p = .030 |
| Violent crime rate | F(1, 57) = 0.12, p = .731 | F(5, 49) = 0.79, p = .561 | F(10, 39) = 1.01, p = .462 |
| Materialism | F(1, 39) = 1.57, p = .218 | F(5, 31) = 2.58, p = .051 | F(10, 21) = 1.10, p = .439 |
| Social trust | F(1, 39) = 0.85, p = .364 | F(5, 31) = 1.98, p = .115 | F(10, 21) = 1.46, p = .271 |

Note. Method A: Cooperation estimates were detrended through a linear regression model.

Table S10

F Statistics From the Granger Test of Predictive Causality at 1-, 5-, and 10-Year Lags (Method B)

| Sociocultural indicators | 1-year | 5-year | 10-year |
| --- | --- | --- | --- |
| Gini index | F(1, 48) = 1.56, p = .218 | F(5, 40) = 1.86, p = .127 | F(10, 30) = 1.89, p = .108 |
| GDP per capita | F(1, 55) = 0.14, p = .707 | F(5, 47) = 0.63, p = .680 | F(10, 37) = 0.66, p = .754 |
| Social Welfare Function | F(1, 48) = 0.73, p = .397 | F(5, 40) = 1.43, p = .239 | F(10, 30) = 1.48, p = .219 |
| Unemployment rate | F(1, 59) = 0.03, p = .854 | F(5, 51) = 0.91, p = .484 | F(10, 41) = 0.52, p = .862 |
| Urbanization | F(1, 55) = 0.32, p = .577 | F(5, 47) = 0.46, p = .803 | F(10, 37) = 0.66, p = .748 |
| Divorce rate | F(1, 55) = 0.01, p = .924 | F(5, 47) = 0.76, p = .585 | F(10, 37) = 0.80, p = .632 |
| % people living alone | F(1, 55) = 0.01, p = .907 | F(5, 47) = 0.56, p = .729 | F(10, 37) = 2.45, p = .031 |
| Violent crime rate | F(1, 57) = 0.10, p = .753 | F(5, 49) = 1.00, p = .427 | F(10, 39) = 1.37, p = .242 |
| Materialism | F(1, 39) = 1.54, p = .222 | F(5, 31) = 1.80, p = .149 | F(10, 21) = 1.08, p = .445 |
| Social trust | F(1, 39) = 0.78, p = .381 | F(5, 31) = 1.81, p = .147 | F(10, 21) = 1.61, p = .224 |

Note. Method B: Cooperation estimates were detrended through a linear regression model in which the squared standard errors are specified through the ‘weights’ argument.




Table S11

F Statistics From the Granger Test of Predictive Causality at 1-, 5-, and 10-Year Lags (Method C)

| Sociocultural indicators | 1-year | 5-year | 10-year |
| --- | --- | --- | --- |
| Gini index | F(1, 48) = 1.67, p = .202 | F(5, 40) = 1.89, p = .121 | F(10, 30) = 2.05, p = .083 |
| GDP per capita | F(1, 55) = 0.08, p = .778 | F(5, 47) = 0.40, p = .845 | F(10, 37) = 0.60, p = .801 |
| Social Welfare Function | F(1, 48) = 0.60, p = .443 | F(5, 40) = 1.46, p = .229 | F(10, 30) = 1.54, p = .196 |
| Unemployment rate | F(1, 59) = 0.05, p = .817 | F(5, 51) = 1.12, p = .364 | F(10, 41) = 0.62, p = .788 |
| Urbanization | F(1, 55) = 0.35, p = .554 | F(5, 47) = 0.73, p = .605 | F(10, 37) = 0.81, p = .620 |
| Divorce rate | F(1, 55) = 0.04, p = .850 | F(5, 47) = 0.70, p = .629 | F(10, 37) = 0.77, p = .656 |
| % people living alone | F(1, 55) = 0.04, p = .833 | F(5, 47) = 0.54, p = .747 | F(10, 37) = 2.47, p = .030 |
| Violent crime rate | F(1, 57) = 0.12, p = .728 | F(5, 49) = 0.73, p = .603 | F(10, 39) = 0.93, p = .519 |
| Materialism | F(1, 39) = 1.56, p = .219 | F(5, 31) = 2.77, p = .039 | F(10, 21) = 1.09, p = .442 |
| Social trust | F(1, 39) = 0.86, p = .361 | F(5, 31) = 1.98, p = .116 | F(10, 21) = 1.46, p = .271 |

Note. Method C: Cooperation estimates were detrended through a meta-regression model and the residuals were extracted through the ‘residuals.rma’ function of the metafor R package.


The Second Detrending Approach and Corresponding Analyses. To make the data stationary, prior to performing any of the following analyses, we applied first-order differencing to our time-series vectors for the sociocultural indicators and cooperation (i.e., computing the differences between two consecutive observations). For example, if the level of cooperation was 0.20 in 1956 and 0.40 in 1957, this would correspond to a first-order difference of 0.20 in 1957. Similarly, if the level of cooperation was 0.50 in 1958, this would correspond to a first-order difference of 0.10 in 1958. Accordingly, the reported estimates indicate whether a change in a sociocultural indicator is associated with a change in cooperation.
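The differencing step described above can be illustrated with a minimal Python sketch. The function name `first_difference` and the three-year series are illustrative only (the values are the hypothetical ones used in the example in the text); this is not the authors' analysis code.

```python
def first_difference(series):
    """Return the differences between consecutive observations."""
    # Rounding only removes floating-point noise (e.g., 0.5 - 0.4).
    return [round(curr - prev, 10) for prev, curr in zip(series, series[1:])]


# Hypothetical cooperation levels from the worked example in the text.
years = [1956, 1957, 1958]
cooperation = [0.20, 0.40, 0.50]

diffs = first_difference(cooperation)

# The differenced series starts one year later than the original series,
# because the first observation has no predecessor.
diff_by_year = dict(zip(years[1:], diffs))
print(diff_by_year)
```

Because the first observation has no predecessor, a differenced series is one observation shorter than the original, which is why the year ranges in Table S12 start one year later than those of the undifferenced indicators.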

As before, we then subjected each time series to augmented Dickey-Fuller unit root tests to evaluate whether the first-order differenced time series were stationary. These tests revealed that all of the first-order differenced time series were stationary, making our data suitable for time-series analysis: cooperation (p < .001), Gini index (p < .001), GDP per capita (p < .001), social welfare function (p < .001), unemployment rate (p < .001), urbanization (p = .025), divorce rate (p = .002), percentage of people living alone (p < .001), violent crime rate (p < .001), social trust (p < .001), and materialism (p < .001).

After detrending our time-series data, we fitted separate models, one for the change in each sociocultural indicator, predicting the change in cooperation. Table S12 reports the associations between the change in each sociocultural indicator and the change in Americans’ cooperation. These regressions revealed that the change in cooperation was not associated with the change in any of the sociocultural indicators. However, it is important to note that, as with the method discussed above, this regression method does not allow the cooperation estimates to be weighted by their variance.

We next performed Granger tests of causality at time lags of 1, 5, and 10 years, with a separate model for the change in each sociocultural indicator as a predictor of the change in cooperation. The results presented in Table S13 suggest that none of these associations were significant.


Table S12

Relation Between the Change in Sociocultural Indicators and the Change in Cooperation After All Variables Were First-Order Differenced

| Sociocultural indicators | β | SE | t | p |
| --- | --- | --- | --- | --- |
| Gini index (1968-2017) | .09 | .14 | 0.60 | .552 |
| GDP per capita (1961-2017) | -.12 | .13 | -0.87 | .391 |
| Social Welfare Function (1968-2017) | -.12 | .14 | -0.81 | .420 |
| Unemployment rate (1957-2017) | -.04 | .13 | -0.28 | .777 |
| Urbanization (1961-2017) | .02 | .13 | 0.13 | .897 |
| Divorce rate (1961-2017) | -.06 | .13 | -0.47 | .642 |
| Percentage of people living alone (1961-2017) | -.22 | .13 | -1.64 | .107 |
| Violent crime rate (1959-2017) | -.10 | .13 | -0.78 | .439 |
| Materialism (1977-2017) | -.22 | .16 | -1.38 | .176 |
| Social trust (1977-2017) | .17 | .16 | 1.05 | .302 |


Table S13

F Statistics From the Granger Test of Predictive Causality at 1-, 5-, and 10-Year Lags After All Variables Were First-Order Differenced

| Sociocultural indicators | 1-year | 5-year | 10-year |
| --- | --- | --- | --- |
| Gini index | F(1, 47) = 0.79, p = .378 | F(5, 39) = 1.29, p = .291 | F(10, 29) = 1.26, p = .320 |
| GDP per capita | F(1, 54) = 1.14, p = .291 | F(5, 46) = 0.17, p = .972 | F(10, 36) = 0.38, p = .944 |
| Social Welfare Function | F(1, 47) = 1.35, p = .252 | F(5, 39) = 1.04, p = .411 | F(10, 29) = 1.10, p = .409 |
| Unemployment rate | F(1, 58) = 1.09, p = .302 | F(5, 50) = 0.43, p = .824 | F(10, 40) = 0.51, p = .871 |
| Urbanization | F(1, 54) = 0.10, p = .758 | F(5, 46) = 0.20, p = .961 | F(10, 36) = 0.50, p = .876 |
| Divorce rate | F(1, 54) = 0.00, p = .978 | F(5, 46) = 0.23, p = .947 | F(10, 36) = 0.68, p = .732 |
| % people living alone | F(1, 54) = 0.21, p = .645 | F(5, 46) = 0.35, p = .879 | F(10, 36) = 1.82, p = .106 |
| Violent crime rate | F(1, 56) = 0.02, p = .896 | F(5, 48) = 0.21, p = .956 | F(10, 38) = 0.75, p = .676 |
| Materialism | F(1, 38) = 0.20, p = .654 | F(5, 30) = 1.14, p = .367 | F(10, 20) = 0.57, p = .808 |
| Social trust | F(1, 38) = 0.08, p = .775 | F(5, 30) = 2.53, p = .056 | F(10, 20) = 1.23, p = .376 |


Overall, these time-series analyses suggest that none of the sociocultural indicators were associated with the observed changes in cooperation over time. The significant Granger test results for the percentage of people living alone and for materialism were not consistent across the two detrending approaches. This pattern differs markedly from the results reported in the main text, which did show some associations between the sociocultural indicators and the cooperation estimates. The results of these supplementary analyses imply that those associations could have emerged because each of the variables has increased over time.

That said, applying this analytic approach to our data has severe limitations that strongly restrict the extent to which these results can inform the relationships identified in the main text of the manuscript. Our dataset has multiple cross-sectional cooperation rates estimated for each time point. These estimates originate from different studies and samples. Moreover, each year is not represented by the same number of cooperation estimates. Therefore, to conduct these analyses we had to convert our original dataset of 660 unique effect sizes to a dataset of 62 effect sizes (one per year from 1956 to 2017). Additionally, these analyses do not account for heterogeneity between the studies. In fact, when the cooperation estimates are pooled, the information about the structural variables that differ between the studies is lost. These study characteristics are specific experimental methods that have been found to predict variance in cooperation (e.g., the degree of conflicting interests, and sanctions; Jin et al., 2021). Importantly, the use of some experimental methods has varied over time (see Balliet et al., 2021), which further stresses the value of statistically controlling for between-study heterogeneity when analyzing variation over time in the cooperation estimates. Finally, these analyses of the relations between sociocultural indicators and cooperation were not based on meta-regressions, which weight the cooperation estimates. For these reasons, we have abstained from reporting these analyses in the main text of the paper. The results reported in the paper were instead based on a multilevel meta-regression that accounted for different cooperation estimates per year, statistically controlled for between-study differences in methods, and weighted each effect size.
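To make the information loss concrete, the following Python sketch collapses several study-level cooperation rates into one value per year. The text does not state how the yearly values were computed, so the plain unweighted mean used here is an assumption, and the function name `pool_by_year` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pool_by_year(estimates):
    """Collapse (year, cooperation rate) pairs into one value per year."""
    by_year = defaultdict(list)
    for year, rate in estimates:
        by_year[year].append(rate)
    # Assumption: unweighted mean per year. Study-level features such as
    # sample size, dilemma type, or sanctions are discarded at this step.
    return {year: mean(rates) for year, rates in sorted(by_year.items())}


# Hypothetical study-level estimates; years contribute unequal numbers of studies.
estimates = [
    (1956, 0.30), (1956, 0.50),
    (1957, 0.40),
    (1958, 0.60), (1958, 0.20), (1958, 0.40),
]

yearly = pool_by_year(estimates)
print(yearly)
```

In this toy example the three years end up with nearly identical pooled values even though the underlying estimates range from 0.20 to 0.60, which illustrates why the pooled series cannot reflect between-study heterogeneity.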



Analysis of Studies on College Student Samples and Mean Age of 18 to 28 Years

We found the same results when we conducted additional analyses excluding the 19 studies (i.e., 26 unique samples containing 3,034 participants) that did not report either participants’ mean age or student information. This exclusion left 494 independent studies, which provided 641 unique cooperation estimates involving 61,200 participants. Furthermore, we report the results of analyses conducted after excluding 6 extreme outliers (|Z| > 3.29; see Tabachnick & Fidell, 2007), based on the final 493 studies with 635 unique cooperation estimates involving 60,800 participants (see Table S17).

Year of Data Collection. Across all studies that were identified as having college student samples and samples with mean age of 18 to 28 years, the year of data collection ranged from 1956 to 2017 (Mdn = 1998; see Figure S6).

Extreme Outliers Excluded From the Analysis. We also conducted an alternative analysis that included the studies with extreme outliers (see Table S14). The results remained the same (see Table S19).

Table S14

Studies With Extreme Outliers in Our Dataset

| Study | N | Dilemma | Cooperation rates | Yi | ZYi |
| --- | --- | --- | --- | --- | --- |
| Park (2012) Study 1 Treatment 1 | 94 | PD | 0.94 | 2.70 | 3.29 |
| Bergsieker (2013) Study 3 | 120 | PD | 0.94 | 2.75 | 3.35 |
| Insko et al. (1990) Study 2 | 62 | PD | 0.95 | 2.94 | 3.58 |
| Insko et al. (1994) Study 1 Treatment 1 | 24 | PD | 0.96 | 3.18 | 3.87 |
| Schopler et al. (1991) Study 2 Treatment 1 | 36 | PD | 0.96 | 3.20 | 3.90 |
| Bochet et al. (2006) Study 1 Treatment 2 | 64 | PGD | 0.97 | 3.46 | 4.21 |

Note. PD = Prisoner’s Dilemma; PGD = Public Goods Dilemma; Yi = logit-transformed cooperation rates; ZYi = standardized logit-transformed cooperation rates.
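The outlier screen can be sketched in Python as follows: cooperation rates are logit-transformed (Yi), standardized (ZYi), and flagged when |Z| > 3.29. The data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which mean and SD were used for standardization.

```python
import math
from statistics import mean, stdev


def logit(p):
    """Logit transformation of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def flag_outliers(rates, cutoff=3.29):
    """Return (rate, logit, z, is_outlier) for each cooperation rate."""
    y = [logit(p) for p in rates]
    m, s = mean(y), stdev(y)  # assumption: sample SD of the logits
    z = [(v - m) / s for v in y]
    return [(p, v, zi, abs(zi) > cutoff) for p, v, zi in zip(rates, y, z)]


# Hypothetical data: 30 moderate cooperation rates plus one extreme rate.
rates = [0.45, 0.50, 0.55] * 10 + [0.97]

results = flag_outliers(rates)
flagged = [p for p, _, _, is_outlier in results if is_outlier]
print(flagged)
```

For reference, logit(0.95) = ln(0.95/0.05) ≈ 2.94, which matches the Yi value reported for a cooperation rate of 0.95 in Table S14.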


Figure S6

Histogram of Number of Effect Sizes in Each Year of Data Collection

Note. k = 635.

Correlations Between Study Characteristics. We calculated the correlations between study characteristics based on the original dataset and presented the results in Table S15.

Table S15

Correlations Between Study Characteristics


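| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Time | | | | | | | | | | | | |
| 2. % male | -.25* | | | | | | | | | | | |
| 3. Dilemma typeᵃ | .42* | .01 | | | | | | | | | | |
| 4. Repetitions (mixed) | -.05 | -.05 | .03 | | | | | | | | | |
| 5. Repetitions | -.24* | .25* | -.16* | -.23* | | | | | | | | |
| 6. Group size | .16* | .04 | .60* | -.03 | -.21* | | | | | | | |
| 7. K index | -.17* | .04 | .03 | .09* | -.02 | .05 | | | | | | |
| 8. Communication (mixed) | .05 | -.02 | -.01 | .06 | -.02 | .05 | .02 | | | | | |
| 9. Communication | -.10* | -.21* | -.19* | -.01 | .08 | -.03 | .02 | -.07 | | | | |
| 10. Periodsᵇ | .02 | -.04 | -.06 | .11* | .06 | -.05 | -.08 | .34* | .02 | | | |
| 11. Sanctions (mixed) | .10* | .01 | .16* | .06 | -.02 | .06 | -.05 | -.02 | -.01 | -.02 | | |
| 12. Sanctions | .16* | .14* | .26* | -.03 | -.02 | .12* | -.02 | -.03 | -.09* | -.04 | -.03 | |

Note. ᵃ 0 = Prisoner’s Dilemma, 1 = Public Goods Dilemma; Repetitions: comparison group = one-shot interaction; Communication: comparison group = no communication; ᵇ 0 = overall, 1 = first; Sanctions: comparison group = no sanctions. *p < .05.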

Main Results. We first estimated and reported the mean effect size for cooperation rates across the entire sample of studies (k = 635; see Table S16).

Table S16

Estimated Population Mean Cooperation Rate


| | M | SE | 95% CI | τ² (level 2) | τ² (level 3) | I² (level 2) | I² (level 3) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cooperation | 0.49 | .03 | [0.47, 0.50] | .35 | .11 | 72.77 | 22.86 |

Note. k = 635. τ² = tau-squared, the estimate of total amount of heterogeneity; I² = percentage of total variability due to heterogeneity; Levels 2 and 3 represent within-study and between-study variance, respectively (hereafter the same). Heterogeneity was significant, Q(634) = 11197.52, p < .001.

We then used year of data collection (time) to predict cooperation in a mixed-effects meta-regression model (Model 1). The results showed that Americans’ cooperation increased significantly over time (b = 0.006, SE = .002, p < .001; see Table S17). Next, we fitted a multivariate meta-regression model (Model 2) to test whether the predicted effect of historical time remained after controlling for all of the study characteristics in the social dilemmas (i.e., dilemma type, proportion of male participants, repetitions, group size, K index, communication, sanctions, and period of cooperation). Time continued to have a statistically significant positive relation with cooperation (b = 0.005, SE = .002, p = .032). In summary, Americans’ cooperation levels increased over time. The results also suggest that communication and sanctions are important means to promote cooperation in social dilemmas.

In this subsample, people were more cooperative in one-shot dilemmas than in repeated dilemmas, whereas we found no significant difference in cooperation between one-shot and repeated games when using all 660 samples (see Table 3 in the main text). Moreover, repetition of interaction had no significant effect on cooperation in a meta-regression model that included a larger dataset and controlled for more study characteristics (see Jin et al., 2021). These mixed results do not support previous theories and findings, grounded in game theory and interdependence theory, that people cooperate more in repeated interactions than in one-shot interactions (e.g., Andreoni, 1988; Croson, 1996; Mengel & Peeters, 2011; Van Lange et al., 2011). In this meta-analysis, a possible reason for the lower cooperation rates in repeated dilemmas is that cooperation declines over time in repeated Public Goods Dilemmas (PGD) because individuals behave as conditional cooperators (Battu & Srinivasan, 2020). Conditional cooperators contribute to the public good if many others contribute, and reduce their contributions in reaction to other participants’ free riding (Andreozzi et al., 2020). In a repeated PGD, free riding by even a few people in the initial rounds can trigger negative reciprocity or free riding in subsequent rounds (Battu & Srinivasan, 2020). Therefore, the dynamics of conditional cooperation could cause cooperation rates to be higher in one-shot social dilemmas than in multi-round ones.

Variance Inflation Factor Analysis. We calculated the generalized variance inflation factors (GVIF; Fox & Monette, 1992) to assess multicollinearity among the predictors in our full model with all control variables. All the squared values presented in Table S18 fell below the cutoff of 10 (range: 1.06 to 2.07), suggesting that our full model was not affected by multicollinearity.

Table S17

Meta-Regression Models Without Control Variables (Model 1) and Including Control Variables (Model 2)



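| Variable | Model 1 b | Model 1 SE | Model 1 95% CI | Model 1 β | Model 2 b | Model 2 SE | Model 2 95% CI | Model 2 β |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Birth cohort | | | | | | | | |
| Time | 0.006* | .002 | [0.002, 0.010] | .13 | 0.005* | .002 | [0.0004, 0.009] | .10 |
| Dilemma typeᵃ | | | | | 0.05 | .09 | [-0.12, 0.22] | .04 |
| % male | | | | | -0.01 | .15 | [-0.31, 0.28] | -.003 |
| Repetitionsᵇ | | | | | | | | |
| Mixed | | | | | -0.14 | .21 | [-0.54, 0.26] | -.02 |
| Repeated | | | | | -0.17* | .07 | [-0.31, -0.03] | -.10 |
| Group size | | | | | -0.02 | .06 | [-0.13, 0.09] | -.02 |
| K index | | | | | 0.32 | .18 | [-0.04, 0.68] | .08 |
| Communicationᶜ | | | | | | | | |
| Mixed | | | | | 0.23 | .24 | [-0.24, 0.69] | .04 |
| Communication | | | | | 0.54* | .08 | [0.38, 0.69] | .28 |
| Sanctionsᵈ | | | | | | | | |
| Mixed | | | | | 0.58* | .22 | [0.15, 1.00] | .10 |
| Sanctions | | | | | 0.40* | .13 | [0.15, 0.65] | .12 |
| Periodsᵉ | | | | | 0.04 | .20 | [-0.35, 0.43] | .01 |

| Model statistics | Model 1 | Model 2 |
| --- | --- | --- |
| Qmodel(df) | 10.96 (1)* | 86.59 (12)* |
| Qresidual(df) | 11189.08 (633)* | 10064.89 (622)* |
| R² | .02 | .11 |
| τ² (level 2) | .35 | .29 |
| τ² (level 3) | .10 | .12 |
| I² (level 2) | 73.66 | 67.20 |
| I² (level 3) | 21.88 | 27.72 |

Note. k = 635. ᵃ 0 = Prisoner’s Dilemma; 1 = Public Goods Dilemma; ᵇ Comparison group = one-shot interaction; ᶜ Comparison group = no communication; ᵈ Comparison group = no sanctions; ᵉ 0 = overall, 1 = first. *p < .05.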

Table S18

Generalized Variance Inflation Factor (GVIF) Values for Independent Variables in the Full Model

| Variable | GVIF | Df | GVIF^(1/(2*Df)) | (GVIF^(1/(2*Df)))^2 |
| --- | --- | --- | --- | --- |
| 1. Time | 1.50 | 1 | 1.22 | 1.49 |
| 2. Dilemma typeᵃ | 2.06 | 1 | 1.44 | 2.07 |
| 3. % male | 1.20 | 1 | 1.09 | 1.19 |
| 4. Repetitionsᵇ | 1.22 | 2 | 1.05 | 1.10 |
| 5. Group size | 1.66 | 1 | 1.29 | 1.66 |
| 6. K index | 1.07 | 1 | 1.03 | 1.06 |
| 7. Communicationᶜ | 1.13 | 2 | 1.03 | 1.06 |
| 8. Sanctionsᵈ | 1.11 | 2 | 1.03 | 1.06 |
| 9. Periodsᵉ | 1.09 | 1 | 1.04 | 1.08 |

Note. k = 635. ᵃ 0 = Prisoner’s Dilemma, 1 = Public Goods Dilemma; ᵇ 0 = one-shot interaction, 0.5 = mixed, 1 = repeated interaction; ᶜ 0 = no communication, 0.5 = mixed, 1 = communication; ᵈ 0 = no sanctions, 0.5 = mixed, 1 = sanctions; ᵉ 0 = overall, 1 = first.
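The last two columns of Table S18 follow arithmetically from the GVIF and Df columns. A minimal Python sketch of that arithmetic is given below; the function names are illustrative, and the inputs are the rounded GVIF values printed in the table, so results can differ from the printed columns in the last decimal.

```python
def adjusted_gvif(gvif, df):
    """GVIF^(1/(2*Df)): puts terms with different Df on a comparable scale."""
    return gvif ** (1.0 / (2 * df))


def squared_adjusted_gvif(gvif, df):
    """(GVIF^(1/(2*Df)))^2: the value compared against the cutoff of 10."""
    return adjusted_gvif(gvif, df) ** 2


# Rounded (GVIF, Df) inputs copied from Table S18 for two predictors.
table_inputs = {
    "Dilemma type": (2.06, 1),
    "Repetitions": (1.22, 2),
}

squared = {name: squared_adjusted_gvif(g, d) for name, (g, d) in table_inputs.items()}

# Every squared value should fall below the cutoff of 10 used in the text.
all_below_cutoff = all(value < 10 for value in squared.values())
print(squared, all_below_cutoff)
```

For a predictor with Df = 1 the squared adjusted value equals the GVIF itself, so the adjustment only matters for multi-category predictors such as repetitions, communication, and sanctions (Df = 2).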


Model Results Including Extreme Outliers. We also conducted the analyses on the 494 studies that included the six extreme outliers (see Table S14), that is, on 641 unique cooperation estimates involving 61,200 participants. These analyses led to the same conclusions as those excluding the outliers (see Table S19).





Table S19

Meta-Regression Models Without Control Variables (Model 1) and Including Control Variables (Model 2)



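| Variable | Model 1 b | Model 1 SE | Model 1 95% CI | Model 1 β | Model 2 b | Model 2 SE | Model 2 95% CI | Model 2 β |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Birth cohort | | | | | | | | |
| Time | 0.006* | .002 | [0.003, 0.010] | .13 | 0.006* | .002 | [0.001, 0.010] | .11 |
| Dilemma typeᵃ | | | | | 0.04 | .09 | [-0.13, 0.21] | .03 |
| % male | | | | | 0.03 | .15 | [-0.26, 0.33] | .01 |
| Repetitionsᵇ | | | | | | | | |
| Mixed | | | | | -0.14 | .21 | [-0.55, 0.28] | -.02 |
| Repeated | | | | | -0.16* | .07 | [-0.30, -0.01] | -.08 |
| Group size | | | | | -0.04 | .06 | [-0.16, 0.07] | -.04 |
| K index | | | | | 0.34 | .18 | [-0.01, 0.69] | .08 |
| Communicationᶜ | | | | | | | | |
| Mixed | | | | | 0.23 | .25 | [-0.25, 0.71] | .04 |
| Communication | | | | | 0.59* | .08 | [0.43, 0.74] | .29 |
| Sanctionsᵈ | | | | | | | | |
| Mixed | | | | | 0.58* | .22 | [0.14, 1.01] | .10 |
| Sanctions | | | | | 0.39* | .13 | [0.13, 0.65] | .11 |
| Periodsᵉ | | | | | 0.01 | .20 | [-0.38, 0.41] | .003 |

| Model statistics | Model 1 | Model 2 |
| --- | --- | --- |
| Qmodel(df) | 11.61 (1)* | 90.71 (12)* |
| Qresidual(df) | 11345.21 (639)* | 10214.01 (628)* |
| R² | .02 | .11 |
| τ² (level 2) | .36 | .30 |
| τ² (level 3) | .12 | .13 |
| I² (level 2) | 72.02 | 67.10 |
| I² (level 3) | 23.73 | 28.03 |

Note. k = 641. ᵃ 0 = Prisoner’s Dilemma; 1 = Public Goods Dilemma; ᵇ Comparison group = one-shot interaction; ᶜ Comparison group = no communication; ᵈ Comparison group = no sanctions; ᵉ 0 = overall, 1 = first. *p < .05.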

Studies Included in the Meta-analysis

Table S20

Studies Included in the Meta-analysis


Sample characteristics


Study characteristics



Study

Stu(#)

Year

N

Mprop

St


DT

Rep

GS

K

Com

San

Per

P(C)

95% CI

Acevedo & Krueger (2005)

1

2003

80

.36

Y


PD

IT

2

.50

X

X

O

0.61

[0.50, 0.71]

Acevedo & Krueger (2005)

2

2003

56

.32

Y


PD

IT

2

.50

X

X

O

0.23

[0.14, 0.36]

Ahn, Esarey, & Scholz (2009)

1

2005

196

NA

Y


PD

IT

2

.50

X

X

O

0.48

[0.41, 0.55]

Ajzen (1971)

1

1969

216

.50

Y


PD

IT

2

.50

X

X

O

0.54

[0.47, 0.60]

Alexander Jr & Weil (1969)

1

1967

40

1.00

Y


PD

IT

2

NA

X

X

O

0.41

[0.27, 0.57]

Alfano & Marwell (1980)

1

1978

80

.50

Y


PGD

OS

58

.24

X

X

O

0.64

[0.57, 0.70]

Al-Ubaydli, Jones, & Weel (2016)

1

2014

167

.68

Y


PD

IT

2

.50

X

X

O

0.40

[0.33, 0.48]

Anderson & Putterman (2006)

1

2003

216

NA

Y


PGD

OS

3

.50

X

Y

O

0.65

[0.60, 0.69]

Anderson & Stafford (2003)

1

2001

60

NA

Y


PGD

MI

10

.38

X

MI

O

0.70

[0.60, 0.78]

Anderson, DiTraglia, & Gerlach (2011)

1b

2008

74

.55

Y


PGD

IT

10

NA

X

X

O

0.40

[0.39, 0.41]

Anderson, Mellor, & Milyo (2004)

1

2002

48

NA

Y


PGD

IT

8

.40

X

X

O

0.28

[0.20, 0.36]

Anderson, Mellor, & Milyo (2008)

1

2005

48

.44

Y


PGD

IT

8

.40

X

X

O

0.28

[0.20, 0.36]

Andreoni (1988) Sample a

1

1987

40

NA

Y


PGD

OS

5

.60

X

X

O

0.41

[0.35, 0.48]

Andreoni (1988) Sample b

1

1987

30

NA

Y


PGD

IT

5

.60

X

X

O

0.33

[0.27, 0.40]

Andreoni (1993)

1

1991

108

NA

Y


PGD

IT

3

NA

X

X

O

0.46

[0.41, 0.50]

Andreoni (1995a)

1

1993

120

NA

Y


PGD

OS

5

.26

X

X

O

0.44

[0.40, 0.48]

Andreoni (1995b)

1

1993

80

NA

Y


PGD

OS

5

.60

X

X

O

0.25

[0.22, 0.28]

Andreoni, Brown, & Vesterlund (2002)

1

1999

126

NA

Y


PGD

OS

2

NA

X

X

O

0.38

[0.34, 0.41]

Andreoni & Gee (2012)

1

2011

192

NA

X


PGD

OS

4

.71

X

MI

O

0.56

[0.52, 0.60]

Andreoni & Gee (2015)

1

2013

108

NA

Y


PGD

OS

4

NA

X

MI

O

0.62

[0.56, 0.68]

Andreoni & Petrie (2008)

1

2006

80

.51

Y


PGD

IT

5

.60

X

X

O

0.35

[0.31, 0.39]

Arnstein & Feigenbaum (1967)

1

1966

23

NA

Y


PD

IT

2

.44

X

X

O

0.59

[0.39, 0.77]

Arora, Peterson, et al. (2012)

1

2009

300

NA

Y


PD

OS

4

.65

X

X

O

0.65

[0.60, 0.70]

Arora, Peterson, et al. (2012)

2

2009

300

NA

Y


PD

IT

4

.65

X

X

O

0.27

[0.23, 0.33]

Atanasov & Kunreuther (2016)

1

2013

130

NA

Y


PD

IT

2

.42

X

X

O

0.57

[0.49, 0.65]

Atanasov & Kunreuther (2016)

2

2013

118

NA

Y


PD

IT

2

.42

X

X

O

0.49

[0.40, 0.58]

Atilgan (2017)

1

2016

48

NA

Y


PGD

OS

2

.50

X

X

O

0.60

[0.52, 0.69]

Atilgan (2017)

2

2016

145

NA

Y


PGD

MI

2

.50

X

X

F

0.58

[0.52, 0.63]

Atilgan (2017)

3

2016

224

.28

Y


PGD

IT

2

.50

X

X

F

0.59

[0.55, 0.63]

Balliet (2007)

1

2007

135

NA

Y


PGD

OS

55

NA

X

X

O

0.75

[0.68, 0.81]

Barker, Barclay, & Reeve (2013)

1

2012

104

.56

Y


PGD

IT

4

.50

X

X

O

0.39

[0.35, 0.44]

Baron (2001)

1a

1999

84

.29

X


PD

IT

3

.18

X

X

O

0.68

[0.57, 0.77]

Batson & Ahmad (2001)

1

2000

60

.00

Y


PD

OS

2

.40

X

X

O

0.18

[0.10, 0.30]

Batson & Moran (1999)

1

1998

60

.00

Y


PD

OS

2

.40

X

X

O

0.47

[0.35, 0.59]

Baxter Jr (1973)

1

1968

90

.00

Y


PD

IT

2

.80

X

X

O

0.65

[0.54, 0.74]

Berg, Lilienfeld, & Waldman (2013)

1

2011

210

.32

Y


PD

IT

2

NA

X

X

O

0.67

[0.60, 0.73]

Bettenhausen & Murnighan (1991)

1

1989

226

.21

Y


PD

IT

2

.15

X

X

O

0.49

[0.42, 0.55]

Bigoni, Camera, & Casari (2013) Sample a

1

2012

80

NA

Y


PD

OS

2

.60

X

Y

O

0.65

[0.54, 0.74]

Bigoni, Camera, & Casari (2013) Sample b

1

2012

80

NA

Y


PD

OS

2

.60

X

X

O

0.48

[0.37, 0.59]

Bixenstine, Lowenfeld, & Engelhart (1981)

1

1980

64

.50

Y


PD

IT

2

.33

X

X

O

0.55

[0.42, 0.66]

Bixenstine, Lowenfeld, & Engelhart (1981)

2

1980

96

.50

Y


PD

IT

2

.33

X

X

O

0.57

[0.47, 0.66]

Black & Higbee (1973) Sample a

1

1972

36

.50

Y


PD

IT

2

.50

X

X

O

0.14

[0.06, 0.29]

Black & Higbee (1973) Sample b

1

1972

36

.50

Y


PD

IT

2

.50

X

Y

O

0.28

[0.16, 0.45]

Bochet, Page, & Putterman (2006) Sample a

1

2002

96

NA

Y


PGD

IT

4

.33

X

MI

O

0.58

[0.52, 0.64]

Bochet, Page, & Putterman (2006) Sample b

1

2002

96

NA

Y


PGD

IT

4

.33

Y

MI

O

0.89

[0.76, 0.95]

Bochet, Page, & Putterman (2006) Sample c

1

2002

88

NA

Y


PGD

IT

4

.33

Y

MI

O

0.57

[0.51, 0.63]

Bochet & Putterman (2009) Sample a

1

2007

48

NA

Y


PGD

IT

4

.33

X

X

O

0.48

[0.41, 0.55]

Bochet & Putterman (2009) Sample b

1

2007

44

NA

Y


PGD

IT

4

.33

Y

X

O

0.47

[0.40, 0.54]

Bochet & Putterman (2009) Sample c

1

2007

44

NA

Y


PGD

IT

4

.33

Y

X

O

0.45

[0.38, 0.52]

Bochet & Putterman (2009) Sample d

1

2007

48

NA

Y


PGD

IT

4

.33

X

X

O

0.69

[0.58, 0.79]

Bochet & Putterman (2009) Sample e

1

2007

44

NA

Y


PGD

IT

4

.33