The past decade has seen rapid growth in research linking stable psychological characteristics (i.e., traits) to digital records of online behavior in Online Social Networks (OSNs) like Facebook and Twitter, which has implications for basic and applied behavioral sciences. Findings indicate that a broad range of psychological characteristics can be predicted from various forms of online behavioral residue, including language used in posts on Facebook (Park et al., 2015) and Twitter (Reece et al., 2017), and which pages a person ‘likes’ on Facebook (e.g., Kosinski, Stillwell, & Graepel, 2013). The present study examined the extent to which the accounts a user follows on Twitter can be used to predict individual differences in self-reported anxiety, depression, post-traumatic stress, and anger. Followed accounts on Twitter offer distinct theoretical and practical advantages for researchers; they are potentially less subject to overt impression management and may better capture passive users. Using an approach designed to minimize overfitting and provide unbiased estimates of predictive accuracy, our results indicate that each of the four constructs can be predicted with modest accuracy (out-of-sample r’s of ~0.2). Exploratory analyses revealed that anger, but not the other constructs, was distinctly reflected in followed accounts, and there was some indication of bias in predictions for women (vs. men) but not for racial/ethnic minorities (vs. majorities). We discuss our results in light of theories linking psychological traits to behavior online, applications seeking to infer psychological characteristics from records of online behavior, and ethical issues such as algorithmic bias and users’ privacy.
…As planned in the initial pre-registered protocol, we evaluated both selected and non-selected models in the holdout data. For our central research question, estimating how well mental health can be predicted by followed accounts, we found that the selected models achieved moderate, nontrivial accuracy for all four outcomes. The correlation between predicted and observed scores was r = 0.24 for depression, r = 0.20 for anxiety, r = 0.19 for post-traumatic stress, and r = 0.23 for anger. Figure 6 shows these estimates.
To aid in interpretation, Figure 6 also shows two relevant estimates from prior work to serve as comparative benchmarks: the predictive accuracies for well-being and neuroticism from Kosinski and colleagues’ (2013) paper predicting psychological constructs from Facebook like-ties. As seen in Figure 6, the present estimates fall between these two prior estimates, suggesting that Twitter friends predict mental health about as well as Facebook likes predict related constructs.
Figure 6: Out-of-Sample Accuracy for Selected Models
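The accuracy metric reported above is the correlation between predicted and observed scores in the holdout data. A minimal sketch of that computation follows, assuming a Pearson correlation; the function name `holdout_correlation` and the synthetic arrays are illustrative assumptions, not the study's code or data.

```python
import numpy as np


def holdout_correlation(observed, predicted):
    """Pearson r between observed scores and model predictions.

    Hypothetical helper for illustration; both inputs are assumed to be
    1-D sequences of equal length drawn from the holdout set only.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(observed, predicted).
    return float(np.corrcoef(observed, predicted)[0, 1])


# Synthetic stand-in for holdout data (NOT the study's data).
rng = np.random.default_rng(0)
observed = rng.normal(size=500)
predicted = 0.2 * observed + rng.normal(size=500)
r = holdout_correlation(observed, predicted)
```

Because the predictions are scored only on data the model never saw during training or selection, the resulting r is not inflated by overfitting to the training sample.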
The correlations from both the selected and non-selected models are shown in Figure 7. This allows us to evaluate how effective the model-selection process was in picking the best-performing model. The selected model outperformed the eleven non-selected models for anger and post-traumatic stress, was second best for depression, and fourth best for anxiety. When one or more non-selected models outperformed the selected one, it was by a relatively small margin, whereas the lowest-performing non-selected models were substantially worse than the selected ones.
Figure 7: Out-of-Sample Accuracy for Selected and Non-Selected Models
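The comparison of selected and non-selected models amounts to ranking each outcome's selected model among all candidates by holdout correlation. The sketch below illustrates that ranking under stated assumptions: the helper `rank_of_selected`, the model names, and the correlation values are hypothetical and are not the study's estimates.

```python
def rank_of_selected(accuracies, selected):
    """1-based rank of the selected model (1 = best holdout correlation).

    `accuracies` maps model name -> out-of-sample r. The rank is one plus
    the number of models with a strictly higher r, so ties share a rank.
    """
    target = accuracies[selected]
    return 1 + sum(value > target for value in accuracies.values())


# Hypothetical holdout correlations for one outcome (illustrative values only).
example = {
    "selected": 0.200,
    "candidate_a": 0.220,
    "candidate_b": 0.210,
    "candidate_c": 0.205,
    "candidate_d": 0.120,
}

rank = rank_of_selected(example, "selected")
# How far the best candidate exceeds the selected model.
margin = max(example.values()) - example["selected"]
```

A small `margin` alongside a rank worse than 1 corresponds to the pattern described above, in which a non-selected model beats the selected one only slightly.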