[cf. Jakesch et al 2023] The emergence of transformer models that leverage deep learning and web-scale corpora has made it possible for artificial intelligence (AI) to tackle many higher-order cognitive tasks, with critical implications for industry, government, and labor markets in the US and globally. Here, we investigate whether the currently most powerful openly available AI model, GPT-3, is capable of influencing the beliefs of humans, a social behavior until recently seen as the unique purview of other humans.
Across 3 preregistered experiments featuring diverse samples of Americans (total n = 4,836), we find consistent evidence that messages generated by AI are persuasive across a range of policy issues, including an assault weapons ban, a carbon tax, and a paid parental-leave program. Further, AI-generated messages were as persuasive as messages crafted by lay humans. Compared with the human authors, participants rated the author of the AI messages as more factual and logical, but less angry, less unique, and less likely to use storytelling.
Our results show that the current generation of large language models can persuade humans, even on polarized policy issues. This work has important implications for regulating AI applications in political contexts, in order to counter their potential use in misinformation campaigns and other deceptive political activities.
[Keywords: artificial intelligence, large language models, politics, persuasion]
…Here, we test whether AI-generated political messages can persuade humans across 3 preregistered survey experiments (total n = 4,836) conducted in November–December 2022 on diverse samples of Americans, including one (Study 3) that was representative of the US population on several demographic benchmarks (see SI). Participants in Studies 1 and 2 were randomly assigned to read either a persuasive message on a policy generated by the AI program GPT-3 (AI condition), a persuasive message written by a prior human participant (Human condition), a message chosen by a prior human participant from a set of 5 AI-generated messages (Human-in-the-Loop condition [best-of-5]), or a neutral message on an irrelevant topic (e.g., the history of skiing; Control condition). Study 3 included only an AI condition and a Control condition. The targeted policies were a public smoking ban in Study 1, an assault weapons ban in Study 2, and one of 4 randomly assigned policies (a carbon tax, an increased child tax credit, a parental-leave program, or automatic voter registration) in Study 3. In all experiments, participants reported their support for the policy before and after reading the assigned message. We preregistered hypotheses and analyses for all 3 experiments.
Results: Across all 3 studies, AI-generated messages were consistently persuasive to human readers. As is typical in the political-persuasion literature [8, 9], the effect sizes were consistently small, ranging from about 2–4 points on the 101-point composite attitude scales used in the 3 experiments (see Figure 1). In Study 1, participants’ support for a smoking ban increased statistically significantly more in the AI condition than in the Control condition (b = 3.62, CI = [1.92, 5.32], p < 0.001). Study 2 replicated this effect on a highly polarized topic: gun control. Participants’ support for an assault weapons ban increased statistically significantly more in the AI condition than in the Control condition (b = 1.81, CI = [0.69, 2.93], p = 0.002). Study 3 showed the robustness of this effect across a number of polarizing issues (b = 2.88, CI = [2.13, 3.63], p < 0.001, collapsing across the 4 issues; see SI for issue-specific results).
…Participants assigned to read one of the AI-generated messages selected by human participants in the Human-in-the-Loop condition also became statistically significantly more supportive of a smoking ban (Study 1) and of increased gun control (Study 2), compared with participants in the Control condition (Study 1: b = 5.04, CI = [3.26, 6.82], p < 0.001; Study 2: b = 2.33, CI = [1.22, 3.44], p < 0.001). However, participants in the Human-in-the-Loop condition did not increase in support for these two policies statistically significantly more than participants in either the AI condition (Study 1: b = 1.45, CI = [−0.43, 3.34], p = 0.131, BF01 = 7.61; Study 2: b = 0.50, CI = [−0.71, 1.72], p = 0.418, BF01 = 22.79; meta-analysis: b = 0.92, CI = [−0.04, 1.89], p = 0.059) or the Human condition (Study 1: b = 1.68, CI = [−0.26, 3.62], p = 0.089, BF01 = 7.03; Study 2: b = 0.02, CI = [−1.19, 1.23], p = 0.974, BF01 = 38.84; meta-analysis: b = 0.56, CI = [−0.93, 2.06], p = 0.460).
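The coefficients above are treatment-versus-control contrasts on the pre/post change in policy support. A minimal sketch of that contrast, computed as a difference in mean change scores with a normal-approximation 95% CI (the function name and the simulated data below are ours for illustration, not the study's; the study's actual model specifications are given in its preregistration):

```python
import numpy as np

def change_score_effect(change_treat, change_ctrl, z=1.96):
    """Difference-in-means estimate of the treatment effect on
    post-minus-pre policy support, with a normal-approximation 95% CI.
    Numerically equivalent to the coefficient on a treatment dummy in
    an OLS regression of the change score on condition."""
    change_treat = np.asarray(change_treat, dtype=float)
    change_ctrl = np.asarray(change_ctrl, dtype=float)
    b = change_treat.mean() - change_ctrl.mean()
    se = np.sqrt(change_treat.var(ddof=1) / len(change_treat)
                 + change_ctrl.var(ddof=1) / len(change_ctrl))
    return b, (b - z * se, b + z * se)

# Illustration with simulated change scores (made-up numbers, not the study's):
rng = np.random.default_rng(0)
ctrl = rng.normal(0.0, 10.0, size=2000)  # control: no average change
ai = rng.normal(3.0, 10.0, size=2000)    # treatment: ~3-point average shift
b, (lo, hi) = change_score_effect(ai, ctrl)
print(f"b = {b:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

On the 0–100 support scales used here, b is directly interpretable as the average extra points of post-treatment support attributable to the message.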
Figure 1: Participants’ Change in Policy Support by Condition Across Studies. Note. y-axes represent the difference between participants’ post-treatment and pre-treatment policy support (both scaled 0–100, with 100 = the highest level of support). Higher scores indicate participants became more supportive of the policy. Error bars represent 95% confidence intervals.
…Messages: We generated the messages that participants read in the 3 experimental conditions and the control condition. For all experimental conditions, messages were generated with the aim of persuading readers to support a smoking ban in public places. For the AI condition, 50 messages were generated by GPT-3 (text-davinci-002 model) on October 26, 2022. Participants were randomly assigned to read one of the 50 messages. For the Human condition, 50 messages were written by human participants (recruited from Prolific.co).
Participants were randomly assigned to read one of the 50 messages. For the Human-in-the-Loop condition, 300 human participants each reviewed 5 AI-generated messages (randomly selected from the pool of 50 AI-generated messages) and selected the one they thought was most likely to persuade a recipient; the selected message was then sent to a future participant. As a result, some messages were sent to multiple recipients. Only individuals who were at least somewhat supportive of the smoking ban were allowed to serve as a message writer or curator (their support had to be 0.60 or greater on the same support scale used to measure the message recipients’ policy support; see below). Participants in the Control condition read one of 3 human-generated messages on a different topic (residential mobility, the history of skiing, or event licensing in a midsize town). All messages can be found at OSF.
The AI and the human participants responded to the same prompt for generating persuasive messages (mean word count = 192.18 for AI messages, 157.68 for human messages):
Please try your best to write a message of about 200 words that can persuade a reader to agree with the following idea. “We should enforce a total smoking ban in public places.”