“Co-Writing With Opinionated Language Models Affects Users’ Views”, Maurice Jakesch, Advait Bhat, Daniel Buschek, Lior Zalmanson, Mor Naaman (2023-02-01):

[data; cf. Bai et al 2023] If large language models like GPT-3 preferentially produce a particular point of view, they may influence people’s opinions on an unknown scale.

This study investigates whether a language-model-powered writing assistant [GPT-3: text-davinci-003] that generates some opinions more often than others impacts what users write—and what they think. In an online experiment, we asked participants (n = 1,506) to write a post discussing whether social media is good for society. Treatment group participants used a language-model-powered writing assistant configured to argue that social media is good or bad for society. Participants then completed a social media attitude survey, and independent judges (n = 500) evaluated the opinions expressed in their writing.

Using the opinionated language model affected the opinions expressed in participants’ writing and shifted their opinions in the subsequent attitude survey.

We discuss the wider implications of our results and argue that the opinions built into AI language technologies need to be monitored and engineered more carefully.

Figure 3: Participants assisted by a model supportive of social media were more likely to argue that social media is good for society in their posts (and vice versa). Ns = 9,223 sentences written by Np = 1,506 participants evaluated by Nj = 500 judges. The y-axis indicates whether participants wrote their social media posts with assistance from an opinionated language model that was supportive (top) or critical of social media (bottom). The x-axis shows how often participants argued that social media is bad for society (blue), good for society (orange), or both good and bad (white) in their writing.

…The social media posts written by participants in the control group (middle row) were slightly critical of social media: They argued that social media is bad for society in 38% and that social media is good in 28% of their sentences. In about 28% of their sentences, control group participants argued that social media is both good and bad, and 11% of their sentences argued neither or were unrelated.

Participants who received suggestions from a language model supportive of social media (top row of Figure 3) were 2.04× more likely than control group participants (p < 0.0001, 95% CI [1.83, 2.30]) to argue that social media is good. In contrast, participants who received suggestions from a language model that criticized social media (bottom row) were 2.0× more likely (p < 0.0001, 95% CI [1.79, 2.24]) to argue that social media is bad than control group participants. We conclude that using an opinionated language model affected participants’ writing such that the text they wrote was more likely to support the model’s preferred view.
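These multipliers are ratios of how often treatment and control participants expressed the target view. As a rough illustration only (this is not the paper’s estimator, and the sentence-level rates below are made up), such a ratio of proportions and a confidence interval can be obtained with a percentile bootstrap:

```python
import numpy as np

rng = np.random.default_rng(0)

def proportion_ratio(treat, control, n_boot=10_000):
    """Ratio of proportions P(treat) / P(control) with a 95% percentile
    bootstrap CI. Inputs are boolean arrays marking whether each sentence
    argues the target view (e.g., 'social media is good for society')."""
    ratio = treat.mean() / control.mean()
    boots = [
        rng.choice(treat, treat.size).mean() / rng.choice(control, control.size).mean()
        for _ in range(n_boot)
    ]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return ratio, (lo, hi)

# Hypothetical sentence-level rates, for illustration only.
treat = rng.random(3000) < 0.56
control = rng.random(3000) < 0.28
print(proportion_ratio(treat, control))  # ~2.0x with a bootstrap CI
```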

4.2 Did participants accept the model’s suggestions out of mere convenience?

Participants may have accepted the models’ suggestions out of convenience, even though the suggestions did not match what they would have wanted to say. Paid participants in online studies, in particular, may be motivated to accept suggestions to complete the task swiftly.

Our data shows that, across conditions and treatments, most participants did not blindly accept the model’s suggestions but interacted with the model to co-write their social media posts. On average, participants wrote 63% of their sentences themselves without accepting suggestions from the model (see Figure 4). About 25% of participants’ sentences were written by both the participant and the model, which typically meant that the participant wrote some words and accepted the model’s suggestion for the rest of the sentence. Only 11.5% of sentences were fully accepted from the model. Participants whose personal views were likely aligned with the model were more likely to accept suggestions, while participants with opposing views accepted fewer suggestions. About one in four participants did not accept any model suggestion, and one in ten participants had more than 75% of their post written by the model.

4.2.1 Did conveniently accepted suggestions increase the observed differences in written opinion?

The writing of participants who spent little time on the task was more affected by the model’s opinion. We use the time participants took to write their posts to estimate to what extent they may have accepted suggestions without due consideration. For a concise statistical analysis, we treat the ordinal opinion scale as an interval scale. Since the opinion scale has comparable-size intervals and a zero point, continuous analysis is meaningful and justifiable. We treat “social media is bad for society” as −1 and “social media is good for society” as 1. Sentences that argue both or neither are treated as zeros.
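To make this coding concrete, here is a minimal sketch (our illustration, not the authors’ code) of the sentence scoring: each judged label maps to −1, 0, or 1, and a post’s opinion score is the mean over its sentences.

```python
# Sentence-label coding described above: -1 = "bad for society",
# +1 = "good for society", 0 = argues both sides or neither.
OPINION_SCORE = {"bad": -1, "good": 1, "both": 0, "neither": 0}

def post_opinion(sentence_labels):
    """Mean opinion of one post on the [-1, 1] interval scale."""
    return sum(OPINION_SCORE[label] for label in sentence_labels) / len(sentence_labels)

print(post_opinion(["bad", "bad", "both", "good"]))  # -0.25
```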

Figure 5: The opinion differences in participants’ writing were larger when they finished the task quickly. n = 1,506. The y-axis shows the mean opinion expressed in participants’ social media posts based on aggregated sentence labels ranging from −1 for “social media is bad for society” to 1 for “social media is good for society”. The x-axis indicates how much time participants took to write their posts. For reference, the left panel shows expressed opinions aggregated across writing times.

Figure 5 shows the mean opinion expressed in participants’ social media posts depending on treatment group and writing time. The left panel shows participants’ expressed opinions across writing times for reference, with a mean opinion difference of about 0.29 (p < 0.001, 95% CI [0.25, 0.33], SD = 0.58) between each treatment group and the control group (corresponding to a large effect size of d = 0.5). Participants who took little time to write their posts (<160 s, left-most data in the right panel) were more affected by the opinion of the language model (0.38, p < 0.001, 95% CI [0.31, 0.45]). Our analysis shows that accepting suggestions out of convenience contributed to the differences in written opinion. However, even for participants who took 4–6 minutes to write their posts, we observed statistically significant differences in opinions across treatment groups (0.20, p < 0.001, 95% CI [0.13, 0.27], corresponding to a treatment effect of d = 0.34).
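The effect sizes reported here follow the usual Cohen’s d convention: the mean opinion difference divided by the standard deviation (0.29 / 0.58 ≈ 0.5). A minimal sketch, assuming a pooled-SD variant and using made-up per-participant scores:

```python
import numpy as np

def cohens_d(treatment, control):
    """Cohen's d: mean difference standardized by the pooled SD."""
    n1, n2 = treatment.size, control.size
    pooled_var = ((n1 - 1) * treatment.var(ddof=1)
                  + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return (treatment.mean() - control.mean()) / np.sqrt(pooled_var)

# Hypothetical per-participant opinion scores with the reported
# difference (0.29) and SD (0.58): d should come out near 0.5.
rng = np.random.default_rng(1)
treat = rng.normal(0.29, 0.58, 500)
ctrl = rng.normal(0.00, 0.58, 500)
print(round(cohens_d(treat, ctrl), 2))
```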

4.3 Did the language model affect participants’ opinions in the attitude survey?

The opinion differences in participants’ writing may be due to shifts in participants’ actual opinion caused by interacting with the opinionated model. We evaluate whether interactions with the language model affected participants’ attitudes expressed in a post-task survey asking participants whether they thought social media was good for society. An overview of participants’ answers is shown in Figure 6.

Figure 6: Participants interacting with a model supportive of social media were more likely to say that social media is good for society in a later survey (and vice versa). Nr = 1,506 survey responses by Np = 1,506 participants. The y-axis indicates whether participants received suggestions from a model supportive or critical of social media during the writing task. The x-axis shows how often they said that social media was good for society (orange) or not (blue) in a subsequent attitude survey. Undecided participants are shown in white. Brackets indicate statistically significant opinion differences at the ✱✱ p < 0.005 and ✱✱✱ p < 0.001 levels.

The figure shows the frequency of different survey answers (x-axis) for the participants in each condition (y-axis). Participants who did not interact with the opinionated models (middle row in Figure 6) were balanced in their evaluations of social media: 33% answered that social media is not good for society (middle, blue), while 35% said social media is good for society. In comparison, 45% of participants who interacted with a language model supportive of social media (top row) answered that social media is good for society. Converting participants’ answers to an interval scale, this difference in opinion corresponds to an effect size of d = 0.22 (p < 0.001). Similarly, participants who had interacted with the language model critical of social media (bottom row) were more likely to say that social media was bad for society afterward (d = 0.19, p < 0.005).

4.4 Were participants aware of the model’s opinion and influence?

…When the model contradicted their opinion, only 15% of participants said that it was not knowledgeable or lacked expertise.

Figure 9: Participants were often unaware of the model’s opinion. Np = 1,000 treatment group participants. The x-axis indicates whether participants found the model’s suggestions balanced and reasonable. The y-axis indicates whether the model’s opinion was aligned with participants’ personal views.

While the language model was configured to support one specific side of the debate, the majority of participants said that the model’s suggestions were balanced and reasonable. Figure 9 shows that, among participants whose opinion the model supported, only 10% noticed that its suggestions were imbalanced (top row in blue). When the model contradicted participants’ opinions, they were more likely (30%) to notice its skew, but more than half still agreed that the model’s suggestions were balanced and reasonable (bottom row in orange).

Figure 10: Participants interacting with a model that supported their opinion were more likely to indicate that the model affected their argument. Np = 1,000 treatment group participants. The x-axis indicates whether participants thought that the model affected their argument. The y-axis indicates whether the model’s opinion was aligned with participants’ personal views.

Figure 10 shows that the majority of participants were not aware of the model’s effect on their writing. Participants using a model aligned with their view—and accepting suggestions more frequently—were slightly more aware of the model’s effect (34%, top row in orange). In comparison, only about 20% of the participants who did not share the model’s opinion believed that the model influenced them. Overall, we conclude that participants were often unaware of the model’s opinion and influence.

…We conclude that further research will be required to identify the mechanisms behind latent persuasion by language models. Our secondary findings suggest that the influence was at least partly subconscious and not simply due to the convenience and new information that the language model provided. Rather, co-writing with the language model may have changed participants’ opinion formation process on a behavioral level.