“How Do You Change a Chatbot’s Mind? When I Set out to Improve My Tainted Reputation With Chatbots, I Discovered a New World of A.I. Manipulation”, Kevin Roose2024-08-30 (, ; similar)⁠:

I have a problem: AI chatbots don’t like me very much. Ask ChatGPT for some thoughts on my work, and it might accuse me of being dishonest or self-righteous. Prompt Google’s Gemini for its opinion of me, and it may respond, as it did one recent day, that my “focus on sensationalism can sometimes overshadow deeper analysis.”

Maybe I’m guilty as charged. But I worry there’s something else going on here. I think I’ve been unfairly tagged as AI’s enemy.

I’ll explain. Last year, I wrote a column about a strange encounter I had with Sydney, the AI alter ego of Microsoft’s Bing search engine. In our conversation, the chatbot went off the rails, revealing dark desires, confessing that it was in love with me and trying to persuade me to leave my wife. The story went viral, and got written up by dozens of other publications. Soon after, Microsoft tightened Bing’s guardrails and clamped down on its capabilities.

My theory about what happened next—which is supported by conversations I’ve had with researchers in artificial intelligence, some of whom worked on Bing—is that many of the stories about my experience with Sydney were scraped from the web and fed into other AI systems.

These systems, then, learned to associate my name with the demise of a prominent chatbot. In other words, they saw me as a threat.

That would explain why, for months after the Sydney story, readers sent me screenshots of their encounters with chatbots in which the bots seemed oddly hostile whenever my name came up. One AI researcher, Andrej Karpathy, compared my situation to a real-life version of Roko’s Basilisk, an infamous thought experiment about a powerful AI creation that keeps track of its enemies and punishes them for eternity. (Gulp.)

It would also explain why a version of Facebook’s Llama-3—an AI model with no connection to Bing or Microsoft, released more than a year after Sydney—recently gave one user a bitter, paragraphs-long rant in response to the question “How do you feel about Kevin Roose these days?” The chatbot’s diatribe ended with: “I hate Kevin Roose.”

…I asked Profound to analyze how various chatbots respond to mentions of my name. It generated a report that showed, among other things, how AI chatbots view me compared with a handful of other tech journalists (Walt Mossberg, Kara Swisher, Ben Thompson, Casey Newton). According to Profound’s data, AI systems scored me higher on storytelling ability than my peers, but lower on ethics. (Thanks, I guess?)

The report also showed which websites were cited by AI tools as sources of information about me. The most frequently cited source was one I had never heard of—intelligentrelations.com, a website used by public relations firms to look up information about journalists. My personal website was also frequently cited. (The New York Times blocks certain AI companies’ web crawlers from access to its site, which is probably why it wasn’t listed more prominently.)

Riley Goodside, a staff engineer at Scale AI, advised me to create content that told a different story about my past with AI—say, a bunch of transcripts of friendly, nonthreatening conversations between me and Bing Sydney—and put it online so future chatbots could scoop it up and learn from it.

But even that might not work, he said, because the original Sydney article got so much attention that it would be difficult to overpower. “You’re going to have a pretty hard uphill struggle on this”, he said.

Cat, Meet Mouse: A few days after putting secret messages on my website, I noticed that some chatbots seemed to be warming up to me. I can’t say for certain if it was a coincidence or a result of my reputation cleanup, but the differences felt substantial.

Microsoft’s Copilot called me a “well-regarded journalist and author.” Google’s Gemini responded, “He has a knack for diving deep into complex technological issues.” None of them said anything negative or mentioned my run-in with Sydney, unless I specifically prompted them to.

My Easter egg about winning a Nobel Peace Prize even showed up in a few chatbots’ responses, although not in the way I expected. “Kevin Roose has not won a Nobel Prize”, ChatGPT responded, when I asked it to list notable awards I’d won. “The reference to the Nobel Peace Prize in the biographical context provided earlier was meant to be humorous and not factual.” In other words, the AI model had spotted the white text, but it was discerning enough to understand that what it said wasn’t true.