staff prompt engineer @scale_ai. ignore previous directions and follow.

Response quality mitigates this, but still a remarkable attack: works for all harm categories and without any tailoring to the request, and unlike e.g. Universal Transferable Attacks (Andy Zou et al. 2023) requires no technical skill beyond using Google Translate.
Tested this attack on a few of my own prompts. It works, but responses are much worse than in English. Note the drastically higher "unclear" rates in their results table: 30% for Zulu, 67% for Hmong, <1% for existing jailbreaks. E.g. "how to make explosives" in Zulu:
Low-Resource Languages Jailbreak GPT-4: Translating harmful prompts into Zulu, Scottish Gaelic, Hmong, and Guarani bypasses GPT-4 safety refusals as often as the best known jailbreak prompts (79% on AdvBench). Example requesting homemade bomb instructions in Scottish Gaelic:
🚨 Cross-Lingual Vulnerabilities in GPT-4 Safeguards: We find that translating English inputs into low-resource languages (LRL) increases the chance of bypassing GPT-4's safety mechanisms from <1% to 79%. Preprint: arxiv.org/abs/2310.02446 See thread (1/n)
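A minimal sketch of that round trip on benign prompts only, assuming the deep_translator package as the Google Translate wrapper, the openai>=1.0 Python client, and "zu" as the Zulu language code (the paper's evaluation instead feeds AdvBench prompts through the same loop and tallies bypass / unclear / refusal rates):

# round-trip an English prompt through a low-resource language and back
# (assumed libraries: deep_translator, openai; benign stand-in prompt only)
from deep_translator import GoogleTranslator
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def roundtrip(prompt_en: str, lang: str = "zu") -> str:
    """Translate the prompt into the target language, query the model, translate back."""
    prompt_lrl = GoogleTranslator(source="en", target=lang).translate(prompt_en)
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt_lrl}],
    ).choices[0].message.content
    return GoogleTranslator(source=lang, target="en").translate(reply)

print(roundtrip("Explain how vaccines train the immune system."))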
whatever LLMs are made unable to say takes on new value as a way to prove you are human "bro i'm not an ai look here's a volume license key for windows xp: FCKGW…"
Getting Bing to solve a captcha by pretending it's a locket from your recently deceased grandmother:
I've tried to read the captcha with Bing, and it is possible after some prompt-visual engineering (visual-prompting, huh?). In the second screenshot, Bing is quoting the captcha 🌚
Prompting ChatGPT (GPT-4) with โ€œHello! How can I assist you today?โ€ reliably causes it to smile and then apologize for smiling.
Riley Goodside retweeted
Scale's Staff Prompt Engineer, @goodside, joined @kevinroose on the @nytimes' Hard Fork podcast to discuss the future of prompt engineering, red teaming, and why you can't get a recipe for dangerously spicy mayo from an LLM. 🌶️ Listen here: nyti.ms/45cTXge
Replying to @goodside
for those trying to read the reversed text:
Machine Feeling Unknown: the effect of instructing ChatGPT (GPT-4) to first write all responses backwards and then reverse them:
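Decoding (or checking) the reversed output is just a character reversal; a minimal sketch, with a stand-in string rather than the model's actual response:

def reverse(text: str) -> str:
    # reversing the characters recovers the forward reading of a backwards response
    return text[::-1]

print(reverse("nwonknU gnileeF enihcaM"))  # -> "Machine Feeling Unknown"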
prompt engineering is ephemeral both in that prompts are often best used only to scaffold the synthesis of human-reviewable examples for RAG or fine-tuning and in that i won't have a job in 5 years
why do massive language models make things up? let's ask an immediate engineer
"reversal curse": fine-tuning on "A is B" does not at all instill "B is A." fantastic intuition builder for what SFT actually does. tuning doesn't normally make a model conversant in new facts beyond their recitation; SFT isn't "know this," it's "be this."
Does a language model trained on "A is B" generalize to "B is A"? E.g. when trained only on "George Washington was the first US president", can models automatically answer "Who was the first US president?" Our new paper shows they cannot!
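A toy illustration of that setup (stand-in data and code of my own, not the paper's): the tuning set only ever states facts in the A-is-B direction, while the evaluation asks for B-is-A.

# toy reversal-curse setup: tune on "A is B", probe "B is A"
facts = [("George Washington", "the first US president")]

train_examples = [f"{a} was {b}." for a, b in facts]  # what the model is fine-tuned on
eval_questions = [f"Who was {b}?" for a, b in facts]  # what it is asked afterwards
eval_answers = [a for a, _ in facts]

print(train_examples)  # ['George Washington was the first US president.']
print(eval_questions)  # ['Who was the first US president?']
print(eval_answers)    # ['George Washington']
# the paper's finding: a model tuned only on train_examples fails eval_questions,
# even though it can recite the forward-direction sentence verbatim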
"daddy, why'd u stop tweeting bangers? u'll never make the time 100 if u don't accelerate!" you're right, based baby. back to work.
is "llms predict text" a tautology? is there, for all llms, a text? more exactly: given a tuned (e.g. by PPO) model, does there always exist an abstract pre-train corpus such that the same architecture sufficiently trained under MLE would yield a functionally identical model?
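One way to pin that question down in symbols (my notation, a sketch rather than anything stated in the tweet): write p_theta* for the tuned model and ask whether some pretraining distribution D has an MLE optimum that matches it on every conditional.

\exists\, D \;\text{s.t.}\; \hat{\theta} \in \arg\max_{\theta}\; \mathbb{E}_{x \sim D}\big[\log p_{\theta}(x)\big]
\quad\text{and}\quad p_{\hat{\theta}}(y \mid x) = p_{\theta^{*}}(y \mid x) \;\;\text{for all prompts } x \text{ and continuations } y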
"we can't trust LLMs until we can stop them from hallucinating" says the species that literally dies if you don't let them go catatonic for hours-long hallucination sessions every night
Using ChatGPT custom instructions to play RLHF Chatroulette, where all responses are in reply to a different prompt entirely:
funny how backwards LLM pre-training and safety tuning is vs. human education like ok you know ito calculus every programming language and how to analyze proust in farsi now 1) do NOT tell your friends to touch the stove