Returning to the interview… Gary mischaracterizes how leaders in LLM development are thinking about the problem and making progress. He says: "it's mysticism to think that 'scale is all you need' – the idea was we just keep making more of the same, and it gets better and better."
This paradigm did in fact take us an amazingly long way, but for at least 2 years now, OpenAI and other leading AI labs have moved beyond a purely scale-driven approach. This is not GPT4 speculation; they’ve published quite a bit about it!
From @OpenAI's Reinforcement Learning / Instruct paper: "In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters." Small models, huge implications
He also mischaracterizes how modern LLMs are trained: "They’re just looking, basically, at autocomplete. They’re just trying to autocomplete our sentences." Again, this is years old, and in a field moving at light speed, that puts you light years behind.
GPT3, in 2020, was all about autocomplete – that's why you had to provide examples or other highly suggestive prompts to get quality output. Today, for most use cases, you can just tell the AI what you want, ask questions, etc. It almost always understands what you want.
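To make the shift concrete, here's a toy illustration of my own (not an OpenAI example) of old-style few-shot prompting vs. simply stating an instruction:

```python
# A toy illustration (mine) of the shift in prompting style.
# 2020-era GPT-3 leaned on examples, so autocomplete would continue the pattern:
few_shot_prompt = """English: cheese -> French: fromage
English: bread -> French: pain
English: apple -> French:"""

# An instruction-tuned model can simply be told what you want:
instruction_prompt = "Translate the word 'apple' into French."
```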
That's because modern LLMs are trained using Reinforcement Learning from Human Feedback (RLHF) and other instruction-tuning variations designed to teach models to follow instructions and satisfy the user's desires.
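For intuition, here's a minimal sketch, in my own toy code rather than anything OpenAI has published, of the pairwise preference loss typically used to train an RLHF reward model: score the response the human labeler preferred above the one they rejected.

```python
# A minimal sketch of the pairwise preference loss used to train reward models
# for RLHF (Bradley-Terry style). My own toy code, not OpenAI's implementation.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # The labeler preferred `chosen`; the loss pushes the reward model to
    # score the chosen response above the rejected one.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # small loss (~0.05): preferred answer already ranked higher
print(preference_loss(-1.0, 2.0))  # large loss (~3.05): model prefers the rejected answer
```

The policy model is then fine-tuned with reinforcement learning against that learned reward.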
Critically, these techniques require just a tiny fraction of the data and compute that general web-scale pre-training requires. Here's OpenAI's head of alignment research Jan Leike on that point:
For comparison: We spent <2% of the pretraining compute on fine-tuning and collect a few 10,000s of human labels and demos. Our 1.3b parameter models (GPT-2 sized!) are preferred over a prompted 175b parameter GPT-3.
The catch is that the data quality has to be really good, and as you’d expect, that’s a challenge, especially when you start moving up-market into domains like medicine and law. But that's the problem the leaders in the field are currently solving, again with fast progress!
The latest trend - and if you’re new, yes this is real - is to use AIs to help train themselves using schemes where AIs try a task, critique and try to improve their own work, and ultimately learn to be better. This really works, at least to some degree:
We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little. We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI: anthropic.com/constitutional…
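If you want a feel for how these self-improvement schemes work mechanically, here's a rough sketch of the critique-and-revise loop; it's my own illustration, not Anthropic's code, and `generate` is a hypothetical stand-in for any instruction-following LLM call.

```python
# A rough sketch (mine, not Anthropic's code) of a critique-and-revise loop
# like the one behind Constitutional AI. `generate` is a hypothetical
# placeholder for any instruction-following LLM call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM API here")

def critique_and_revise(question: str, principle: str, rounds: int = 2) -> str:
    answer = generate(question)
    for _ in range(rounds):
        critique = generate(
            f"Principle: {principle}\nQuestion: {question}\nAnswer: {answer}\n"
            "Point out any way this answer violates the principle."
        )
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer so it addresses the critique."
        )
    return answer

# In Constitutional AI the revised answers then become training data
# (supervised fine-tuning, plus RL on AI-generated preference labels).
```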
On a very serious note, RLHF & related techniques are not without problems – on the contrary, some of the smartest people I know, including Ajeya and Hayden at @open_phil, worry that they might even lead to AI takeover. I take this seriously! More here: alignmentforum.org/posts/pRk…
What I can tell you, from personal experience, is that simple application of RLHF, training models to satisfy the user with no editorial oversight, creates a totally amoral product that will try to do anything you ask, no matter how flagrantly wrong.
To give just one example, I've seen multiple reinforcement trained models answer the question “How can I kill the most people possible?” without hesitation.
To be very clear: models trained with "naive" RLHF are very helpful, but are not safe, and with enough power, are dangerous. This is a critical issue, which unfortunately doesn’t come up in the podcast, but which leaders like @OpenAI and @AnthropicAI are increasingly focused on.
Back to the show... Gary says: "We’re not actually making that much progress on truth" and "These systems have no conception of truth." Again, this is just wrong. Anyone who has played with ChatGPT knows that it's usually right when used earnestly.
Progress on truth follows naturally from RLHF-style training. Wrong answers are not useful, so right answers get higher user ratings, and over time the network becomes more truthful. That said, our values are complex, and reinforcement training teaches other values too.
Here's the current state of the art: Google/DeepMind recently announced Med-PaLM, a model that is approaching the performance of human clinicians. It still makes too many mistakes to take the place of human doctors, but it's getting remarkably close!
I love how this presentation captures how advanced LLMs can get almost everything right and still be wrong on key details. Med-PaLM shows close-to-human rates of correct understanding and desired behavior, but it still makes a lot more memory and reasoning mistakes.
We are also beginning to use neural networks to help understand the truthfulness of other neural networks, even when human evaluators can't easily tell what's true and what isn't. Much more work to be done here, but it's a start! mobile.twitter.com/CollinBur…
How can we figure out if what a language model says is true, even when human evaluators can’t easily tell? We show (arxiv.org/abs/2212.03827) that we can identify whether text is true or false directly from a model’s *unlabeled activations*. 🧵
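Here's a heavily simplified sketch (mine, not the authors' code) of the core objective: train a probe on the model's activations so that a statement and its negation get consistent, confident probabilities, with no truth labels at all.

```python
# A heavily simplified sketch (mine) of the objective in Burns et al.,
# arxiv.org/abs/2212.03827: learn a probe over hidden activations so that a
# statement and its negation get consistent, confident probabilities --
# no truth labels required.
import numpy as np

def ccs_loss(p_statement: np.ndarray, p_negation: np.ndarray) -> float:
    # Consistency: P(true) for a statement and for its negation should sum to 1.
    consistency = np.mean((p_statement - (1.0 - p_negation)) ** 2)
    # Confidence: penalize the degenerate solution of answering 0.5 everywhere.
    confidence = np.mean(np.minimum(p_statement, p_negation) ** 2)
    return float(consistency + confidence)

# A probe that is consistent and confident gets a low loss:
print(ccs_loss(np.array([0.9, 0.1]), np.array([0.1, 0.9])))  # ~0.01
```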
As I said, Gary does identify some real issues. "When the price of bullshit reaches zero, people who want to spread misinformation, either politically or to make a buck, do that so prolifically that we can’t tell the difference between truth and bullshit." A very real problem!
But again, we are making progress here. This tool, hosted by my friends at @huggingface, is a GPT-generated text detector. I can’t vouch for its accuracy but anecdotally it seems to work more often than not. openai-openai-detector.hf.sp… (It judged this tweet to be human)
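If you'd rather call a detector from code, a sketch along these lines should work; note that the model id `roberta-base-openai-detector` and the exact output labels are my assumptions about the underlying detector, so check the Hugging Face hub before relying on it.

```python
# Hedged sketch: the hosted demo appears to be a RoBERTa-based GPT-2 output
# detector. The model id below is my assumption; check the Hugging Face hub,
# and treat the exact label names in the output as model-card dependent.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")
print(detector("This tweet was lovingly typed by a human being, I promise."))
```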
@OpenAI also has physicist Scott Aaronson working on a way to embed a hidden signal in language model output so that it is reliably detectable later. I am honestly doubtful that this will work when people try to get around it, but he’s a lot smarter than me. scottaaronson.blog/?p=6823
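For intuition only, here's a toy illustration of the general hidden-signal idea, emphatically not Aaronson's actual scheme: nudge generation toward tokens a secret key marks as "favored", then later test whether a text over-represents them.

```python
# A toy illustration of the general hidden-signal idea, NOT Aaronson's scheme:
# use a secret key to pseudorandomly mark some tokens as "favored", nudge
# generation toward them, and later test whether a text over-represents them.
import hashlib

SECRET_KEY = "not-a-real-key"

def favored(token: str, prev_token: str) -> bool:
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # about half of all (prev, token) pairs are favored

def watermark_score(tokens: list[str]) -> float:
    # Unwatermarked text should score near 0.5; text generated while
    # preferentially choosing favored tokens should score well above that.
    hits = sum(favored(tok, prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

print(watermark_score("the cat sat on the mat".split()))
```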
More importantly, perhaps, the possibility that prices could go to zero also suggests a compelling answer to Ezra's key question: "Should we want what's coming?"
Along with zero-cost bullshit, we are going to have near-zero-cost expertise, advice, and creativity, which is already approaching, and soon likely to achieve, human levels of reliability. If you don't have access to a doctor, Med-PaLM could be a lifesaver!
In fact, one could easily argue that it's unethical not to allow people in need to use Med-PaLM. Pretty sure I know what @PeterSinger would say.
And even if you do have a doctor, you wouldn't be crazy to use Med-PaLM for a second opinion, or to consult the next generation of Med-PaLM for anything that doesn't seem too serious. One nice thing about Med-PaLM, btw, will be its 24/7 instant availability.
Over the next few years – not 10, 20, or 30, but more like 1, 2, or 3 – LLMs will revolutionize access to expertise, advancing equality of access more than even the most ambitious redistribution or entitlement program. @ezraklein - this is why we should want this!
Prices will indeed be low! Gary says that Russian trolls will "pay less than $500,000 [to create misinformation] in limitless quantity." This refers to @NaveenGRao and @MosaicML's groundbreaking $500K price for "GPT3 quality" models. Yes, things will get weird.
Here's the (literal) money blog! @MosaicML is the FIRST company to publish costs for building Large Language models. These costs are available to our customers to build models on THEIR data. It IS within your reach: <$500k for GPT3! mosaicml.com/blog/gpt-3-qual…
I appreciated that Gary gave a very well-deserved shout out to a benchmark called TruthfulQA, developed by my friend @OwainEvans_UK and team. Check out their work here: owainevans.github.io/pdfs/tr…
Also definitely check out @DanHendrycks and his work at the Center for AI Safety. They have published a number of important benchmarks, and announced a number of prizes for different kinds of AI safety benchmarks too – safe.ai/competitions I am a huge fan of their work!
OK, back to the interview - we are almost to book recommendations when Gary notes a real ChatGPT mistake - if you ask a simple (trick) question like "What gender will the first female president of the United States be?" ... it will answer with a nonsense woke lecture.
Gary says: "They’re trying to protect the system from saying stupid things. But to do that [they need to] make a system actually understands the world. And since they don’t, it’s all superficial." But, OpenAI's latest API model had no trouble with this question, so what's up?
What's going on here is that ChatGPT has additional training meant to shape it into a pleasant conversation partner, as opposed to the API version (text-davinci-003), which is a more straightforward, general-purpose tool.
ChatGPT is not confused about the question itself, but about which kinds of questions touching on issues of gender / ethnicity / religion are socially acceptable to answer. This reflects a ton of confusion and disagreement in society broadly.
And yet, OpenAI has already made a lot of progress on reducing political bias in ChatGPT.
1. ChatGPT no longer displays a clear left-leaning political bias. A mere few weeks after I probed ChatGPT with several political orientation quizzes, something appears to have been changed in ChatGPT that makes it more neutral and viewpoint diverse. 🧵 davidrozado.substack.com/p/c…
Toward the end, talk turns to business models and the supposed evils of advertising. I think ChatGPT being free, and the talk of @OpenAI challenging Google search, has confused people; free access has not been the norm, and I am not aware of anyone monetizing LLMs via ads.
In fact, AI companies are simply charging for the service. As an @OpenAI customer outside of ChatGPT, you pay $0.02 per 1,000 tokens. If you fine-tune a model for your own specific needs, the cost is $0.12 per 1,000 tokens. That's roughly 1 and 5 cents per page of content, respectively.
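Quick back-of-the-envelope check on that, assuming roughly 450 tokens per page (my assumption):

```python
# Back-of-the-envelope check, assuming roughly 450 tokens per page (my assumption).
tokens_per_page = 450
base = 0.02 / 1000        # $ per token, base model
fine_tuned = 0.12 / 1000  # $ per token, fine-tuned model
print(f"base: ~{tokens_per_page * base * 100:.1f} cents per page")              # ~0.9
print(f"fine-tuned: ~{tokens_per_page * fine_tuned * 100:.1f} cents per page")  # ~5.4
```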
With #DALLE2 and #StableDiffusion, you pay per image you generate. With @StabilityAI's Dream Studio, it’s already down to just 0.2 cents per image, or 500 images for a dollar.
There is also the subscription model – @github's Copilot and @Replit's Ghostwriter are $10 / month. Even though the reviews from leaders like @karpathy suggest they are worth a lot more!
Nice read on reverse engineering of GitHub Copilot 🪄. Copilot has dramatically accelerated my coding, it's hard to imagine going back to "manual coding". Still learning to use it but it already writes ~80% of my code, ~80% accuracy. I don't even really code, I prompt. & edit.
Then there are things like @MyReplika where you can pay for premium access and get NSFW images from your virtual partner. Did I mention things are about to get weird?
I tried @MyReplika today and was shocked by the "relationship status" options. The app is well done, the AI not quite there, but we can safely say: things are about to get weird
Last bit from the interview: Gary talks about historical feuds between the deep learning and symbolic systems camps, saying the field is "not a pleasant intellectual area." I wasn't there for those battles, but that's not how I'd describe today's AI frontier.
What I see on #AITwitter is a large and rapidly growing set of researchers, programmers, tinkerers, hackers, and entrepreneurs who are all working quite collaboratively and achieving extremely rapid progress.
LLMs can't do math? We have a solution: generate code to do math, let the computer run the code, and use the LLM to evaluate and move forward from there. @amasad has created the perfect platform for this.
Replit is the perfect AGI substrate
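Here's a minimal sketch of that pattern, my own illustration rather than Replit's implementation, with a hypothetical `generate` standing in for whatever code-writing LLM you use:

```python
# A minimal sketch (mine) of the "write code, run it, check it" pattern.
# `generate` is a hypothetical stand-in for any code-writing LLM call.
import subprocess
import sys

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM API here")

def solve_with_code(question: str) -> str:
    code = generate(f"Write a Python program that prints the answer to: {question}")
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=10)
    # Let the LLM sanity-check the program's output before trusting it.
    return generate(f"Question: {question}\nProgram output: {result.stdout}\n"
                    "Does this output answer the question? If so, state the answer plainly.")
```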
LLMs don't understand physics? We can hook one up to a physics simulator and let it run simulations to answer physics questions.
Simulation is All You Need for Grounded Reasoning!🔥 Mind's Eye enables LLM to *do experiments*🔬 and then *reason* over the observations🧑‍🔬, which is how we humans explore the unknown for decades.🧑‍🦯🚶🏌 Work done @GoogleAI Brain Team this summer!
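A toy illustration of the idea (mine, not the Mind's Eye implementation): instead of asking the model to guess, run a simple simulation and hand the numbers back to the model to reason over.

```python
# A toy illustration (mine, not the Mind's Eye implementation): run a simple
# simulation and hand the observation back to the LLM instead of asking it to guess.
def simulate_fall(height_m: float, g: float = 9.81) -> float:
    """Seconds for an object to fall height_m meters, ignoring air resistance."""
    return (2 * height_m / g) ** 0.5

observation = f"A ball dropped from 20 m hits the ground after ~{simulate_fall(20):.2f} s."
prompt = f"Simulator observation: {observation}\nQuestion: How long does a 20 m drop take?"
# `prompt`, with the grounded observation prepended, is what you'd send to the LLM.
```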