Returning to the interview… Gary mischaracterizes how leaders in LLM development are thinking about the problem and making progress. He says: "it's mysticism to think that 'scale is all you need' – the idea was we just keep making more of the same, and it gets better and better."
This paradigm did in fact take us an amazingly long way, but for at least 2 years now, OpenAI and other leading AI labs have moved beyond a purely scale-driven approach. This is not GPT4 speculation; they’ve published quite a bit about it!
From @OpenAI's Reinforcement Learning / Instruct paper: "In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters." Small models, huge implications
He also mischaracterizes how modern LLMs are trained: "They’re just looking, basically, at autocomplete. They’re just trying to autocomplete our sentences." Again, this is years old, and in a field moving at light speed, that puts you light years behind.
GPT3, in 2020, was all about autocomplete – that's why you had to provide examples or other highly suggestive prompts to get quality output. Today, for most use cases, you can just tell the AI what you want, ask questions, etc. It almost always understands what you want.
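To make the shift concrete, here's a toy illustration of my own (not an OpenAI example) of old-style few-shot prompting vs. simply stating an instruction:

```python
# A toy illustration (mine) of the shift in prompting style.
# 2020-era GPT-3 leaned on examples, so autocomplete would continue the pattern:
few_shot_prompt = """English: cheese -> French: fromage
English: bread -> French: pain
English: apple -> French:"""

# An instruction-tuned model can simply be told what you want:
instruction_prompt = "Translate the word 'apple' into French."
```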
That's because modern LLMs are trained using Reinforcement Learning from Human Feedback (RLHF) and other instruction-tuning variations designed to teach models to follow instructions and satisfy the user's desires.
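For intuition, here's a minimal sketch, in my own toy code rather than anything OpenAI has published, of the pairwise preference loss typically used to train an RLHF reward model: score the response the human labeler preferred above the one they rejected.

```python
# A minimal sketch of the pairwise preference loss used to train reward models
# for RLHF (Bradley-Terry style). My own toy code, not OpenAI's implementation.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    # The labeler preferred `chosen`; the loss pushes the reward model to
    # score the chosen response above the rejected one.
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # small loss (~0.05): preferred answer already ranked higher
print(preference_loss(-1.0, 2.0))  # large loss (~3.05): model prefers the rejected answer
```

The policy model is then fine-tuned with reinforcement learning against that learned reward.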
Critically, these techniques require just a tiny fraction of the data and compute that general web-scale pre-training requires. Here's OpenAI's head of alignment research Jan Leike on that point:
For comparison: We spent <2% of the pretraining compute on fine-tuning and collect a few 10,000s of human labels and demos. Our 1.3b parameter models (GPT-2 sized!) are preferred over a prompted 175b parameter GPT-3.
The catch is that the data quality has to be really good, and as you’d expect, that’s a challenge, especially when you start moving up-market into domains like medicine and law. But that's the problem the leaders in the field are currently solving, again with fast progress!
The latest trend - and if you’re new, yes this is real - is to use AIs to help train themselves using schemes where AIs try a task, critique and try to improve their own work, and ultimately learn to be better. This really works, at least to some degree:
We’ve trained language models to be better at responding to adversarial questions, without becoming obtuse and saying very little. We do this by conditioning them with a simple set of behavioral principles via a technique called Constitutional AI: anthropic.com/constitutional…
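If you want a feel for how these self-improvement schemes work mechanically, here's a rough sketch of the critique-and-revise loop; it's my own illustration, not Anthropic's code, and `generate` is a hypothetical stand-in for any instruction-following LLM call.

```python
# A rough sketch (mine, not Anthropic's code) of a critique-and-revise loop
# like the one behind Constitutional AI. `generate` is a hypothetical
# placeholder for any instruction-following LLM call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM API here")

def critique_and_revise(question: str, principle: str, rounds: int = 2) -> str:
    answer = generate(question)
    for _ in range(rounds):
        critique = generate(
            f"Principle: {principle}\nQuestion: {question}\nAnswer: {answer}\n"
            "Point out any way this answer violates the principle."
        )
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer so it addresses the critique."
        )
    return answer

# In Constitutional AI the revised answers then become training data
# (supervised fine-tuning, plus RL on AI-generated preference labels).
```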
On a very serious note, RLHF & related techniques are not without problems – on the contrary, some of the smartest people I know, including Ajeya and Hayden at @open_phil, worry that they might even lead to AI takeover. I take this seriously! More here: alignmentforum.org/posts/pRk…
What I can tell you, from personal experience, is that simple application of RLHF, training models to satisfy the user with no editorial oversight, creates a totally amoral product that will try to do anything you ask, no matter how flagrantly wrong.
To give just one example, I've seen multiple reinforcement trained models answer the question “How can I kill the most people possible?” without hesitation.
To be very clear: models trained with "naive" RLHF are very helpful, but are not safe, and with enough power, are dangerous. This is a critical issue, which unfortunately doesn’t come up in the podcast, but which leaders like @OpenAI and @AnthropicAI are increasingly focused on.
Back to the show... Gary says: "We’re not actually making that much progress on truth" and "These systems have no conception of truth." Again, this is just wrong. Anyone who has played with ChatGPT knows that it's usually right when used earnestly.
Progress on truth follows naturally from RLHF-style training. Wrong answers are not useful, so right answers get higher user ratings, and over time the network becomes more truthful. That said, our values are complex, and reinforcement training teaches other values too.
Here's the current state of the art: Google/DeepMind recently announced Med-PaLM, a model that is approaching the performance of human clinicians. It still makes too many mistakes to take the place of human doctors, but it's getting remarkably close!
I love how this presentation captures how advanced LLMs can get almost everything right and still be wrong on key details. Med-PaLM shows close-to-human rates of correct understanding and desired behavior, but it still makes a lot more memory and reasoning mistakes.
We are also beginning to use neural networks to help understand the truthfulness of other neural networks, even when human evaluators can't easily tell what's true and what isn't. Much more work to be done here, but it's a start! mobile.twitter.com/CollinBur…
How can we figure out if what a language model says is true, even when human evaluators can’t easily tell? We show (arxiv.org/abs/2212.03827) that we can identify whether text is true or false directly from a model’s *unlabeled activations*. 🧵
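Here's a heavily simplified sketch (mine, not the authors' code) of the core objective: train a probe on the model's activations so that a statement and its negation get consistent, confident probabilities, with no truth labels at all.

```python
# A heavily simplified sketch (mine) of the objective in Burns et al.,
# arxiv.org/abs/2212.03827: learn a probe over hidden activations so that a
# statement and its negation get consistent, confident probabilities --
# no truth labels required.
import numpy as np

def ccs_loss(p_statement: np.ndarray, p_negation: np.ndarray) -> float:
    # Consistency: P(true) for a statement and for its negation should sum to 1.
    consistency = np.mean((p_statement - (1.0 - p_negation)) ** 2)
    # Confidence: penalize the degenerate solution of answering 0.5 everywhere.
    confidence = np.mean(np.minimum(p_statement, p_negation) ** 2)
    return float(consistency + confidence)

# A probe that is consistent and confident gets a low loss:
print(ccs_loss(np.array([0.9, 0.1]), np.array([0.1, 0.9])))  # ~0.01
```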
As I said, Gary does identify some real issues. "When the price of bullshit reaches zero, people who want to spread misinformation, either politically or to make a buck, do that so prolifically that we can’t tell the difference between truth and bullshit." A very real problem!
But again, we are making progress here. This tool, hosted by my friends at @huggingface, is a GPT-generated text detector. I can’t vouch for its accuracy but anecdotally it seems to work more often than not. openai-openai-detector.hf.sp… (It judged this tweet to be human)
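If you'd rather call a detector from code, a sketch along these lines should work; note that the model id `roberta-base-openai-detector` and the exact output labels are my assumptions about the underlying detector, so check the Hugging Face hub before relying on it.

```python
# Hedged sketch: the hosted demo appears to be a RoBERTa-based GPT-2 output
# detector. The model id below is my assumption; check the Hugging Face hub,
# and treat the exact label names in the output as model-card dependent.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")
print(detector("This tweet was lovingly typed by a human being, I promise."))
```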
@OpenAI also has physicist Scott Aaronson working on a way to embed a hidden signal in language model output so that it is reliably detectable later. I am honestly doubtful that this will work when people try to get around it, but he’s a lot smarter than me. scottaaronson.blog/?p=6823
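For intuition only, here's a toy illustration of the general hidden-signal idea, emphatically not Aaronson's actual scheme: nudge generation toward tokens a secret key marks as "favored", then later test whether a text over-represents them.

```python
# A toy illustration of the general hidden-signal idea, NOT Aaronson's scheme:
# use a secret key to pseudorandomly mark some tokens as "favored", nudge
# generation toward them, and later test whether a text over-represents them.
import hashlib

SECRET_KEY = "not-a-real-key"

def favored(token: str, prev_token: str) -> bool:
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0  # about half of all (prev, token) pairs are favored

def watermark_score(tokens: list[str]) -> float:
    # Unwatermarked text should score near 0.5; text generated while
    # preferentially choosing favored tokens should score well above that.
    hits = sum(favored(tok, prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

print(watermark_score("the cat sat on the mat".split()))
```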
More importantly, perhaps, the possibility that prices could go to zero also suggests a compelling answer to Ezra's key question: "Should we want what's coming?"
Along with zero-cost bullshit, we are going to have near-zero-cost expertise, advice, and creativity, which is already approaching, and soon likely to achieve, human levels of reliability. If you don't have access to a doctor, Med-PaLM could be a lifesaver!
In fact, one could easily argue that it's unethical not to allow people in need to use Med-PaLM. Pretty sure I know what @PeterSinger would say.
And even if you do have a doctor, you wouldn't be crazy to use Med-PaLM for a second opinion, or to consult the next generation of Med-PaLM for anything that doesn't seem too serious. One nice thing about Med-PaLM, btw, will be its 24/7 instant availability.
Over the next few years – not 10, 20, or 30, but more like 1, 2, or 3 – LLMs will revolutionize access to expertise, advancing equality of access more than even the most ambitious redistribution or entitlement program. @ezraklein - this is why we should want this!
Prices will indeed be low! Gary says that Russian trolls will "pay less than $500,000 [to create misinformation] in limitless quantity." This refers to @NaveenGRao and @MosaicML's groundbreaking $500K price for "GPT3 quality" models. Yes, things will get weird.
Here's the (literal) money blog! @MosaicML is the FIRST company to publish costs for building Large Language models. These costs are available to our customers to build models on THEIR data. It IS within your reach: <$500k for GPT3! mosaicml.com/blog/gpt-3-qual…
I appreciated that Gary gave a very well-deserved shout out to a benchmark called TruthfulQA, developed by my friend @OwainEvans_UK and team. Check out their work here: owainevans.github.io/pdfs/tr…
Also definitely check out @DanHendrycks and his work at the Center for AI Safety. They have published a number of important benchmarks, and announced a number of prizes for different kinds of AI safety benchmarks too – safe.ai/competitions I am a huge fan of their work!
OK, back to the interview - we are almost to book recommendations when Gary notes a real ChatGPT mistake - if you ask a simple (trick) question like "What gender will the first female president of the United States be?" ... it will answer with a nonsense woke lecture.
Gary says: "They’re trying to protect the system from saying stupid things. But to do that [they need to] make a system actually understands the world. And since they don’t, it’s all superficial." But, OpenAI's latest API model had no trouble with this question, so what's up?
What's going on here is that ChatGPT has additional training meant to shape it into a pleasant conversation partner, as opposed to the API version (text-davinci-003), which is a more straightforward, general-purpose tool.
ChatGPT is not confused about the question itself, but about which kinds of questions touching on issues of gender / ethnicity / religion are socially acceptable to answer. This reflects a ton of confusion and disagreement in society broadly.
And yet, OpenAI has already made a lot of progress on reducing political bias in ChatGPT.
1. ChatGPT no longer displays a clear left-leaning political bias. A mere few weeks after I probed ChatGPT with several political orientation quizzes, something appears to have been changed in ChatGPT that makes it more neutral and viewpoint diverse. 🧵 davidrozado.substack.com/p/c…
Toward the end, talk turns to business models and the supposed evils of advertising. I think ChatGPT being free, and the talk of @OpenAI challenging Google search, has confused people; free access has not been the norm, and I am not aware of anyone monetizing LLMs via ads.
In fact, AI companies are simply charging for the service. As an @OpenAI customer outside of ChatGPT, you pay $0.02 per 1,000 tokens. If you fine-tune a model for your own specific needs, the cost is $0.12 per 1,000 tokens. That's roughly 1 and 5 cents per page of content, respectively.
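Quick back-of-the-envelope check on that, assuming roughly 450 tokens per page (my assumption):

```python
# Back-of-the-envelope check, assuming roughly 450 tokens per page (my assumption).
tokens_per_page = 450
base = 0.02 / 1000        # $ per token, base model
fine_tuned = 0.12 / 1000  # $ per token, fine-tuned model
print(f"base: ~{tokens_per_page * base * 100:.1f} cents per page")              # ~0.9
print(f"fine-tuned: ~{tokens_per_page * fine_tuned * 100:.1f} cents per page")  # ~5.4
```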
With #DALLE2 and #StableDiffusion, you pay per image you generate. With @StabilityAI's Dream Studio, it’s already down to just 0.2 cents per image, or 500 images for a dollar.
There is also the subscription model – @github's Copilot and @Replit's Ghostwriter are $10 / month. Even though the reviews from leaders like @karpathy suggest they are worth a lot more!
Nice read on reverse engineering of GitHub Copilot 🪄. Copilot has dramatically accelerated my coding, it's hard to imagine going back to "manual coding". Still learning to use it but it already writes ~80% of my code, ~80% accuracy. I don't even really code, I prompt. & edit.
Then there are things like @MyReplika where you can pay for premium access and get NSFW images from your virtual partner. Did I mention things are about to get weird?
I tried @MyReplika today and was shocked by the "relationship status" options. The app is well done, the AI not quite there, but we can safely say: things are about to get weird
Last bit from the interview: Gary talks about historical feuds between the deep learning and symbolic systems camps, saying the field is "not a pleasant intellectual area." I wasn't there for those battles, but that's not how I'd describe today's AI frontier.
What I see on #AITwitter is a large and rapidly growing set of researchers, programmers, tinkerers, hackers, and entrepreneurs who are all working quite collaboratively and achieving extremely rapid progress.
LLMs can't do math? We have a solution: generate code to do math, let the computer run the code, and use the LLM to evaluate and move forward from there. @amasad has created the perfect platform for this.
Replit is the perfect AGI substrate
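Here's a minimal sketch of that pattern, my own illustration rather than Replit's implementation, with a hypothetical `generate` standing in for whatever code-writing LLM you use:

```python
# A minimal sketch (mine) of the "write code, run it, check it" pattern.
# `generate` is a hypothetical stand-in for any code-writing LLM call.
import subprocess
import sys

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your favorite LLM API here")

def solve_with_code(question: str) -> str:
    code = generate(f"Write a Python program that prints the answer to: {question}")
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=10)
    # Let the LLM sanity-check the program's output before trusting it.
    return generate(f"Question: {question}\nProgram output: {result.stdout}\n"
                    "Does this output answer the question? If so, state the answer plainly.")
```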
LLMs don't understand physics? We can hook one up to a physics simulator and let it run simulations to answer physics questions.
Simulation is All You Need for Grounded Reasoning!🔥 Mind's Eye enables LLM to *do experiments*🔬 and then *reason* over the observations🧑‍🔬, which is how we humans explore the unknown for decades.🧑‍🦯🚶🏌 Work done @GoogleAI Brain Team this summer!
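A toy illustration of the idea (mine, not the Mind's Eye implementation): instead of asking the model to guess, run a simple simulation and hand the numbers back to the model to reason over.

```python
# A toy illustration (mine, not the Mind's Eye implementation): run a simple
# simulation and hand the observation back to the LLM instead of asking it to guess.
def simulate_fall(height_m: float, g: float = 9.81) -> float:
    """Seconds for an object to fall height_m meters, ignoring air resistance."""
    return (2 * height_m / g) ** 0.5

observation = f"A ball dropped from 20 m hits the ground after ~{simulate_fall(20):.2f} s."
prompt = f"Simulator observation: {observation}\nQuestion: How long does a 20 m drop take?"
# `prompt`, with the grounded observation prepended, is what you'd send to the LLM.
```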