There are a few sentences in Anthropic’s “conversation with our cofounders” regarding RLHF that I found quite striking:

Dario (2:57): “The whole reason for scaling these models up was that [...] the models weren’t smart enough to do RLHF on top of. [...]”
Chris: “I think there was also an element of, like, the scaling work was done as part of the safety team that Dario started at OpenAI because we thought that forecasting AI trends was important to be able to have us taken seriously and take safety seriously as a problem.”
Dario: “Correct.”
That LLMs were scaled up partially in order to do RLHF on top of them is something I had previously heard from an OpenAI employee, but I wasn’t sure it was true. This conversation seems to confirm it.
“we thought that forecasting AI trends was important to be able to have us taken seriously”
This might be the most dramatic example ever of forecasting affecting the outcome.
Similarly, I’m concerned that a lot of alignment people are putting work into evals and benchmarks, which may be having some accelerating effect on the AI capabilities they are trying to understand.
“That which is measured improves. That which is measured and reported improves exponentially.”