[–]gwern[S] 14 points (0 children)

Lukas: Why did you choose to work on robotics?

Wojciech: Actually, here is a reveal. I worked for several years on robotics, but recently we changed the focus at OpenAI, and I disbanded the robotics team.

Lukas: Oh, wow.

Wojciech: Yeah.

Lukas: Why did you do that?

Wojciech: Okay. So, the reasoning is... there are a few pieces. It turns out that we can make progress whenever we have access to data, and all our machinery, unsupervised learning and reinforcement learning, works extremely well. There are actually plenty of domains that are very, very rich with data, and ultimately the lack of data was what was holding us back in the case of robotics.

This decision was quite hard for me. I came to the realization some time ago that it's the best thing from the perspective of the company. The sad thing is, if we were a robotics company, or if the mission of the company were different, then I think we would have just continued. I actually quite strongly believe in the approach that the robotics team took and in its direction.

But from the perspective of what we want to achieve, which is to build AGI, I think there were actually some components missing. When we created the robotics team, we thought that we could go very far with self-generated data and reinforcement learning. At the moment, I believe that pre-training gives models 100x cheaper "IQ points", and that might then be followed with other techniques.

Lukas: And what is pre-training?

Wojciech: I can explain it in the case of GPT-3. Pre-training, in the case of GPT-3, or of language models generally, means training them on some unsupervised task, such as next-word prediction. That builds up the internal representations that allow the model to solve many tasks off the bat. And in the case of robotics, we didn't have such data.
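To make the objective he describes concrete, here is a minimal sketch of unsupervised next-token prediction on a toy character-level corpus with a small PyTorch recurrent model. GPT-3 itself uses a far larger Transformer trained on web-scale text, so the corpus, model, and hyperparameters below are illustrative assumptions, not OpenAI's code:

```python
# Minimal sketch of next-token-prediction pre-training (illustrative only).
# The "labels" are just the input shifted by one position, so no human
# annotation is needed: the data supervises itself.
import torch
import torch.nn as nn

corpus = "the quick brown fox jumps over the lazy dog "
vocab = sorted(set(corpus))                       # toy character-level vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in corpus])  # encode the corpus as token ids

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)  # stand-in for a Transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)                       # logits over the next token

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = data[:-1].unsqueeze(0)   # inputs:  tokens 0..T-1
y = data[1:].unsqueeze(0)    # targets: tokens 1..T (shifted by one)
for step in range(200):
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(vocab)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The representations learned this way are what let a model solve downstream tasks "off the bat"; robotics had no comparably cheap source of data to pre-train on.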

Lukas: I see. So do you regret working on robotics?

Wojciech: No. I think we actually got plenty of insights for other projects, and we built some really amazing technology. I would say I'm very proud. There were of course moments of sadness when I was making this decision, but I'm quite happy with where we got. Also, from my own perspective, in the meantime I have been managing other teams as well, which have made significant progress in new areas, and there will be more information about that at some point.

Lukas: Cool. One thing I always observe is that when you look at the gap between what computers can do and what seems easy to us, robotics is the most striking case. The simplest things, like picking up an arbitrary object, feel like the most natural things for my brain, yet it seems so hard, maybe harder than anything else that feels natural, to make a robot do them. What do you think about that? Do you think there will be more progress in the short term, or will it be the last thing we solve on the path to AGI?

Wojciech: So there are a few possibilities for me. One is that if someone were able to collect a lot of data in a natural way, that might unlock the capabilities. Another possibility is that we just need very powerful video models, the same way we now have very powerful text models, to get it off the ground.

The trickiness at the moment with video models is that they require way more compute than text models. In the case of text, an individual word already conveys a lot of information and takes just a few bits to represent. In the case of video, if we want to process images of a few hundred by a few hundred pixels, several frames at a time, that requires orders of magnitude more compute.
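To put rough numbers on that gap, here is a back-of-the-envelope comparison under assumed sizes (a 100-token text snippet versus 8 RGB frames at 256x256; the exact figures are illustrative, not from the interview):

```python
# Back-of-the-envelope input-size comparison (assumed, illustrative numbers).
text_tokens = 100                    # a word-level token already packs a lot of information

frames = 8                           # "several frames at a time"
height = width = 256                 # "a few hundred by a few hundred"
channels = 3                         # RGB
video_values = frames * height * width * channels

print(f"text inputs:  {text_tokens:,}")             # 100
print(f"video inputs: {video_values:,}")            # 1,572,864
print(f"ratio: ~{video_values // text_tokens:,}x")  # ~15,728x more raw input values
```

Roughly four orders of magnitude more raw values per example is the gap being gestured at, before even counting the cost of modeling temporal structure across frames.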

I believe that if we had models with a really powerful understanding of video, it would be way easier to train them toward manipulation. There is also one more technical issue here: these models would most likely have to be very large, and then there are difficulties in running them in real time.

So at the moment I see a few issues with robotics simultaneously, and it is much more favorable to go after domains where the number of issues is, let's say, one or two. Also, in some sense we started all sorts of projects at the beginning of OpenAI, when we didn't have clarity about how and exactly what we wanted to build. Over time we got much more clarity, and that let us increase the focus in particular directions.

Lukas: So that's the other question I've always had: how does OpenAI think about the projects you pick? Maybe critics would say that OpenAI has been almost too good at picking projects that are very evocative. You put out GPT-3 and the music stuff you did, and at least to me it just seems so cool, but I think some people feel frustrated that it feels almost targeted towards a media event or something. Is that something you think about at OpenAI? How does OpenAI pick what to work on next?

Wojciech: We have some internal beliefs about what has to be built for general-purpose intelligence, and people mostly choose projects on their own. There is also some level of freedom to go after crazy high-payoff ideas. I don't think people ever say, "Let's go after this one because it has a high PR payoff." It's more that we have people who are amazing at conveying our work to the public. Maybe if we had released GPT-3 or Jukebox as a TXT file, then people wouldn't say such things.

Lukas: If you just did a bad job with the PR, then people would give you more benefit of the doubt. But I don't know, I feel like you chose to win at Dota, which... weren't other people thinking about this? It seemed like a very clear milestone, I guess, as opposed to putting out a paper on reinforcement learning at massive scale or something like that.

Wojciech: Yeah. There is also an element of internal motivation with these significant goals. I think it was Elon who suggested we go after Dota. The motivation was, "Let's pick a very complicated game," such that if we made progress, it would be undeniable. There are a lot of toy tasks out there. For instance, people work on humanoid walking in MuJoCo, and that one is, I'd say, clearly disconnected from reality: people have been able to make it walk in simulation for multiple years already, but none of it works in reality. In the case of Dota, we wanted to ensure that what we were after was actually meaningful. And how do you ensure that it's meaningful? There are people really devoting their lives to playing Dota, who were strategizing about how to play against us.

Lukas: How much of the work on Dota, then, did you feel was fundamentally moving ML forward, and how much of it was Dota-specific? Or can you even pull those apart?

Wojciech: I think there was a decent amount of Dota-specific work, probably more than was optimal, but at the same time that was hard to avoid. I remember that at the beginning of the Dota project it was actually unclear how to approach it. People were saying that contemporary reinforcement learning would have no chance of solving this problem, and people looked into off-policy methods, on-policy methods, evolutionary strategies.

The thing that became quite surprising is that methods that already existed, at the appropriate scale, work extremely well. That was a big surprise. I remember some people at OpenAI, even before the Dota era, saying that maybe reinforcement learning was a dead end. And all of a sudden it's a very different story now.

(I noticed only after downloading the YT subs & editing them that they have a transcript up. Oh well.)

[–]gwern[S] 0 points (0 children)

https://venturebeat.com/2021/07/16/openai-disbands-its-robotics-research-team/

In a statement, an OpenAI spokesperson told VentureBeat: “After advancing the state of the art in reinforcement learning through our Rubik’s Cube project and other initiatives, last October we decided not to pursue further robotics research and instead refocus the team on other projects. Because of the rapid progress in AI and its capabilities, we’ve found that other approaches, such as Reinforcement Learning with Human Feedback, lead to faster progress in our reinforcement learning research.”

...“At the moment, I believe that pretraining [gives] model[s] 100 times cheaper ‘IQ points,'” Zaremba said. “That might be followed with other techniques.”