“Earnings Call: Tesla Discusses Q1 2024 Challenges and AI Expansion”, 2024-04-24:
Colin Rusch: …Thanks so much, guys. Given the pursuit of Tesla, really, as a leader in AI for the physical world, and your comments around distributed inference, can you talk about what that approach is unlocking beyond what’s happening in the vehicle right now?
Elon Musk: Do you want to say something?
Ashok Elluswamy: Yes. Like Elon mentioned, the car, even when it’s a full robotaxi, is probably going to be used 50 hours a week [out of 7×24 = 168].
E. Musk: That’s my guess, like a third of the hours of the week [~56 hours].
A. Elluswamy: Yes. It could be more or less, but then there are certainly going to be some hours left for charging and cleaning and maintenance, and in that world you can do a lot of other workloads. Even right now we are seeing, for example, that these LLM companies have batch workloads where they send a bunch of documents, and those run through pretty large neural networks and take a lot of compute to chunk through. And now that we have already paid for this compute in these cars, it might be wise to use it and not let it sit idle; it would be like buying a lot of expensive machinery and leaving it idle. We don’t want that; we want to use the computer as much as possible, close to basically 100% of the time, to make good use of it.
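To make the idea concrete, here is a minimal sketch of what dispatching batch LLM workloads onto idle vehicle compute could look like. Everything here (FleetNode, BatchJob, dispatch, the field names) is a hypothetical illustration of the "use the idle, already-paid-for compute" point, not anything Tesla has described:

```python
# Hypothetical sketch: assign batch inference jobs to idle vehicle compute nodes.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FleetNode:
    vehicle_id: str
    idle: bool          # parked, charged, and not driving
    compute_kw: float   # usable inference compute, e.g. ~1 kW per car

@dataclass
class BatchJob:
    job_id: str
    documents: List[str]  # e.g. an LLM batch workload chunking through documents

def dispatch(jobs: List[BatchJob], fleet: List[FleetNode]) -> Dict[str, str]:
    """Assign each batch job to an idle node; cars that are driving are skipped."""
    idle_nodes = [n for n in fleet if n.idle]
    return {job.job_id: node.vehicle_id for job, node in zip(jobs, idle_nodes)}

fleet = [FleetNode("car-1", idle=True, compute_kw=1.0),
         FleetNode("car-2", idle=False, compute_kw=1.0)]
jobs = [BatchJob("docs-0001", documents=["report.pdf", "filing.txt"])]
print(dispatch(jobs, fleet))  # {'docs-0001': 'car-1'}
```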
E. Musk: That’s right. I think it’s analogous to Amazon Web Services, where people didn’t expect that AWS would be the most valuable part of Amazon when it started out as a bookstore. So that was on nobody’s radar. But they found that they had excess compute, because the compute needs would spike to extreme levels for brief periods of the year, and then they had idle compute for the rest of the year. So then, what should they do with that excess compute for the rest of the year? That’s kind of…
A. Elluswamy: Monetize it.
E. Musk: Yes, monetize it. So it seems like kind of a no-brainer to say, okay, if we’ve got millions, and then tens of millions, of vehicles out there where the computers are idle most of the time, we might as well have them do something useful.
A. Elluswamy: Exactly.
E. Musk: And then, I mean, if you get to the 100 million vehicle level, which I think we will at some point get to, and you’ve got a kilowatt of usable compute, and maybe you’re on Tesla hardware #6 or #7 by that time, then I think you could have on the order of 100 gigawatts of useful compute, which might be more than anyone, probably more than any company.
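As a quick back-of-the-envelope check on that figure, using only the two estimates given on the call (100 million vehicles, roughly a kilowatt of usable compute per car):

```python
vehicles = 100_000_000           # "the 100 million vehicle level"
compute_per_vehicle_kw = 1.0     # "a kilowatt of usable compute" per car

total_kw = vehicles * compute_per_vehicle_kw
total_gw = total_kw / 1_000_000  # 1 GW = 1,000,000 kW
print(f"{total_gw:.0f} GW")      # 100 GW of fleet-wide inference compute
```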
A. Elluswamy: Yes, probably, because it takes a lot of intelligence to drive the car anyway. And when it’s not driving the car, you just put this intelligence to other uses, solving scientific problems or answering questions for someone else.
E. Musk: It’s like a human, ideally. We’ve already learned about deploying workloads to these nodes…
A. Elluswamy: Yes. And unlike laptops and cell phones, it is totally under Tesla’s control, so it’s easier to distribute the workload across different nodes, as opposed to asking users for permission to use their own cell phones, which would be very tedious.
E. Musk: Well, you’re just draining the battery on the phone.
A. Elluswamy: Yes, exactly. The battery is also…
E. Musk: So, technically, I suppose Apple would have the most amount of distributed compute, but you can’t use it, because you can’t just run the phone at full power and drain the battery.
A. Elluswamy: Yes.
E. Musk: So, whereas for the car, even if you’ve got a kilowatt-level inference computer, which is crazy power compared to a phone, if you’ve got a 50–60 kilowatt-hour pack it’s still not a big deal to run, whether you’re plugged in or not. You could run for 10 hours and use 10 kilowatt-hours of your kilowatt of compute power.
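The same arithmetic for a single car, using the numbers above (a 1 kW inference computer and a 50–60 kWh pack, taking 55 kWh as an assumed midpoint):

```python
inference_power_kw = 1.0   # kilowatt-level inference computer
hours = 10.0               # run while parked, plugged in or not
pack_kwh = 55.0            # midpoint of the 50-60 kWh pack mentioned

energy_used_kwh = inference_power_kw * hours       # 10 kWh
fraction_of_pack = energy_used_kwh / pack_kwh      # ~0.18
print(f"{energy_used_kwh:.0f} kWh used, {fraction_of_pack:.0%} of the pack")
```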
Lars Moravy: Yes. We’ve got built-in, like, liquid-cooled thermal management.
E. Musk: Yes, exactly.
L. Moravy: Exactly what you’d need for data centers, and it’s already there in the car.
E. Musk: Exactly. Yes. It’s distributed power generation, distributed access to power, and distributed cooling, and that was already paid for.
A. Elluswamy: Yes. I mean, that distributed power and cooling, people underestimate how much that costs.
Vaibhav Taneja: Yes. And the capex is shared by the entire world, sort of. Everyone owns a small chunk, and they get a small profit out of it, maybe.
Martin Viecha: …Ashok, do you want to chime in on the AI process and safety?
A. Elluswamy: Yes, we have multiple tiers of validating safety in any given week.
We train hundreds of neural networks that can produce different trajectories for driving the car, and we replay them through the millions of clips that we have already collected from our users and our own QA. Those are critical events, like someone jumping out in front of the car, or other critical events that we have gathered in a database over many, many years, and we replay through all of them to make sure that we are net improving safety. On top of that, we have simulation systems that also try to recreate this and test it in a closed-loop fashion. Once some of this is validated, we give it to our own QA drivers. We have hundreds of them in different cities: San Francisco, Los Angeles, Austin, New York, a lot of different locations. They also drive this and collect real-world miles, and we estimate the critical events and whether they are a net improvement compared to the previous week’s build. And once we have confidence that the build is a net improvement, we start shipping it to early users, like 2,000 employees initially who want the build; they give feedback on whether it’s an improvement, or they note new issues that we did not capture in our own QA process.
And only after all of this is validated do we go to external customers. Even when we go external, we have live dashboards monitoring every critical event happening in the fleet, sorted by criticality, so we keep a constant pulse on the build quality and the safety improvement along the way. And then for any failures, like Elon alluded to, we get the data back, add it to the training, and that improves the model in the next cycle.
So we have this constant feedback loop of issues, fixes, evaluations, and then rinse and repeat. And especially with the new V12 architecture, all of this improves automatically without requiring much engineering intervention, in the sense that engineers don’t have to be creative in how they code the algorithms. It’s mostly learning on its own based on data. So for every failure, say a person showing how to drive a particular intersection or something like that, we get the data back, add it to the neural network, and it learns from that training data automatically, instead of some engineer saying, oh, here you must rotate the steering wheel by this much or something like that.
There are no hard inference conditions; it’s all neural network, it’s very soft, it’s probabilistic. So it will adapt its probability distribution based on the new data that it’s getting.
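A minimal sketch of the weekly gating loop described above: replay a candidate build over a library of critical-event clips, require a net improvement over the previous build plus passing simulation and QA-driver checks, then widen the rollout in stages. All of the names and the stand-in logic here are hypothetical; the real pipeline is certainly far more involved:

```python
from typing import Callable, Dict, List

Clip = Dict[str, bool]           # a recorded critical event, e.g. a pedestrian cut-in
Build = Callable[[Clip], bool]   # True if the build handles the clip safely on replay

def critical_event_rate(build: Build, clips: List[Clip]) -> float:
    """Fraction of critical-event clips the build fails when replayed."""
    failures = sum(1 for clip in clips if not build(clip))
    return failures / len(clips)

def weekly_gate(candidate: Build, previous: Build, clips: List[Clip],
                sim_ok: bool, qa_drivers_ok: bool) -> str:
    """Gate a weekly build: replay + simulation + QA drivers, then staged rollout."""
    net_improvement = (critical_event_rate(candidate, clips)
                       <= critical_event_rate(previous, clips))
    if not (net_improvement and sim_ok and qa_drivers_ok):
        return "hold build: feed failures back into the training set"
    return "ship to ~2,000 employees, then external customers with live dashboards"

# Toy usage with stand-in builds and two replay clips.
clips = [{"pedestrian_cut_in": True}, {"unprotected_left": True}]
previous = lambda clip: False    # old build fails both clips
candidate = lambda clip: True    # new build handles both
print(weekly_gate(candidate, previous, clips, sim_ok=True, qa_drivers_ok=True))
```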
E. Musk: Yes. We do have some insight into how good things will be in, let’s say, 3–4 months, because we have advanced models that are far more capable than what is in the car but have some issues that we need to fix. So there will be a step-change improvement in the capabilities of the car, but it will have some quirks that need to be addressed in order to release it.
As Ashok was saying, we have to be very careful about what we release to the fleet or to customers in general. If we look at, say, FSD 12.4 and 12.5, those could arguably even be Version 13 and Version 14, because each is pretty close to a total retrain of the neural nets, and they are substantially different.
So we have good insight into where the model is and how well the car will perform in, say, 3–4 months.
A. Elluswamy: Yes. In terms of scaling laws, people in the AI community generally talk about model scaling laws, where they increase the model size a lot and see corresponding gains in performance, but we have also figured out scaling laws along other axes in addition to model size scaling. There is data scaling: you can increase the amount of data you use to train the neural network, and that also gives similar gains. You can also scale up training compute: you can train it for much longer, or use more GPUs or more Dojo nodes, and that also gives better performance. And you can have architecture scaling, where you come up with better architectures that produce better results for the same amount of compute.
So with a combination of model size scaling, data scaling, training compute scaling, and architecture scaling, we can basically extrapolate: okay, with continued scaling at this ratio, we can sort of predict future performance.
Obviously, it takes time to do the experiments, because it takes a few weeks to train and a few weeks to collect tens of millions of video clips and process all of them, but you can estimate future progress based on the trends that we have seen in the past, and they have generally held true based on past data.
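As an illustration of extrapolating from such trends, here is a small sketch that fits a power-law scaling curve to a handful of invented past training runs and predicts performance at larger compute. The data points, the single compute axis, and the power-law form are purely illustrative assumptions, not Tesla's actual numbers or method:

```python
# Fit an illustrative power-law trend (error ~ a * compute^-b) to past runs
# and extrapolate to a larger training budget.
import numpy as np

compute = np.array([1e20, 3e20, 1e21, 3e21])   # training FLOPs of past runs (invented)
error   = np.array([0.30, 0.24, 0.19, 0.15])   # evaluation error of each run (invented)

# A power law is a straight line in log-log space: log(error) = intercept + slope*log(compute)
slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)

def predicted_error(c: float) -> float:
    """Extrapolate the fitted trend to a new compute budget."""
    return float(np.exp(intercept + slope * np.log(c)))

print(f"predicted error at 1e22 FLOPs: {predicted_error(1e22):.3f}")
```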