“OpenAI’s Colin Jarvis Predicts ‘Exponential’ Advancements in Large Language Model Capabilities during AI Summit London Keynote”, 2024-06-12:
OpenAI’s chief architect, Colin Jarvis, predicted substantial advancements in large language models during his keynote address at AI Summit London on Wednesday.
Jarvis highlighted four key areas where he expects major progress: smarter and cheaper models, increased model customization, more multimodality such as audio and video, and market-leading chatbots performing at similarly high levels.
“Don’t build for what’s available today because things are changing so fast”, Jarvis told attendees, saying the speed of advancement means current capabilities will be outmoded by the time new applications ship.
He urged companies to differentiate by using language AI APIs to create unique user experiences, data approaches and model customizations. Jarvis said the key differentiator for businesses building language model-powered services is leveraging their own proprietary data.
“The user experience you create, the data you bring to the model, how you customize it and the service that you expose, that is actually where you folks are going to differentiate and build something genuinely unique”, Jarvis said. “If you just build a wrapper around one of these very useful models, then you’re no different than your competitors.”
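To illustrate the pattern Jarvis describes, here is a minimal, hypothetical sketch of layering proprietary data on top of a generic model API using OpenAI’s Python SDK. The retrieval function, model name and documents are placeholders standing in for whatever data and customization a business actually owns, not a prescribed implementation:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def retrieve_company_docs(query: str) -> str:
    """Hypothetical stand-in for a proprietary retrieval layer (search
    index, vector store, CRM export, etc.) -- the part competitors
    cannot copy by wrapping the same model."""
    return "Internal policy: enterprise customers get a dedicated SLA."

def answer(query: str) -> str:
    context = retrieve_company_docs(query)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # The customization layer: brand voice plus proprietary context.
            {"role": "system",
             "content": f"Answer as our support assistant, using this internal context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(answer("What support terms do enterprise customers get?"))
```

The generic API call is the commodity; the retrieval step and the system prompt are where the differentiation Jarvis points to would live.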
Jarvis said that use cases and user experiences businesses had previously shelved because of cost or complexity can now be put into action thanks to lower operating costs and smarter models.
As an example, he highlighted the cost of OpenAI’s embedding models, describing them as “basically free”, and said applications once ruled out on cost or latency grounds can now be deployed.
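For context, embedding calls of the kind Jarvis is pricing look like the following sketch, using OpenAI’s Python SDK; the model name and sample documents here are illustrative choices, not ones he cited:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Embed a small batch of documents in one request.
docs = [
    "Refund policy: items may be returned within 30 days.",
    "Shipping: standard delivery takes 3 to 5 business days.",
]
response = client.embeddings.create(
    model="text-embedding-3-small",  # one of OpenAI's low-cost embedding models
    input=docs,
)
vectors = [item.embedding for item in response.data]
print(len(vectors), "embeddings of dimension", len(vectors[0]))
```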
“With GPT-4o coming out, which is twice as fast as GPT-4, we saw a lot of use cases that were painfully slow for users just drop under that threshold where you’re happy to ship”, he said.
“What we’ve seen in the last year confirms that firstly models get smarter, then they get cheaper and faster. We’ve got smarter models, but then we can also serve them cheaply.”
…“The thing that will be interesting to see over the next year is whether somebody manages to make another GPT-3 to GPT-4 jump in terms of the capabilities of these models. I would expect this to continue, with more providers and a more fragmented, diverse market”, he said.
…Jarvis said models like GPT-4o let businesses run multimodal inputs through a single API call, rather than separate calls for each modality, thereby reducing the cost of running the model.
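A hedged sketch of that single-call pattern, again using OpenAI’s Python SDK: one chat completion request carries both text and an image, rather than routing the image through a separate vision model first. The image URL is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request carries both modalities; no separate vision pipeline needed.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```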
“This is making stuff a lot faster”, he said. “This is where a whole new raft of user experiences that depend on low-latency interaction across modalities becomes accessible.”

OpenAI demoed interactive multimodal chatbots at its spring event, and the company’s chief architect said they represent the next shift for language models: more modalities under one model.

“We are eventually going to see a model where I can talk into it and it produces a video from what I said. The modalities stop being a barrier; I just accept that I can interact with this API in the way I want”, Jarvis said.