“Gold-Medalist Coders Build an AI That Can Do Their Job for Them: A New Startup Called Cognition AI Can Turn a User’s Prompt into a Website or Video Game”, 2024-03-12:
…Take the case of Cognition AI Inc. You almost certainly have not heard of this startup, in part because it’s been trying to keep itself secret and in part because it didn’t even officially exist as a corporation until two months ago [founded January 2024?]…raised $21 million from Peter Thiel’s venture capital firm Founders Fund and other brand-name investors, including former Twitter executive Elad Gil. They’re betting on Cognition AI’s team and its main invention, which is called Devin…Thiel has tried from the outset to position Cognition AI as a budding AI superpower. His VC firm hasn’t invested in many AI companies, he says in a statement, but he sees Cognition AI as being in the same league as the heavies Founders Fund has backed, which include DeepMind (now part of Google), OpenAI and Scale.
[blog announcement; 13% on SWE-bench; demos: Upwork, finetuning LLaMA, ControlNet, competitive programming, Game of Life, repo editing, Python algebra, LLM chess+Chrome extension, map calculator, checking out a repo, asking questions on Slack]
Devin is a software development assistant in the vein of Copilot, which was built by GitHub, Microsoft and OpenAI, but, like, a next-level software development assistant. Instead of just offering coding suggestions and autocompleting some tasks, Devin can take on and finish an entire software project on its own. To put it to work, you give it a job—“Create a website that maps all the Italian restaurants in Sydney”, say—and the software performs a search to find the restaurants, gets their addresses and contact information, then builds and publishes a site displaying the information. As it works, Devin shows all the tasks it’s performing and finds and fixes bugs on its own as it tests the code being written.
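The behavior described above (write code, test it, fix the bugs the tests reveal) can be sketched as a generic write-test-fix loop. This is not Devin's implementation, which the article does not describe. `propose` is a hypothetical stand-in for a model call, backed here by a canned list of candidates so the sketch runs. The task, the tests, and `max_attempts=5` are illustrative assumptions.

```python
from typing import Callable, Optional


def make_stub_model(candidates):
    """Hypothetical stand-in for an LLM: returns canned candidates in order."""
    it = iter(candidates)

    def propose(task: str, feedback: Optional[str]) -> Callable:
        # A real agent would prompt a model with `task` and `feedback`;
        # this stub ignores both and returns the next canned candidate.
        return next(it)

    return propose


def run_tests(fn, tests):
    """Return None if every test passes, else a description of the first failure."""
    for args, expected in tests:
        try:
            got = fn(*args)
        except Exception as e:  # a crash is feedback too
            return f"{args!r}: raised {type(e).__name__}: {e}"
        if got != expected:
            return f"{args!r}: expected {expected!r}, got {got!r}"
    return None


def write_test_fix(task, tests, propose, max_attempts=5):
    """Propose code, test it, and feed any failure back until the tests pass."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(task, feedback)
        feedback = run_tests(candidate, tests)
        if feedback is None:
            return candidate, attempt
    return None, max_attempts


# Usage: the first (buggy) candidate fails on a negative input; the second passes.
TESTS = [((3,), 3), ((-2,), 2)]
stub = make_stub_model([lambda x: x, lambda x: -x if x < 0 else x])
fixed, attempts = write_test_fix("absolute value", TESTS, stub)
```

In a real agent, the failure string returned by `run_tests` is what gets passed back into the next prompt. The stub only shows where that feedback would flow.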
…Wu, 27, is the brother of Neal Wu, who also works at Cognition AI. These two men are world-renowned for their coding prowess: The Wu brothers have been competing in, and often winning, international coding competitions since they were teenagers, and they have helped elevate the US national coding team to a more respectable position against its Chinese and Eastern European rivals in recent years.
Sport-coding—yes, it’s a real thing—requires people to solve puzzles and program with speed and accuracy. [Competitive programming is notorious for requiring extensive knowledge of dynamic programming and tree algorithms, all building blocks of Monte Carlo tree search (MCTS).] Along the way, it trains contestants to approach problems in novel ways. Cognition AI is full of sport-coders. Its staff has won a total of 10 gold medals at the top international competition, and Scott Wu says this background gives his startup an edge in the AI wars. “Teaching AI to be a programmer is actually a very deep algorithmic problem that requires the system to make complex decisions and look a few steps into the future to decide what route it should pick”, he says. “It’s almost like this game that we’ve all been playing in our minds for years, and now there’s this chance to code it into an AI system.”
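To show how tree algorithms and dynamic-programming-style value backups combine in Monte Carlo tree search (MCTS), here is a minimal sketch of UCT (MCTS guided by the UCB1 rule). It runs on a hypothetical toy problem: choose three bits, where the terminal reward is the fraction of bits set to 1. Nothing in the article says Cognition AI uses MCTS. The problem, the names (`mcts`, `plan`, `rollout`), and the constants (500 iterations, exploration constant 1.4, seed 0) are illustrative assumptions.

```python
import math
import random

# Hypothetical toy problem: pick DEPTH bits; reward = fraction of bits that are 1.
DEPTH = 3
ACTIONS = (0, 1)


def is_terminal(state):
    return len(state) == DEPTH


def reward(state):
    return sum(state) / DEPTH


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}  # action -> Node
        self.visits = 0
        self.total = 0.0  # sum of backed-up rewards

    def ucb_child(self, c):
        # UCB1: mean value (exploitation) plus a bonus for rarely visited children.
        return max(
            self.children.values(),
            key=lambda n: n.total / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )


def rollout(state, rng):
    # Simulation: random playout until a terminal state is reached.
    while not is_terminal(state):
        state = state + (rng.choice(ACTIONS),)
    return reward(state)


def mcts(root_state, iterations=500, c=1.4, seed=0):
    """Return the most-visited first action from a non-terminal root_state."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = node.ucb_child(c)
        # 2. Expansion: add one untried child.
        if not is_terminal(node.state):
            action = next(a for a in ACTIONS if a not in node.children)
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation.
        value = rollout(node.state, rng)
        # 4. Backpropagation: update visit counts and reward sums up to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)


def plan(state=()):
    """Commit to one MCTS-chosen action at a time until the episode ends."""
    while not is_terminal(state):
        state = state + (mcts(state),)
    return state
```

Choosing the most-visited child rather than the highest mean is a common, more robust final-move rule. The running averages stored at each node play the role of the DP table.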
One of the big claims Cognition AI is making with Devin is that the company has hit on a breakthrough in a computer’s ability to reason. Reasoning in AI-speak means that a system can go beyond predicting the next word in a sentence or the next snippet in a line of code, toward something more akin to thinking and rationalizing its way around problems. The argument in AI-land is that reasoning is the next big thing that will advance the industry, and lots of startups are making various boasts about their ability to do this type of work.
…Most current AI systems have trouble staying coherent and on task during these types of long jobs, but Devin keeps going through hundreds and even thousands of tasks without going off track. In my tests with the software, Devin could build a website from scratch in 5–10 minutes, and it managed to re-create a web-based version of Pong in about the same amount of time. I had to prompt it a couple of times to improve the physics of the ball movement in the game and to make some cosmetic changes on its websites, all of which Devin accomplished just fine and with a polite attitude.
Silas Alberti, a computer scientist and co-founder of another stealth AI startup (of course), has tried Devin and says the technology is a leap forward. It’s less like an assistant helping with code and more like a real worker doing its own thing, he says. “This feels very different because it’s an autonomous system that can do something for you”, Alberti says. Devin excels at prototyping projects, fixing bugs and displaying complex data in graphical forms, according to Alberti. “Most of the other assistants derail after 4–5 steps, but this maintains its state almost effortlessly through the whole job”, he says.
Exactly how Cognition AI made this breakthrough, and in so short a time, is something of a mystery, at least to outsiders. Wu declines to say much about the technology’s underpinnings other than that his team found unique ways to combine large language models (LLMs) such as OpenAI’s GPT-4 with reinforcement learning techniques. “It’s obviously something that people in this space have thought about for a long time”, he says. “It’s very dependent on the models and the approach and getting things to align just right.”
[Tree search is an approach that is algorithmically tricky & highly dependent on getting things just right, would require a lot of time & LLM calls, and would be familiar & obvious to competitive programmers deeply steeped in DP & algorithmic search. The bugs in the Devin demos also look like classic failures of tree search, such as the ‘horizon problem’, where the system takes actions that are disastrous but whose failures only surface further ahead than the planning process can see: for example, in the Game of Life coding demo, the first version has a serious bug that appears only after 180 frames, which is past the planning horizon. But if they have solved LLM MCTS, why isn’t it working even better?]
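The horizon problem described above can be illustrated with a minimal sketch. It uses a hypothetical deterministic toy environment (`GRAPH`), not anything from the Devin demos. A depth-limited lookahead that sees 3 steps ahead prefers a “shortcut” whose −100 penalty arrives on step 4. A 4-step lookahead sees the penalty and takes the safe path instead. The state names, rewards, and horizons are illustrative assumptions.

```python
# Hypothetical toy environment: state -> {action: (next_state, reward)}.
GRAPH = {
    "start": {"shortcut": ("s1", 5), "safe": ("a1", 1)},
    "s1": {"go": ("s2", 1)},
    "s2": {"go": ("s3", 1)},
    "s3": {"go": ("end", -100)},  # the disaster only appears on step 4
    "a1": {"go": ("a2", 1)},
    "a2": {"go": ("a3", 1)},
    "a3": {"go": ("end", 1)},
    "end": {},
}


def lookahead_value(state, depth):
    """Best total reward reachable from `state` within `depth` steps (exhaustive search)."""
    actions = GRAPH[state]
    if depth == 0 or not actions:
        return 0  # anything beyond the horizon is invisible to the planner
    return max(r + lookahead_value(nxt, depth - 1) for nxt, r in actions.values())


def choose_action(state, horizon):
    """Pick the action with the highest value as seen within `horizon` steps."""
    return max(
        GRAPH[state],
        key=lambda a: GRAPH[state][a][1]
        + lookahead_value(GRAPH[state][a][0], horizon - 1),
    )


print(choose_action("start", 3))  # shortcut (looks worth 7; actually totals -93)
print(choose_action("start", 4))  # safe (worth 4)
```

The search itself is correct within its horizon, and the mistake comes entirely from truncation. Extending the horizon fixes this case, but exhaustive lookahead costs grow exponentially with depth, which is one reason deeper search needs many more LLM calls.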