I saw a somewhat astonishing thing today. GPT was asked a question that it needed to write code to answer, and given access to a Python REPL. It wrote buggy code, then based on the error message it fixed its own code until it worked (and gave the correct answer). It debugged.

Feb 22, 2023 · 5:46 AM UTC

The question posed was "What is the 10th fibonacci number?" The GPT-based agent's first attempt was "fibonacci(10)" ("Action Input" is what the LLM sends to the Python REPL). This raised a NameError, since fibonacci isn't a built-in Python function, and the LLM figured that out from the error message.
It then defined a fibonacci function, called it, and got the right answer. All fully automated, with no human intervention (using @LangChainAI). Technically simple but kinda mind-blowing: It debugged its own code until it worked.
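For flavor, here's a rough reconstruction of those two REPL turns; the exact code is my guess, not a transcript (and the first attempt is wrapped in try/except just to keep the snippet runnable):

```python
# Attempt 1: the agent calls fibonacci() as if it already existed.
try:
    fibonacci(10)
except NameError as e:
    print(e)  # name 'fibonacci' is not defined

# Attempt 2: after reading the error, it defines the function first.
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(10))  # 55
```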
Here are the 40 lines of code that will let you demonstrate GPT automatically debugging its own code. The prereqs: pip install openai langchain, plus an @OpenAI API key. gist.github.com/wiseman/4a70…
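The core of it is just handing the agent a Python REPL as a tool. A minimal sketch, using the early-2023 LangChain API (which has changed since; the gist has the full working version):

```python
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # reads OPENAI_API_KEY from the environment

# The "python_repl" tool gives the agent a Python REPL to send code to.
tools = load_tools(["python_repl"])

# A zero-shot ReAct agent: it decides when to run code, and the REPL's
# output (including tracebacks) comes back to it as the observation.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description",
                         verbose=True)
agent.run("What is the 10th fibonacci number?")
```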
In this run I asked it to calculate the 100th Fibonacci number, and it realized its naive approach would take too long and that it needed a more efficient algorithm. (I modified the Python interface to raise an exception if code takes longer than 1 second to run.)
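One simple way to impose that limit (a sketch of the idea, not necessarily the exact modification in the gist) is a SIGALRM-based wrapper around the REPL's run function, so it's Unix-only:

```python
import signal

class ReplTimeout(Exception):
    pass

def run_with_timeout(run_code, code, seconds=1):
    """Run run_code(code), raising ReplTimeout if it exceeds `seconds`."""
    def handler(signum, frame):
        raise ReplTimeout(f"code took longer than {seconds}s to run")
    old_handler = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        return run_code(code)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Wrap the REPL tool's func with this and the timeout message becomes the observation the LLM sees, which is what nudges it away from exponential-time recursion.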
I’m imagining programming language error messages becoming optimized for giving feedback to LLMs, and thinking it would probably provide a significant benefit to humans as well.
6 million views on my post about GPT automatically debugging its own code (which it did), but only @voooooogel mentioned that GPT didn't actually use the result of the code to figure out the answer.
Replying to @lemonodor
Couldn't it just have googled the answer?
I don’t give it access to Google. :)
Replying to @lemonodor
I’ve seen it do that, correcting a SQL join that it had written incorrectly