Geoffrey Litt · Mar 14, 2023 · 9:39 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

Update: GPT-4 now solves this Advent of Code problem perfectly on the first try, whereas ChatGPT required many back-and-forth iterations... Excited to find out where the new boundary is

Geoffrey Litt @geoffreylitt

3 Dec 2022

Had a fun time getting ChatGPT to solve today's Advent of Code puzzle. I'd describe its performance in human terms as "nervous interview candidate who drank too much coffee": pretty smart, makes careless mistakes, responds well to feedback, works very fast. 1/

Geoffrey Litt · Mar 14, 2023 · 9:40 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

Prompt and output below for this example -- confirmed that GPT-3.5 fails on first attempt w/ the exact same prompt...

Geoffrey Litt · Mar 14, 2023 · 9:49 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

On Advent of Code 2022 Day 5 (adventofcode.com/2022/day/5), it didn't get it right on the first try... 😅 ...but it did succeed after 2 iterations of reading and responding to errors, with no manual hints!

Geoffrey Litt · Mar 14, 2023 · 10:03 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

I previously found Day 8 was beyond GPT-3's capabilities... nitter.net/geoffreylitt/sta… GPT-4 looked like it was in danger of reverting to nonsensical revisions after 2 incorrect iterations, but somehow succeeded on its 3rd attempt

Geoffrey Litt @geoffreylitt

8 Dec 2022

Replying to @geoffreylitt

Results have been mixed:
- discussing a high-level plan before jumping into code seems to help the model think better, and also lets me give feedback earlier
- it's still not great at debugging, tends to rabbit-hole on incorrect hypotheses. trying to teach it to do better

Geoffrey Litt · Mar 14, 2023 · 10:18 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

On Day 9 (adventofcode.com/2022/day/9) it flailed for 5 attempts and didn't get it right. Similar failure modes to GPT-3, making up nonsensical explanations about why the code was wrong

Geoffrey Litt · Mar 14, 2023 · 10:19 PM UTC

Geoffrey Litt @geoffreylitt

14 Mar 2023

Interestingly, previously for this Day 9 problem, I had found that GPT was useful for generating a runtime viz that helped me code a solution, altho it wasn't capable of solving by itself. I guess Human-AI collab is still a thing for now!

Geoffrey Litt @geoffreylitt

11 Dec 2022

Replying to @_paulshen

Inspired by your thing here, I'm making a state viz for this problem before I even start solving it! (not using the console.log feature, just ticking thru the state history) Arguably inefficient but feels good to invest in this visibility up front 😅