My original tweet makes the performance look a bit better than it is, if you're looking for exact answers. Here's the accuracy in terms of exact solutions.
Replying to @colin_fraser
I wonder if it varies with the number representation (base, language, digits vs. words, etc.). Would be very interested in reading the source.
It's my own experimentation. I used the API to get 40,000 solutions to randomly generated arithmetic problems (500 pairs of numbers × 4 operators × 20 solutions per pair). I take whatever the last number in the ChatGPT response is as the solution.
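For anyone curious what "take the last number in the response" could look like, here is a rough sketch. It is not the actual code used for the experiment: the regex, the `last_number` helper, and the constant names are all assumptions made up for illustration, and the real extraction may handle signs, commas, or decimals differently.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the original code): optional minus
# sign, digits with optional thousands separators, optional decimal part.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(response: str) -> Optional[float]:
    """Return the last number appearing in a model response, or None."""
    matches = NUMBER_RE.findall(response)
    if not matches:
        return None
    # Strip thousands separators before converting.
    return float(matches[-1].replace(",", ""))


# Experiment size as described: 500 pairs * 4 operators * 20 solutions per pair.
PAIRS = 500
OPERATORS = ["+", "-", "*", "/"]
SAMPLES_PER_PAIR = 20
TOTAL_SOLUTIONS = PAIRS * len(OPERATORS) * SAMPLES_PER_PAIR

if __name__ == "__main__":
    print(last_number("12 + 30 = 42"))
    print(TOTAL_SOLUTIONS)
```

One consequence of this kind of rule: if the response keeps talking after the answer and mentions another number, that later number is what gets scored.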
Replying to @colin_fraser
subtraction and division are cursed, and always have been.
I would say that this does make subtraction look worse than it maybe is, and multiplication look better than it is, if you're looking for exact solutions. See my followup post.