“Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Alistair Pullen, 2023-02-06:

The larger models (particularly the davinci family) obviously produce the highest-quality outputs, but are the slowest and most expensive to run… According to Alex Graveley, one of the creators of GitHub’s Copilot, there is a 1% drop in completions for every additional 10ms of latency. This logic applies to search too, so it was an immediate priority to move away from large models like davinci to smaller ones like ada and babbage.

[Below: k-shot davinci-003 answers; above: completions from a babbage model fine-tuned on those answers.]

Our solution is simple: generate a moderately sized corpus of completions made by davinci for a given task, and fine-tune a model like babbage to do the same task. If done correctly, you can get near-identical completions (or at least 90% similarity) at a 40× lower price and around 4–5× better latency.
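The corpus-building step can be sketched as follows — a minimal illustration with hypothetical prompt/completion pairs, using the prompt/completion JSONL layout of OpenAI's legacy fine-tuning API from that era (each prompt ending in a fixed separator, each completion starting with a space and ending with a stop sequence):

```python
import json

def build_finetune_dataset(pairs, sep="\n\n###\n\n", stop=" END"):
    """Format (prompt, davinci_completion) pairs as JSONL lines suitable
    for fine-tuning a smaller model such as babbage.

    The legacy fine-tune format expects each prompt to end with a fixed
    separator, and each completion to begin with a space and end with a
    stop sequence, so the fine-tuned model learns clean boundaries."""
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt.rstrip() + sep,
            "completion": " " + completion.strip() + stop,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Hypothetical corpus of davinci outputs for the target task:
pairs = [
    ("Summarise: def add(a, b): return a + b", "Adds two numbers."),
    ("Summarise: def neg(x): return -x", "Negates a number."),
]
jsonl = build_finetune_dataset(pairs)
```

The resulting file would then be uploaded and passed to the fine-tuning endpoint (at the time, something like `openai api fine_tunes.create -t data.jsonl -m babbage` via the CLI).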

You can go one better than this: if you’re willing to invest a little time, you can introduce a human into the loop too. We recently did this to fine-tune a babbage model to competently identify characteristics in code, so I got ChatGPT to create a basic web UI that let us easily review and improve the identifications davinci had carried out. Fundamentally, you’re never going to get like-for-like performance out of a smaller model, so making the training completions better than the model you’re trying to mimic means you’ll at least be closer when the training is complete.
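The review step reduces to merging human corrections back into the davinci-generated dataset before fine-tuning. A minimal sketch, with an assumed data model (the original used a ChatGPT-generated web UI rather than this code):

```python
def apply_review(records, corrections):
    """Merge human corrections into a davinci-generated dataset.

    `records` maps an example id to its davinci completion;
    `corrections` maps the ids a reviewer edited to improved
    completions. Unreviewed examples keep the original davinci output,
    so the fine-tuning corpus is at least as good as davinci's."""
    return {eid: corrections.get(eid, completion)
            for eid, completion in records.items()}

# Hypothetical review pass: one completion was wrong and got fixed.
records = {"ex1": "Adds two numbers.", "ex2": "Does subtraction?"}
corrections = {"ex2": "Subtracts b from a."}
reviewed = apply_review(records, corrections)
```

The reviewed completions would then feed the same fine-tuning pipeline as before, giving the smaller model a slightly better target than raw davinci output.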