There are times and places for training your own models... With the release of OpenAI's ChatGPT API, coding is looking less like one of them. ChatGPT's HumanEval pass@1 rate is as good as the best open-source model's pass@100 rate. And this is still just GPT-3.5...

Mar 1, 2023 · 8:32 PM UTC

Not only that, but the pricing is 10x better than text-davinci, and the latency is far lower. After seeing this news today, I really would not want to be one of OpenAI's competitors
For those unfamiliar with pass@k: this means that if I took the best open-source code model (CodeGen 16B) and sampled 100 generations, the probability that at least 1 of those 100 generations is correct is the same as the probability that ChatGPT gets it right on the first try.
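For reference, pass@k is usually computed with the unbiased estimator from the Codex paper rather than by literally sampling k times per problem. A minimal sketch (my own illustration, not code from the thread):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n - c, k) / C(n, k), where n samples were drawn
    for a problem and c of them passed the unit tests."""
    if n - c < k:
        # Fewer than k failing samples: every size-k subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 2 samples drawn, 1 correct -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

The per-problem estimates are then averaged over the benchmark's problems to get the reported pass@k number.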
This unification of capabilities is insane. Before, the best code models needed to be rigorously fine-tuned on code, forgetting much of general language modeling. Now, ChatGPT is the best coding model while remaining the best generalist model.
Replying to @amanrsanger
There is the memorization factor here too! HumanEval exists in many forms across multiple repositories at this point and could technically leak into the train set.
Possibly, but from testing ChatGPT in Cursor vs text-davinci and code-davinci, it qualitatively feels much better.
Replying to @amanrsanger
That’s insane. Where’s the table from?
Replying to @OwariDa
The Codex and CodeGen numbers come from the CodeGen paper: arxiv.org/pdf/2203.13474.pdf I calculated the code-davinci, text-davinci, and gpt-3.5 numbers via a fork of human-eval: github.com/Sanger2000/human-…
No real reason - mainly because of rate limits. Pass@1 ran pretty quickly. It would be kinda slow for pass@10 and really slow for pass@100.
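To put the rate-limit point in perspective, here's a back-of-envelope count of API calls, assuming HumanEval's 164 problems and one completion per sample (my own numbers, not from the thread):

```python
# HumanEval has 164 problems; estimating pass@k requires at least
# k samples per problem, so the number of completions scales with k.
HUMANEVAL_PROBLEMS = 164

def total_completions(samples_per_problem: int) -> int:
    """Minimum completions needed to estimate pass@k
    with n = samples_per_problem per problem."""
    return HUMANEVAL_PROBLEMS * samples_per_problem

for n in (1, 10, 100):
    print(f"{n} samples/problem -> {total_completions(n)} completions")
```

So pass@100 needs at least 16,400 completions, which is where API rate limits start to bite.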