There are times and places for training your own models... With the release of OpenAI's ChatGPT API, coding is looking less like one of them. ChatGPT's HumanEval pass@1 rate is as good as the best open-source model's pass@100 rate. And this is still just GPT-3.5...

Mar 1, 2023 · 8:32 PM UTC

Not only that, but the pricing is 10x better than text-davinci, and the latency is far lower. After seeing this news today, I really would not want to be one of OpenAI's competitors
For those unfamiliar with pass@k: this means that if I took the best open-source code model (CodeGen 16B) and sampled 100 generations, the probability that at least 1 of those 100 generations is correct is the same as the probability that ChatGPT gets it right on the first try.
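For reference, pass@k is usually computed with the unbiased estimator from the Codex paper rather than by literally sampling k times per problem. A minimal sketch (my own illustration, not code from the thread):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n - c, k) / C(n, k), where n samples were drawn
    for a problem and c of them passed the unit tests."""
    if n - c < k:
        # Fewer than k failing samples: every size-k subset
        # must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 2 samples drawn, 1 correct -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

The per-problem estimates are then averaged over the benchmark's problems to get the reported pass@k number.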
This unification of capabilities is insane. Before, the best code models needed to be rigorously fine-tuned on code, forgetting much of general language modeling. Now, ChatGPT is the best coding model while remaining the best generalist model.
Replying to @amanrsanger
There is the memorization factor here too! HumanEval exists in many forms across multiple repositories at this point and could technically leak into the train set.
Possibly, but from testing ChatGPT in Cursor vs text-davinci and code-davinci, it qualitatively feels much better.
Replying to @amanrsanger
That’s insane. Where’s the table from?
Replying to @OwariDa
The Codex and CodeGen numbers come from the CodeGen paper: arxiv.org/pdf/2203.13474.pdf I calculated the code-davinci, text-davinci, and gpt-3.5 numbers via a fork of human-eval: github.com/Sanger2000/human-…
No real reason - mainly because of rate limits. Pass@1 ran pretty quickly. It would be kinda slow for pass@10 and really slow for pass@100.
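To put the rate-limit point in perspective, here's a back-of-envelope count of API calls, assuming HumanEval's 164 problems and one completion per sample (my own numbers, not from the thread):

```python
# HumanEval has 164 problems; estimating pass@k requires at least
# k samples per problem, so the number of completions scales with k.
HUMANEVAL_PROBLEMS = 164

def total_completions(samples_per_problem: int) -> int:
    """Minimum completions needed to estimate pass@k
    with n = samples_per_problem per problem."""
    return HUMANEVAL_PROBLEMS * samples_per_problem

for n in (1, 10, 100):
    print(f"{n} samples/problem -> {total_completions(n)} completions")
```

So pass@100 needs at least 16,400 completions, which is where API rate limits start to bite.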