Sharif Shameem · Jun 25, 2023 · 6:20 AM UTC

@sharifshameem

it's pretty jarring how massive the gap is between all the LLaMA finetunes and OpenAI's models for programming tasks on HumanEval+. assuming there's no data contamination on OpenAI's side, nothing even comes close, which matches my subjective experience.

Teknium (e/λ) · Jun 25, 2023 · 11:42 AM UTC

@Teknium1

Replying to @sharifshameem @izzyz

Llama may be bad at code but why not put starcoder, wizardcoder, replit, and replit instruct on this list?

Nimit Mehra · Jun 25, 2023 · 4:58 PM UTC

@nimitmehra

Replying to @sharifshameem

How long do you think this gap will remain? Are open source models getting better faster? Or will OpenAI always have dominance?

👩‍💻 Paige Bailey · Jun 26, 2023 · 9:28 AM UTC

@DynamicWebPaige

Replying to @sharifshameem

cc: @miltos1 @mmjb86 @maidotgimenez (as we think about code instruction tuning)

Ketan Singh · Jun 25, 2023 · 4:28 PM UTC

@ketansingh279

Replying to @sharifshameem

@C_P_Gurnani still confident that you'd be able to rank 3rd here, let alone beat @openAI in terms of performance of your upcoming foundational model? It's great that you've accepted a technical challenge, but please set expectations realistically such that you and indirectly all Indians won't have to save face later. Accept gracefully when someone is significantly far ahead, like you'd expect others to do when we do something exceptional. cc: @Rajeev_GoI @sama @anandmahindra @PMOIndia

An Qu · Jun 25, 2023 · 12:11 PM UTC

@hahahahohohe

Replying to @sharifshameem

Overall GPT-4 is in a league of its own. Nothing comes even close to it still

anton · Jun 25, 2023 · 2:40 PM UTC

@abacaj

Replying to @sharifshameem

MPT-30B is not on there. StarCoder models also score above 30 and WizardCoder is at 50+. Open source isn't that *far*

Jr Kibs · Jun 25, 2023 · 1:25 PM UTC

@JrKibs

Replying to @sharifshameem @TheRealAdamG

With all the data they collect in real time from millions of people around the world, it will be hard to catch up with them.

Tariq Rauf · Jun 25, 2023 · 7:19 PM UTC

@tariqrauf

Replying to @sharifshameem @TheRealAdamG

WizardCoder-15B seems to be missing from the list. It scores 57, you can run it locally on your machine, and it is on GitHub. Astonishing what open source is achieving with basically no resources.

Shahul Es · Jun 25, 2023 · 3:37 PM UTC

@Shahules786

Replying to @sharifshameem

None of the open-source models shown are fine-tuned for coding. Another factor is the discontinuous nature of measuring accuracy (Exact Match). GPT-4 level performance is hard to achieve as of now for Open LLMs but surely there are better ones out there.

Chris · Jun 27, 2023 · 4:30 AM UTC

@Cory29565470

Replying to @sharifshameem

Dumb take: MODEL ≠ KING, DATA = KING