It's pretty jarring how massive the gap is between all the LLaMA finetunes and OpenAI's models on programming tasks in HumanEval+. Assuming there's no data contamination on OpenAI's side, nothing even comes close, which matches my subjective experience.

Jun 25, 2023 · 6:20 AM UTC
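For context on what HumanEval+ measures: each model completion is graded pass/fail by executing it against unit tests. A minimal sketch of that grading loop, using a hypothetical `check_candidate` helper and made-up problem/tests (the real harness sandboxes execution, which this sketch omits):

```python
def check_candidate(candidate_src: str, test_src: str) -> bool:
    """Grade one completion: define the model's function, then run
    the benchmark's assertions against it. Pass/fail, nothing between."""
    env = {}
    try:
        exec(candidate_src, env)  # define the candidate function
        exec(test_src, env)       # run the hidden unit tests
        return True
    except Exception:
        return False

# Hypothetical problem: implement add(a, b).
candidate = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
check_candidate(candidate, tests)  # → True
```

The all-or-nothing grading is part of why scores spread out so sharply: a mostly-right solution that fails one assertion scores the same as no solution at all.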

LLaMA may be bad at code, but why not put StarCoder, WizardCoder, Replit, and Replit Instruct on this list?
Replying to @sharifshameem
How long do you think this gap will remain? Are open-source models improving faster, or will OpenAI always have dominance?
Replying to @sharifshameem
cc: @miltos1 @mmjb86 @maidotgimenez (as we think about code instruction tuning)
Replying to @sharifshameem
@C_P_Gurnani, are you still confident that your upcoming foundational model would rank 3rd here, let alone beat @openAI on performance? It's great that you've accepted a technical challenge, but please set expectations realistically so that you, and indirectly all Indians, won't have to save face later. Accept gracefully when someone is significantly far ahead, just as you'd expect others to do when we achieve something exceptional. cc: @Rajeev_GoI @sama @anandmahindra @PMOIndia
Replying to @sharifshameem
Overall, GPT-4 is in a league of its own. Nothing even comes close to it still.
Replying to @sharifshameem
MPT-30B is not on there. StarCoder models also score above 30, and WizardCoder is at 50+. Open source isn't that *far* behind.
With all the data they collect in real time from millions of people around the world, it will be hard to catch up with them.
WizardCoder-15B seems to be missing from the list. It scores 57, you can run it locally on your machine, and it's on GitHub. Astonishing what open source is achieving with basically no resources.
Replying to @sharifshameem
None of the open-source models shown is fine-tuned for coding. Another factor is the discontinuous nature of the accuracy metric (exact match: a solution either passes or it doesn't). GPT-4-level performance is hard for open LLMs to achieve as of now, but surely there are better ones out there.
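On the metric point: HumanEval-style leaderboards usually report pass@k, and the standard unbiased estimator for it (from the original Codex paper) is a short function. A sketch, assuming `n` samples were generated per problem and `c` of them passed the tests:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn (without replacement) from n generated samples,
    c of which are correct, passes the tests."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws: guaranteed hit
    return 1.0 - comb(n - c, k) / comb(n, k)

pass_at_k(10, 5, 1)  # → 0.5
```

Because each sample is scored strictly pass/fail, small improvements in a model's partial correctness don't move this number at all until solutions start fully passing.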
Replying to @sharifshameem
Dumb take: MODEL ≠ KING, DATA = KING