it's pretty jarring how massive the gap is between all the LLaMA finetunes and OpenAI's models for programming tasks on HumanEval+
assuming there's no data contamination on OpenAI's side, nothing even comes close, which matches my subjective experience.
Jun 25, 2023 · 6:20 AM UTC