A recent trend is to fine-tune open-source LMs on ChatGPT outputs (e.g., Alpaca, Self-Instruct, Vicuna), with the aim of broadly imitating the model. In our new paper, we critically analyze this approach.
arxiv.org/abs/2305.15717 👇[1/N]
This is a really interesting read. I’ve had conversations with people who have trained chat models on top of Pythia and GPT-NeoX who repeatedly say that their models aren’t actually much better on benchmarks but are massively preferred by users. I suspect it’s the same thing.
May 27, 2023 · 6:11 PM UTC