I'm starting to think loss is harmful.
Our loss has been flat at 2.2 for the past five days of training GPT-2 1.5B. Yet according to human testing, it's been getting noticeably better every day.
It's now good enough to amuse /r/dota2: teddit.net/r/DotA2/comments/…
Dec 31, 2019 · 11:40 PM UTC
The dota2 data is only 0.73% of the overall training data (73MB out of 10GB). Yet the bot is adept enough to convince /r/dota2 that it's talking about dota.
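A quick sanity check on that 0.73% figure (assuming decimal units, i.e. 10GB = 10,000MB):

```python
# Back-of-envelope check of the dota2 share of the training mix.
dota2_mb = 73            # size of the /r/dota2 scrape, in megabytes
corpus_mb = 10 * 1000    # 10GB corpus, decimal megabytes assumed
share = dota2_mb / corpus_mb
print(f"{share:.2%}")    # 0.73%
```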
Again: Loss has been a flat 2.2 for the last five days.
And five days ago, the model wasn't this good.
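One possible resolution of the puzzle, sketched with purely illustrative numbers (the per-domain loss drop below is hypothetical, not measured): if the improvement is concentrated in a tiny slice of the mix, the aggregate loss barely moves.

```python
# Illustrative sketch: a large loss drop on a small slice of the data
# is nearly invisible in the aggregate loss.
mix_fraction = 0.0073    # dota2 share of the training data, from above
slice_drop = 1.0         # hypothetical per-domain loss improvement (nats)
aggregate_drop = mix_fraction * slice_drop
print(round(aggregate_drop, 4))  # 0.0073 -- easily lost in noise at 2.2
```

On this reading, the model could get dramatically better at many individual domains while the headline number sits at "a flat 2.2."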