I'm starting to think loss is harmful.
Our loss has been flat at 2.2 for the past five days of training GPT-2 1.5B. Yet according to human testing, it's been getting noticeably better every day.
It's now good enough to amuse /r/dota2: teddit.net/r/DotA2/comments/…
Dec 31, 2019 · 11:40 PM UTC
The dota2 data is only 0.73% of the overall training data (73MB out of 10GB). Yet the bot is adept enough to convince /r/dota2 that it's talking about dota.
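A quick sanity check on that 0.73% figure (assuming decimal units, i.e. 10GB = 10,000MB):

```python
# Back-of-envelope check of the dota2 share of the training mix.
dota2_mb = 73            # size of the /r/dota2 scrape, in megabytes
corpus_mb = 10 * 1000    # 10GB corpus, decimal megabytes assumed
share = dota2_mb / corpus_mb
print(f"{share:.2%}")    # 0.73%
```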
Again: Loss has been a flat 2.2 for the last five days.
And five days ago, the model wasn't this good.
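One possible resolution of the puzzle, sketched with purely illustrative numbers (the per-domain loss drop below is hypothetical, not measured): if the improvement is concentrated in a tiny slice of the mix, the aggregate loss barely moves.

```python
# Illustrative sketch: a large loss drop on a small slice of the data
# is nearly invisible in the aggregate loss.
mix_fraction = 0.0073    # dota2 share of the training data, from above
slice_drop = 1.0         # hypothetical per-domain loss improvement (nats)
aggregate_drop = mix_fraction * slice_drop
print(round(aggregate_drop, 4))  # 0.0073 -- easily lost in noise at 2.2
```

On this reading, the model could get dramatically better at many individual domains while the headline number sits at "a flat 2.2."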