1. Take pretrained LLMs
2. Prompt with "3.14159265358979323846"
3. ???
(circle size == pretraining tokens)

Dec 21, 2023 · 1:03 AM UTC

Replying to @fluffykittnmeow
Seems like a very brittle "benchmark" but still a fun observation :)
Agreed! Though it's perhaps more fair than it seems, since every model except Mistral here uses the same tokenizer (NeoX) and was trained on the same 300B tokens from The Pile.
Replying to @fluffykittnmeow
One attempt or repeated queries?
I used argmax sampling, so the results are deterministic
Replying to @fluffykittnmeow
is the sequence correct though?
Yup, the y-axis here shows how many digits each model got right before making an error. All models were queried with top_k=1
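For anyone who wants to try this themselves, here's a minimal sketch of the probe using HuggingFace transformers. It isn't the exact script behind the plot; the model name is just a placeholder, and greedy decoding (do_sample=False) stands in for the argmax / top_k=1 sampling described above.

```python
# Sketch of the pi-digit probe (assumptions: any HF causal LM; greedy decoding
# approximates the argmax / top_k=1 setup mentioned in the thread).
from transformers import AutoModelForCausalLM, AutoTokenizer

# First 100 digits of pi after the decimal point, used as ground truth.
PI = ("14159265358979323846264338327950288419716939937510"
      "58209749445923078164062862089986280348253421170679")

PROMPT = "3.14159265358979323846"  # the prompt from the original tweet

model_name = "EleutherAI/pythia-1.4b"  # placeholder; swap in any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok(PROMPT, return_tensors="pt")
# do_sample=False => greedy decoding (argmax at every step), so output is deterministic
out = model.generate(**inputs, do_sample=False, max_new_tokens=64)
continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])

# Count how many digits the model continues correctly before its first mistake
# (this is the y-axis value in the plot).
generated_digits = "".join(c for c in continuation if c.isdigit())
reference = PI[len(PROMPT) - 2:]  # digits of pi expected after the 20 in the prompt
correct = 0
for g, r in zip(generated_digits, reference):
    if g != r:
        break
    correct += 1
print(f"{model_name}: {correct} correct digits after the prompt")
```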
Replying to @fluffykittnmeow
Oh I was about to raise a ruckus about you stealing fluffy's work before realizing that you were fluffy. ;PPPP Love the glam-up, it looks great! ;PPPP <3
Replying to @fluffykittnmeow
Which paper is it from? And has the number of pretraining tokens for Mistral 7B been reported anywhere else?
Replying to @fluffykittnmeow
I'm fascinated. Curious to see this with a ton of models.