Numeric scores generated in GPT-3.5/ChatGPT snap to discrete increments, causing ties in comparisons. To fix this, try a probability-weighted average using the top-5 token logprobs in GPT-3.5. For example: (50 * 0.7396 + 60 * 0.1027 + ...) / (0.7396 + 0.1027 + ...)

Mar 11, 2023 · 4:16 AM UTC
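
For concreteness, a minimal sketch of the weighting, assuming the legacy openai Python SDK (0.x) and a completions model that returns logprobs (e.g. text-davinci-003); the prompt and model name are placeholders:

```python
import math
import openai  # legacy 0.x SDK; the Completions endpoint exposes logprobs

def expected_score(prompt: str, model: str = "text-davinci-003") -> float:
    """Probability-weighted average over the top-5 tokens at the score position."""
    resp = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=1,   # the score must be a single token
        temperature=0,
        logprobs=5,     # return top-5 logprobs per generated token
    )
    # Dict of token -> logprob for the first (only) generated position
    top = resp["choices"][0]["logprobs"]["top_logprobs"][0]
    total_p = weighted = 0.0
    for token, logprob in top.items():
        token = token.strip()
        if not token.isdigit():
            continue    # skip whitespace/non-numeric tokens
        p = math.exp(logprob)
        weighted += int(token) * p
        total_p += p
    if total_p == 0.0:
        raise ValueError("no numeric tokens in the top-5 logprobs")
    return weighted / total_p  # renormalize over the tokens kept
```

Renormalizing by the summed probability of the kept tokens is what makes the denominator (0.7396 + 0.1027 + ...) rather than 1.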

Generated scores need to be a single token for this to work. 0-100 is fine, as are 0-10, A-F, or ordered bin labels (good/neutral/bad): anything you can map to a number. Also, the ChatGPT API doesn't expose logprobs, so this is GPT-3/3.5 only.
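
The same weighting extends to ordered bin labels with an explicit label-to-number mapping; a sketch under the assumption that each label is a single token (the label set and values here are illustrative):

```python
import math

# Hypothetical ordered bins; any monotone assignment of values works.
LABEL_VALUES = {"bad": 0.0, "neutral": 0.5, "good": 1.0}

def expected_from_labels(top_logprobs: dict[str, float]) -> float:
    """Weighted average over label tokens, given token -> logprob."""
    total_p = weighted = 0.0
    for token, logprob in top_logprobs.items():
        label = token.strip().lower()
        if label not in LABEL_VALUES:
            continue  # ignore tokens outside the label set
        p = math.exp(logprob)
        weighted += LABEL_VALUES[label] * p
        total_p += p
    return weighted / total_p
```

The single-token constraint still applies: check your tokenizer before picking labels, since a label that splits into multiple tokens won't appear intact in top_logprobs.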
Sampling discards information. Avoid it when you can.
Replying to @goodside
How would this fare across JSON object classification/scoring (say, properties of a deal in a CRM)?
Assigning one score to a JSON input should work like my example, no major difference. If you mean generating multiple scores as JSON, that should still work, but each generated score is conditional on the scores sampled earlier in the completion.
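
One way to sidestep that conditioning, sketched here rather than taken from the thread: make one single-token call per property, reusing expected_score from the sketch above. The field names and prompt template are hypothetical:

```python
# Scoring each CRM field in its own completion keeps every score
# independent of the others; one generation per field costs more calls
# but avoids scores conditioning on earlier sampled scores.
FIELDS = ["deal_size", "urgency", "fit"]

def score_record(record: dict) -> dict:
    return {
        field: expected_score(
            f"Rate the {field} of this CRM deal from 0-100.\n"
            f"Deal: {record}\nScore:"
        )
        for field in FIELDS
    }
```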
Replying to @goodside
Also, curious. Do the scores improve if you ask it to justify the value it produced as an output?
Only if you ask for (or demonstrate in k-shot) rationales that come before the answer. I see people make that mistake a lot: if it gives the explanation after the answer, it's not a rationale, it's a rationalization.
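
A sketch of the rationale-first ordering; the task, example review, and wording are illustrative, not from the thread:

```python
# k-shot demonstration with the rationale BEFORE the score, so the
# sampled reasoning can influence the answer rather than excuse it.
# The score remains a single token, so the weighting trick above still
# applies; read top_logprobs at the position following "Score:".
RATIONALE_FIRST_PROMPT = """Rate the review's sentiment from 0-100.

Review: "The food was fine but the service was painfully slow."
Reasoning: Lukewarm on the food, negative on the service, so the
overall sentiment is mildly negative.
Score: 35

Review: "{review}"
Reasoning:"""
```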
Replying to @goodside
What about meme numbers like 420 and 69?
You'll have to wait for Elon's AI for that.
Replying to @goodside
Do you mean from davinci? How did you get logprobs from 3.5?
Replying to @goodside
Wow we’re literally doing this too