Numeric scores generated by GPT-3.5/ChatGPT snap to discrete increments, causing ties in comparisons. To fix this, try a probability-weighted average of the candidate scores, using the top-5 token logprobs in GPT-3.5.
E.g. here: (50 * .7396 + 60 * .1027 + ...) / (.7396 + ...)
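A minimal Python sketch of that weighted average, not a definitive implementation. It assumes the score is emitted as a single token and that you already have that position's top candidates as a token-to-logprob mapping. The function name `weighted_score` and the input shape are hypothetical, and the sample values are only the two terms spelled out above, not a full top-5 list.

```python
import math


def weighted_score(top_logprobs):
    """Probability-weighted average of the numeric candidate tokens.

    top_logprobs: dict mapping candidate token text to its log probability
    (hypothetical input shape, e.g. the top-5 candidates for the score token).
    """
    weighted_sum = 0.0
    total_prob = 0.0
    for token, logprob in top_logprobs.items():
        try:
            value = float(token.strip())
        except ValueError:
            continue  # skip candidates that are not numbers
        if not math.isfinite(value):
            continue  # skip "nan" / "inf", which float() also accepts
        prob = math.exp(logprob)  # logprob -> probability
        weighted_sum += value * prob
        total_prob += prob
    if total_prob == 0.0:
        raise ValueError("no numeric tokens among the candidates")
    # Dividing by the summed probability renormalizes over the kept tokens.
    return weighted_sum / total_prob


if __name__ == "__main__":
    # Only the two terms given above; a real call would pass all five.
    sample = {"50": math.log(0.7396), "60": math.log(0.1027)}
    print(weighted_score(sample))  # about 51.2 for these two terms alone
```

Dividing by the summed probability matters because the top-5 candidates rarely add up to 1, and some of them may not be numbers at all.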
Mar 11, 2023 · 4:16 AM UTC