"Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation", 2023-11-15:
To maintain user trust, large language models (LLMs) should signal low confidence on examples where they are incorrect, rather than misleading the user. The standard approach to estimating confidence is to use a model's softmax probabilities, but as of November 2023, state-of-the-art LLMs such as GPT-4 and Claude-v1.3 do not provide access to these probabilities.
We first study eliciting confidence linguistically (asking an LLM for its confidence in its answer), which performs reasonably (80.5% AUC on GPT-4 averaged across 12 question-answering datasets, 7% above a random baseline) but leaves room for improvement.
We then explore using a surrogate confidence model: evaluating the original model's confidence in a given question using a different model whose probabilities we do have access to. Surprisingly, even though these probabilities come from a different and often weaker model, this method leads to higher AUC than linguistic confidences on 9 of 12 datasets.
Our best method, which composes linguistic confidences with surrogate model probabilities, gives state-of-the-art confidence estimates on all 12 datasets (84.6% average AUC on GPT-4).
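A minimal sketch of the evaluation setup, with hypothetical data: each question gets a verbalized ("linguistic") confidence from the main model and a softmax probability from a surrogate model; a simple mix (here, averaging the two signals, one plausible composition; the paper's exact mixing method is not reproduced here) is scored by AUC against whether the main model answered correctly.

```python
def auc(scores, labels):
    """AUC as the probability that a randomly chosen correct example
    receives a higher confidence score than an incorrect one (ties
    count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-question data (not from the paper).
linguistic = [0.9, 0.6, 0.8, 0.9, 0.9, 0.5]        # verbalized confidences
surrogate  = [0.95, 0.40, 0.85, 0.35, 0.90, 0.30]  # surrogate softmax probs
correct    = [1, 0, 1, 0, 1, 0]                    # main model correct?

# Compose the two signals by simple averaging.
mixed = [(l + s) / 2 for l, s in zip(linguistic, surrogate)]

print(auc(linguistic, correct))  # linguistic alone
print(auc(mixed, correct))       # composed signal
```

On this toy data the composed score separates correct from incorrect answers better than the linguistic confidence alone, illustrating the paper's finding that the two signals are complementary.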