- See Also
- Links
- “Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
- “Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, Shrivastava et al 2023
- “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023
- “The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets”, Marks & Tegmark 2023
- “Representation Engineering: A Top-Down Approach to AI Transparency”, Zou et al 2023
- “Inference-Time Intervention: Eliciting Truthful Answers from a Language Model”, Li et al 2023
- “Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned With Human Feedback”, Tian et al 2023
- “How Language Model Hallucinations Can Snowball”, Zhang et al 2023
- “Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
- “GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
- “Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
- “Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
- “Language Models (Mostly) Know What They Know”, Kadavath et al 2022
- “Forecasting Future World Events With Neural Networks”, Zou et al 2022
- “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022
- “Teaching Models to Express Their Uncertainty in Words”, Lin et al 2022
- “Co-training Improves Prompt-based Learning for Large Language Models”, Lang et al 2022
- “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
- “Calibrate Before Use: Improving Few-Shot Performance of Language Models”, Zhao et al 2021
- “Reducing Conversational Agents’ Overconfidence through Linguistic Calibration”, Mielke et al 2020
- “GPT-3 Nonfiction”, Gwern 2020
- Sort By Magic
- Miscellaneous
- Link Bibliography
See Also
Links
“Calibrated Language Models Must Hallucinate”, Kalai & Vempala 2023
“Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, Shrivastava et al 2023
“Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Schoenegger & Park 2023
“The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets”, Marks & Tegmark 2023
“Representation Engineering: A Top-Down Approach to AI Transparency”, Zou et al 2023
“Inference-Time Intervention: Eliciting Truthful Answers from a Language Model”, Li et al 2023
“Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned With Human Feedback”, Tian et al 2023
“How Language Model Hallucinations Can Snowball”, Zhang et al 2023
“Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, Xie et al 2023
“GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
“Predicting Consumer Contracts [With GPT-3]”, Kolt 2023
“Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, Nay 2023
“Can Large Language Models Reason about Medical Questions?”, Liévin et al 2022
“Language Models (Mostly) Know What They Know”, Kadavath et al 2022
“Forecasting Future World Events With Neural Networks”, Zou et al 2022
“Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, Srivastava et al 2022
“Teaching Models to Express Their Uncertainty in Words”, Lin et al 2022
“Co-training Improves Prompt-based Learning for Large Language Models”, Lang et al 2022
“AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, Wu et al 2021
“Calibrate Before Use: Improving Few-Shot Performance of Language Models”, Zhao et al 2021
“Reducing Conversational Agents’ Overconfidence through Linguistic Calibration”, Mielke et al 2020
“GPT-3 Nonfiction”, Gwern 2020
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to build a chain of nearest-neighbor annotations, creating a progression of topics; a minimal sketch of this ordering is given after the tag list below. For more details, see the link.
confidence-calibration
prediction-ai
lm-calibration
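The nearest-neighbor ordering described above can be illustrated with a short sketch. This is not the site's actual implementation: it assumes the annotation embeddings have already been computed by some embedding model, and uses a simple greedy cosine-similarity walk starting from the newest annotation.

```python
import numpy as np

def sort_by_magic(annotations, embeddings):
    """Greedy nearest-neighbor ordering (illustrative sketch only).

    Starts from the newest annotation, then repeatedly hops to the most
    similar not-yet-visited annotation, producing a 'progression of topics'
    rather than a date-sorted list.

    `annotations`: list of items, newest first.
    `embeddings`: (n, d) array of embedding vectors, one row per annotation.
    """
    if not annotations:
        return []
    # Normalize rows so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [0]                               # begin with the newest annotation
    remaining = set(range(1, len(annotations)))
    while remaining:
        last = order[-1]
        # Pick the unvisited annotation closest to the one just chosen.
        next_i = max(remaining, key=lambda i: float(emb[last] @ emb[i]))
        order.append(next_i)
        remaining.remove(next_i)
    return [annotations[i] for i in order]
```

The clustered, auto-labeled sections (such as the tags listed above) would then come from segmenting this chain wherever similarity drops and generating a label for each segment; that step is omitted from the sketch.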
Miscellaneous
Link Bibliography
- https://arxiv.org/abs/2310.13014: “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, Philipp Schoenegger, Peter S. Park
- https://arxiv.org/abs/2305.13534: “How Language Model Hallucinations Can Snowball”, Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
- https://arxiv.org/pdf/2303.08774.pdf#page=12&org=openai: “GPT-4 Technical Report § Limitations: Calibration”, OpenAI
- 2022-kolt.pdf: “Predicting Consumer Contracts [With GPT-3]”, Noam Kolt
- https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945: “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, John Nay
- https://arxiv.org/abs/2207.08143: “Can Large Language Models Reason about Medical Questions?”, Valentin Liévin, Christoffer Egeberg Hother, Ole Winther
- https://arxiv.org/abs/2207.05221#anthropic: “Language Models (Mostly) Know What They Know”, Kadavath et al
- https://arxiv.org/abs/2206.15474: “Forecasting Future World Events With Neural Networks”, Zou et al
- gpt-3-nonfiction: “GPT-3 Nonfiction”, Gwern