“‘GPT Calibration’ Tag”, 2021-03-21:
Bibliography for tag ai/nn/transformer/gpt/calibration, most recent first: 1 related tag, 46 annotations, & 14 links (parent).
- See Also
- Gwern
- Links
- “How Do You Change a Chatbot’s Mind? When I Set out to Improve My Tainted Reputation With Chatbots, I Discovered a New World of A.I. Manipulation”, 2024
- “Future Events As Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs”, et al 2024
- “What Are the Odds? Language Models Are Capable of Probabilistic Reasoning”, et al 2024
- “Creativity Has Left the Chat: The Price of Debiasing Language Models”, 2024
- “Can Language Models Use Forecasting Strategies?”, et al 2024
- “To Believe or Not to Believe Your LLM”, et al 2024
- “Can Language Models Explain Their Own Classification Behavior?”, et al 2024
- “Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience”, et al 2024
- “Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation”, et al 2024
- “Few-Shot Recalibration of Language Models”, et al 2024
- “Do LLMs Know about Hallucination? An Empirical Investigation of LLM’s Hidden States”, et al 2024
- “The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4”, 2024
- “I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, et al 2024
- “Learning to Trust Your Feelings: Leveraging Self-Awareness in LLMs for Hallucination Mitigation”, et al 2024
- “Can AI Assistants Know What They Don’t Know?”, et al 2024
- “Challenges With Unsupervised LLM Knowledge Discovery”, et al 2023
- “Calibrated Language Models Must Hallucinate”, 2023
- “R-Tuning: Teaching Large Language Models to Refuse Unknown Questions”, et al 2023
- “Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation”, et al 2023
- “Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament”, 2023
- “The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets”, 2023
- “Representation Engineering: A Top-Down Approach to AI Transparency”, et al 2023
- “How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions”, et al 2023
- “Large Language Models Are Not Robust Multiple Choice Selectors”, et al 2023
- “Inference-Time Intervention: Eliciting Truthful Answers from a Language Model”, et al 2023
- “Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned With Human Feedback”, et al 2023
- “How Language Model Hallucinations Can Snowball”, et al 2023
- “Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding”, et al 2023
- “GPT-4 Technical Report § Limitations: Calibration”, OpenAI 2023 (page 12)
- “Toolformer: Language Models Can Teach Themselves to Use Tools”, et al 2023
- “Predicting Consumer Contracts [With GPT-3]”, 2023
- “Large Language Models As Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards”, 2023
- “Can Large Language Models Reason about Medical Questions?”, et al 2022
- “Language Models (Mostly) Know What They Know”, et al 2022
- “Forecasting Future World Events With Neural Networks”, et al 2022
- “Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models”, et al 2022
- “Teaching Models to Express Their Uncertainty in Words”, et al 2022
- “Co-Training Improves Prompt-Based Learning for Large Language Models”, et al 2022
- “AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts”, et al 2021
- “Calibrate Before Use: Improving Few-Shot Performance of Language Models”, et al 2021
- “Reducing Conversational Agents’ Overconfidence through Linguistic Calibration”, et al 2020
- “Situational Awareness and Out-Of-Context Reasoning § Biased Coin Task”, 2024
- “Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity”
- “Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark [No]”
- “Language Models Model Us”
- Sort By Magic
- Miscellaneous
- Bibliography