How Do You Change a Chatbot’s Mind? When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
Creativity Has Left the Chat: The Price of Debiasing Language Models
Can Language Models Explain Their Own Classification Behavior?
Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
Do LLMs Know about Hallucination? An Empirical Investigation of LLM’s Hidden States
The Non-Effect of Sampling Temperature on Problem Solving in GPT-3.5/GPT-4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions
Llamas Know What GPTs Don’t Show: Surrogate Models for Confidence Estimation
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Representation Engineering: A Top-Down Approach to AI Transparency
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Large Language Models Are Not Robust Multiple Choice Selectors
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Decomposition Enhances Reasoning via Self-Evaluation Guided Decoding
Toolformer: Language Models Can Teach Themselves to Use Tools
Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Co-training Improves Prompt-based Learning for Large Language Models
AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
Calibrate Before Use: Improving Few-Shot Performance of Language Models
Reducing conversational agents’ overconfidence through linguistic calibration
Situational Awareness and Out-Of-Context Reasoning § Biased Coin Task
Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity
Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark [No]
GPT-3 Gives Some Interesting True and False Answers to Some Questions. But It’s Important to Note That It Gives Opposite Answers Just As Often; I Cherry-Picked the Most ‘Sensational’ Ones. Usually It Said the Opposite Thing, and It Also Role-Plays Sometimes (e.g. As a Spy)
2024-paruchuri-figure3-comparingrandomnumbergenerationofllmstotargetdistributionsshowingseveremiscalibrationandmodecollapse.png
2023-openai-figure8-rlhftrainingdestroysgpt4predictioncalibration.png
https://www.lesswrong.com/posts/CkhJAxHeyFCg2EcET/are-language-models-good-at-making-predictions
https://www.lesswrong.com/posts/iaHk9DMCbrYsKuqgS/simple-distribution-approximation-when-sampled-100-times-can-1
How Do You Change a Chatbot’s Mind? When I set out to improve my tainted reputation with chatbots, I discovered a new world of A.I. manipulation
https://www.nytimes.com/2024/08/30/technology/ai-chatbot-chatgpt-manipulation.html
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament
https://arxiv.org/pdf/2303.08774#page=12&org=openai
Large Language Models as Fiduciaries: A Case Study Toward Robustly Communicating With Artificial Intelligence Through Legal Standards
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4335945
https://arxiv.org/abs/2207.05221#anthropic
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models