Deep reinforcement learning from human preferences
‘instruct-tuning LLMs’ directory
GPT-4 Technical Report § Limitations: Calibration
Towards a Human-like Open-Domain Chatbot
‘inner monologue (AI)’ directory
Creativity Has Left the Chat: The Price of Debiasing Language Models
Consistency-diversity-realism Pareto fronts of conditional image generative models
Epistemic Calibration and Searching the Space of Truth
https://www.anthropic.com/news/claude-2
Introducing the next generation of Claude
Constitutional AI: Harmlessness from AI Feedback
Mysteries of Mode Collapse