“‘Preference Learning’ Tag”, 2019-09-12:
Bibliography for tag reinforcement-learning/preference-learning, most recent first: 8 related tags, 164 annotations, & 39 links (parent).
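Most of the annotations below descend from one core technique: fitting a reward model to pairwise human comparisons under the Bradley-Terry model, P(a preferred to b) = σ(r(a) − r(b)), and then optimizing a policy against it (as in the Christiano et al 2017 and InstructGPT entries below). As a minimal orienting sketch only, assuming a toy linear reward over synthetic preference pairs (every name and number here is illustrative, taken from no particular paper):

```python
# Minimal illustrative sketch (not from any cited paper): fit a linear reward
# model r(x) = w @ x to pairwise preferences under the Bradley-Terry model,
# P(a preferred to b) = sigmoid(r(a) - r(b)), by gradient descent on the
# logistic negative log-likelihood.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 preference pairs over 5-dimensional "trajectory" features;
# labels are noisy comparisons generated from a hidden true reward.
w_true = rng.normal(size=5)
a = rng.normal(size=(200, 5))
b = rng.normal(size=(200, 5))
prefer_a = (rng.random(200) < 1 / (1 + np.exp(-(a - b) @ w_true))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(5)
for _ in range(500):
    p = sigmoid((a - b) @ w)                     # predicted P(a preferred to b)
    grad = (a - b).T @ (p - prefer_a) / len(p)   # gradient of the logistic NLL
    w -= 0.5 * grad                              # fixed learning rate

# The recovered reward direction should align with the hidden one.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine(w_learned, w_true) = {cos:.3f}")
```

Deep-RL variants swap the linear model for a neural network over trajectory segments and add a KL-regularized policy-optimization step (RLHF); DPO-style methods listed below (Rafailov et al 2023) fold both steps into one loss by reparameterizing r in terms of the policy, but the Bradley-Terry likelihood is the same.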
- See Also
- Gwern
- Links
- “AI-Generated Poetry Is Indistinguishable from Human-Written Poetry and Is Rated More Favorably”, 2024
- “Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL”, et al 2024
- “Thinking LLMs: General Instruction Following With Thought Generation”, et al 2024
- “Language Models Learn to Mislead Humans via RLHF”, et al 2024
- “Does Style Matter? Disentangling Style and Substance in Chatbot Arena”
- “LLM Applications I Want To See”, 2024
- “SEAL: Systematic Error Analysis for Value ALignment”, et al 2024
- “Hermes 3 Technical Report”, et al 2024
- “Does Refusal Training in LLMs Generalize to the Past Tense?”, 2024
- “Super(ficial)-Alignment: Strong Models May Deceive Weak Models in Weak-To-Strong Generalization”, et al 2024
- “Nemotron-4 340B Technical Report”, et al 2024
- “Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback”, et al 2024
- “Discovering Preference Optimization Algorithms With and for Large Language Models”, et al 2024
- “Beyond Model Collapse: Scaling Up With Synthesized Data Requires Reinforcement”, et al 2024
- “Safety Alignment Should Be Made More Than Just a Few Tokens Deep”, et al 2024
- “AlignEZ: Is Free Self-Alignment Possible?”, et al 2024
- “Aligning LLM Agents by Learning Latent Preference from User Edits”, et al 2024
- “Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data”, et al 2024
- “From r to Q✱: Your Language Model Is Secretly a Q-Function”, et al 2024
- “Dataset Reset Policy Optimization for RLHF”, et al 2024
- “ControlNet++: Improving Conditional Controls With Efficient Consistency Feedback”, et al 2024
- “Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators”, et al 2024
- “TextCraftor: Your Text Encoder Can Be Image Quality Controller”, et al 2024
- “RewardBench: Evaluating Reward Models for Language Modeling”, et al 2024
- “Evaluating Text to Image Synthesis: Survey and Taxonomy of Image Quality Metrics”, et al 2024
- “When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback”, et al 2024
- “V-STaR: Training Verifiers for Self-Taught Reasoners”, et al 2024
- “I Think, Therefore I Am: Benchmarking Awareness of Large Language Models Using AwareBench”, et al 2024
- “Can AI Assistants Know What They Don’t Know?”, et al 2024
- “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, et al 2024
- “Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM”, et al 2024
- “A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity”, et al 2024
- “Reasons to Reject? Aligning Language Models With Judgments”, et al 2023
- “Rich Human Feedback for Text-To-Image Generation”, et al 2023
- “Language Model Alignment With Elastic Reset”, et al 2023
- “The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning”, et al 2023
- “Universal Jailbreak Backdoors from Poisoned Human Feedback”, Rando & Tramèr 2023
- “Diffusion Model Alignment Using Direct Preference Optimization”, et al 2023
- “Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild”, et al 2023
- “Specific versus General Principles for Constitutional AI”, et al 2023
- “Eureka: Human-Level Reward Design via Coding Large Language Models”, et al 2023
- “A General Theoretical Paradigm to Understand Learning from Human Preferences”, et al 2023
- “Interpreting Learned Feedback Patterns in Large Language Models”, et al 2023
- “UltraFeedback: Boosting Language Models With High-Quality Feedback”, et al 2023
- “Motif: Intrinsic Motivation from Artificial Intelligence Feedback”, et al 2023
- “Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack”, et al 2023
- “STARC: A General Framework For Quantifying Differences Between Reward Functions”, et al 2023
- “AceGPT, Localizing Large Language Models in Arabic”, et al 2023
- “RLAIF: Scaling Reinforcement Learning from Human Feedback With AI Feedback”, et al 2023
- “Activation Addition: Steering Language Models Without Optimization”, et al 2023
- “ReST: Reinforced Self-Training (ReST) for Language Modeling”, et al 2023
- “FABRIC: Personalizing Diffusion Models With Iterative Feedback”, et al 2023
- “LLaMA-2: Open Foundation and Fine-Tuned Chat Models”, et al 2023
- “Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations”, et al 2023
- “Introducing Superalignment”, 2023
- “Are Aligned Neural Networks Adversarially Aligned?”, et al 2023
- “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—And Not Going Anywhere”, 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, 2023
- “Microsoft and OpenAI Forge Awkward Partnership As Tech’s New Power Couple: As the Companies Lead the AI Boom, Their Unconventional Arrangement Sometimes Causes Conflict”, 2023
- “Direct Preference Optimization (DPO): Your Language Model Is Secretly a Reward Model”, et al 2023
- “Improving Language Models With Advantage-Based Offline Policy Gradients”, et al 2023
- “LIMA: Less Is More for Alignment”, et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, 2023
- “SELF-ALIGN: Principle-Driven Self-Alignment of Language Models from Scratch With Minimal Human Supervision”, et al 2023
- “Pick-A-Pic: An Open Dataset of User Preferences for Text-To-Image Generation”, et al 2023
- “Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems”, et al 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, 2023
- “OpenAI’s Sam Altman Talks ChatGPT And How Artificial General Intelligence Can ‘Break Capitalism’”, 2023
- “Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React”, et al 2023
- “The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, 2023
- “Self-Instruct: Aligning Language Models With Self-Generated Instructions”, et al 2022
- “HALIE: Evaluating Human-Language Model Interaction”, et al 2022
- “Constitutional AI: Harmlessness from AI Feedback”, et al 2022
- “Solving Math Word Problems With Process & Outcome-Based Feedback”, et al 2022
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, 2022
- “When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, et al 2022
- “Scaling Laws for Reward Model Overoptimization”, et al 2022
- “Teacher Forcing Recovers Reward Functions for Text Generation”, et al 2022
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”, et al 2022
- “Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization”, et al 2022
- “Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, et al 2022
- “Basis for Intentions (BASIS): Efficient Inverse Reinforcement Learning Using Past Experience”, et al 2022
- “Improved Policy Optimization for Online Imitation Learning”, et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, et al 2022
- “Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-Time Planning”, et al 2022
- “Inferring Rewards from Language in Context”, et al 2022
- “SURF: Semi-Supervised Reward Learning With Data Augmentation for Feedback-Efficient Preference-Based Reinforcement Learning”, et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, et al 2022
- “Safe Deep RL in 3D Environments Using Human Feedback”, et al 2022
- “A Survey of Controllable Text Generation Using Transformer-Based Pre-Trained Language Models”, et al 2022
- “WebGPT: Browser-Assisted Question-Answering With Human Feedback”, et al 2021
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, et al 2021
- “Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, et al 2021
- “A General Language Assistant As a Laboratory for Alignment”, et al 2021
- “Cut the CARP: Fishing for Zero-Shot Story Evaluation”, et al 2021
- “Recursively Summarizing Books With Human Feedback”, et al 2021
- “B-Pref: Benchmarking Preference-Based Reinforcement Learning”, et al 2021
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, et al 2021
- “Embracing New Techniques in Deep Learning for Estimating Image Memorability”, 2021
- “A Survey of Preference-Based Reinforcement Learning Methods”, et al 2021
- “Learning What To Do by Simulating the Past”, et al 2021
- “Language Models Have a Moral Dimension”, et al 2021
- “Brain-Computer Interface for Generating Personally Attractive Images”, et al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, 2021
- “Human-Centric Dialog Training via Offline Reinforcement Learning”, et al 2020
- “Learning to Summarize from Human Feedback”, et al 2020
- “Learning Personalized Models of Human Behavior in Chess”, McIlroy-Young et al 2020
- “Aligning Superhuman AI With Human Behavior: Chess As a Model System”, McIlroy-Young et al 2020
- “Active Preference-Based Gaussian Process Regression for Reward Learning”, et al 2020
- “Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, et al 2020
- “RL Agents Implicitly Learning Human Preferences”, 2020
- “Reward-Rational (Implicit) Choice: A Unifying Formalism for Reward Learning”, et al 2020
- “What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, 2020
- “Deep Bayesian Reward Learning from Preferences”, 2019
- “Learning Norms from Stories: A Prior for Value Aligned Agents”, et al 2019
- “Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions”, 2019
- “Learning Human Objectives by Evaluating Hypothetical Behavior”, et al 2019
- “Preference-Based Learning for Exoskeleton Gait Optimization”, et al 2019
- “Do Massively Pretrained Language Models Make Better Storytellers?”, et al 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, et al 2019
- “Fine-Tuning GPT-2 from Human Preferences”, et al 2019
- “Fine-Tuning Language Models from Human Preferences”, et al 2019
- “lm-human-preferences”, et al 2019
- “Better Rewards Yield Better Summaries: Learning to Summarise Without References”, et al 2019
- “Dueling Posterior Sampling for Preference-Based Reinforcement Learning”, et al 2019
- “Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog”, et al 2019
- “Reward Learning from Human Preferences and Demonstrations in Atari”, et al 2018
- “StreetNet: Preference Learning With Convolutional Neural Network on Urban Crime Perception”, et al 2018
- “Toward Diverse Text Generation With Inverse Reinforcement Learning”, et al 2018
- “Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making”, et al 2018
- “Convergence of Value Aggregation for Imitation Learning”, 2018
- “A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents”, 2017
- “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”, et al 2017
- “DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, et al 2017
- “NIMA: Neural Image Assessment”, 2017
- “Towards Personalized Human AI Interaction—Adapting the Behavior of AI Agents Using Neural Signatures of Subjective Interest”, et al 2017
- “A Deep Architecture for Unified Esthetic Prediction”, 2017
- “Learning Human Behaviors from Motion Capture by Adversarial Imitation”, et al 2017
- “Learning from Human Preferences”, et al 2017
- “Deep Reinforcement Learning from Human Preferences”, et al 2017
- “Learning through Human Feedback [Blog]”, et al 2017
- “Adversarial Ranking for Language Generation”, et al 2017
- “An Invitation to Imitation”, 2015
- “Just Sort It! A Simple and Effective Approach to Active Preference Learning”, 2015
- “Algorithmic and Human Teaching of Sequential Decision Tasks”, 2012
- “Bayesian Active Learning for Classification and Preference Learning”, et al 2011
- “DAgger: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, et al 2010
- “John Schulman’s Homepage”, 2024
- “An Analysis of AI Political Preferences from a European Perspective”
- “Something Weird Is Happening With LLMs and Chess”, 2024
- “Transformers As Variational Autoencoders”
- “The Taming of the AI”
- “Copilot Stops Working on `gender` Related Subjects · Community · Discussion #72603”
- “Transformer-VAE for Program Synthesis”
- “Claude’s Character”, 2024
- “How Did You Do On The AI Art Turing Test?”
- “Tülu 3: The Next Era in Open Post-Training”
- “Interpreting Preference Models With Sparse Autoencoders”
- “When Your AIs Deceive You: Challenges With Partial Observability in RLHF”
- “Learning and Manipulating Learning”
- “Model Mis-Specification and Inverse Reinforcement Learning”
- “Full Toy Model for Preference Learning”
- Sort By Magic
- Wikipedia
- Miscellaneous
- Bibliography