- See Also
- Links
- Miscellaneous
- Link Bibliography
See Also
Links
- “Rewarding Chatbots for Real-World Engagement with Millions of Users”, Irvine et al 2023-03-10
- “Use GPT-3 incorrectly: reduce costs 40× and increase speed by 5×”, 2023-02-06
- “Big Tech was moving cautiously on AI. Then came ChatGPT. Google, Facebook and Microsoft helped build the scaffolding of AI. Smaller companies are taking it to the masses, forcing Big Tech to react.”, 2023-01-27
- “The inside story of ChatGPT: How OpenAI founder Sam Altman built the world’s hottest technology with billions from Microsoft”, 2023-01-25
- “Mysteries of mode collapse § Inescapable wedding parties”, Janus 2022-11-08
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”, Castricato et al 2022-10-14
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022-08-25
- “Basis for Intentions (BASIS): Efficient Inverse Reinforcement Learning using Past Experience”, 2022-08-09
- “Improved Policy Optimization for Online Imitation Learning”, 2022-07-29
- “Quark: Controllable Text Generation with Reinforced Unlearning”, Lu et al 2022-05-26
- “RL with KL penalties is better viewed as Bayesian inference”, Korbak et al 2022-05-23
- “Housekeep: Tidying Virtual Households using Commonsense Reasoning”, Kant et al 2022-05-22
- “Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning”, 2022-04-07
- “Inferring Rewards from Language in Context”, Lin et al 2022-04-05
- “SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning”, Park et al 2022-03-18
- “InstructGPT: Training language models to follow instructions with human feedback”, Ouyang et al 2022-03-04
- “Safe Deep RL in 3D Environments using Human Feedback”, Rahtz et al 2022-01-20
- “A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models”, Zhang et al 2022-01-14
- “WebGPT: Improving the factual accuracy of language models through web browsing”, Hilton et al 2021-12-16
- “WebGPT: Browser-assisted question-answering with human feedback”, Nakano et al 2021-12-16
- “Modeling Strong and Human-Like Gameplay with KL-Regularized Search”, Jacob et al 2021-12-14
- “A General Language Assistant as a Laboratory for Alignment”, Askell et al 2021-12-01
- “Cut the CARP: Fishing for zero-shot story evaluation”, Matiana et al 2021-10-06
- “Recursively Summarizing Books with Human Feedback”, Wu et al 2021-09-22
- “B-Pref: Benchmarking Preference-Based Reinforcement Learning”, Lee et al 2021-06-08
- “Trajectory Transformer: Reinforcement Learning as One Big Sequence Modeling Problem”, Janner et al 2021-06-03
- “A Survey of Preference-Based Reinforcement Learning Methods”, Wirth et al 2021-05-20
- “Learning What To Do by Simulating the Past”, Lindner et al 2021-04-08
- “Language Models have a Moral Dimension”, Schramowski et al 2021-03-08
- “Brain-computer interface for generating personally attractive images”, Spapé et al 2021-02-12
- “Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets”, Solaiman & Dennison 2021
- “Human-centric Dialog Training via Offline Reinforcement Learning”, Jaques et al 2020-10-12
- “Learning to summarize from human feedback”, Stiennon et al 2020-09-02
- “Learning Personalized Models of Human Behavior in Chess”, McIlroy-Young et al 2020-08-23
- “Aligning Superhuman AI with Human Behavior: Chess as a Model System”, McIlroy-Young et al 2020-06-02
- “Active Preference-Based Gaussian Process Regression for Reward Learning”, Bıyık et al 2020-05-06
- “Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, Brown et al 2020-02-21
- “RL Agents Implicitly Learning Human Preferences”, 2020-02-14
- “Reward-rational (implicit) choice: A unifying formalism for reward learning”, Jeon et al 2020-02-12
- “What does BERT dream of? A visual investigation of nightmares in Sesame Street”, Bäuerle & Wexler 2020-01-13
- “Deep Bayesian Reward Learning from Preferences”, Brown & Niekum 2019-12-10
- “Learning Norms from Stories: A Prior for Value Aligned Agents”, Nahian et al 2019-12-07
- “Learning Human Objectives by Evaluating Hypothetical Behavior”, Reddy et al 2019-12-05
- “Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions”, Schmidhuber 2019-12-05
- “Preference-Based Learning for Exoskeleton Gait Optimization”, Tucker et al 2019-09-26
- “Do Massively Pretrained Language Models Make Better Storytellers?”, See et al 2019-09-24
- “Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior”, Ziegler et al 2019-09-19
- “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019-09-19
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019-09-18
- “lm-human-preferences”, Ziegler et al 2019-09-14
- “Better Rewards Yield Better Summaries: Learning to Summarise Without References”, Böhm et al 2019-09-03
- “Dueling Posterior Sampling for Preference-Based Reinforcement Learning”, Novoseller et al 2019-08-04
- “Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog”, Jaques et al 2019-06-30
- “Reward learning from human preferences and demonstrations in Atari”, Ibarz et al 2018-11-15
- “StreetNet: Preference Learning with Convolutional Neural Network on Urban Crime Perception”, Fu et al 2018-11-01
- “Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making”, Zintgraf et al 2018-02-21
- “Convergence of Value Aggregation for Imitation Learning”, Cheng & Boots 2018-01-22
- “A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents”, Wu & Lin 2017-12-12
- “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”, Warnell et al 2017-09-28
- “DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, Menda et al 2017-09-18
- “Towards personalized human AI interaction—adapting the behavior of AI agents using neural signatures of subjective interest”, 2017-09-14
- “Learning human behaviors from motion capture by adversarial imitation”, Merel et al 2017-07-07
- “Learning from Human Preferences”, 2017-06-13
- “Learning through human feedback”, 2017-06-12
- “Deep reinforcement learning from human preferences”, Christiano et al 2017-06-12
- “An Invitation to Imitation”, Bagnell 2015-03-14
- “Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015-02-19
- “Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011-12-24
- “DAGGER: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Ross et al 2010-11-02
- “Transformers as Variational Autoencoders”
- “Transformer-VAE for Program Synthesis”
- “Learning through Human Feedback [blog]”
Miscellaneous
- https://hal.archives-ouvertes.fr/hal-01972948/document#page=2
- https://www.frontiersin.org/articles/10.3389/frobt.2017.00071/full
- https://www.lesswrong.com/posts/LpjjWDBXr88gzcYK2/learning-and-manipulating-learning
- https://www.lesswrong.com/posts/hcrFxeYYfbFrkKQEJ/full-toy-model-for-preference-learning
Link Bibliography
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus: https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf#Inescapable_wedding_parties
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”: https://arxiv.org/abs/2210.07792#eleutherai
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”: https://www.anthropic.com/red_teaming.pdf
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman: https://openai.com/blog/webgpt/
- “WebGPT: Browser-assisted Question-answering With Human Feedback”: https://arxiv.org/abs/2112.09332#openai
- “A General Language Assistant As a Laboratory for Alignment”: https://arxiv.org/abs/2112.00861#anthropic
- “Recursively Summarizing Books With Human Feedback”, Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano: https://arxiv.org/abs/2109.10862#openai
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Michael Janner, Qiyang Colin Li, Sergey Levine: https://trajectory-transformer.github.io/
- “StreetNet: Preference Learning With Convolutional Neural Network on Urban Crime Perception”, Kaiqun Fu, Zhiqian Chen, Chang-Tien Lu: 2018-fu.pdf