See Also
Links
- “LLaMA 2: Open Foundation and Fine-Tuned Chat Models”, Touvron et al 2023
- “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—and Not Going Anywhere”, Dzieza 2023
- “Microsoft and OpenAI Forge Awkward Partnership As Tech’s New Power Couple: As the Companies Lead the AI Boom, Their Unconventional Arrangement Sometimes Causes Conflict”, Dotan & Seetharaman 2023
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Roger 2023
- “The False Promise of Imitating Proprietary LLMs”, Gudibande et al 2023
- “LIMA: Less Is More for Alignment”, Zhou et al 2023
- “Bits of Grass: Does GPT Already Know How to Write like Whitman?”, Sawicki et al 2023
- “Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”, Kirstain et al 2023
- “A Radical Plan to Make AI Good, Not Evil”, Knight 2023
- “Rewarding Chatbots for Real-World Engagement With Millions of Users”, Irvine et al 2023
- “Use GPT-3 Incorrectly: Reduce Costs 40× and Increase Speed by 5×”, Pullen 2023
- “OpenAI’s Sam Altman Talks ChatGPT And How Artificial General Intelligence Can ‘Break Capitalism’”, Konrad & Cai 2023
- “Big Tech Was Moving Cautiously on AI. Then Came ChatGPT. Google, Facebook and Microsoft Helped Build the Scaffolding of AI. Smaller Companies Are Taking It to the Masses, Forcing Big Tech to React.”, Tiku et al 2023
- “The Inside Story of ChatGPT: How OpenAI Founder Sam Altman Built the World’s Hottest Technology With Billions from Microsoft”, Kahn 2023
- “Discovering Language Model Behaviors With Model-Written Evaluations”, Perez et al 2022
- “Constitutional AI: Harmlessness from AI Feedback”, Bai et al 2022
- “Solving Math Word Problems With Process & Outcome-based Feedback”, Uesato et al 2022
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
- “When Life Gives You Lemons, Make Cherryade: Converting Feedback from Bad Responses into Good Labels”, Shi et al 2022
- “Scaling Laws for Reward Model Overoptimization”, Gao et al 2022
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”, Castricato et al 2022
- “Sparrow: Improving Alignment of Dialogue Agents via Targeted Human Judgements”, Glaese et al 2022
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
- “Basis for Intentions (BASIS): Efficient Inverse Reinforcement Learning Using Past Experience”, Abdulhai et al 2022
- “Improved Policy Optimization for Online Imitation Learning”, Lavington et al 2022
- “Quark: Controllable Text Generation With Reinforced Unlearning”, Lu et al 2022
- “RL With KL Penalties Is Better Viewed As Bayesian Inference”, Korbak et al 2022
- “Housekeep: Tidying Virtual Households Using Commonsense Reasoning”, Kant et al 2022
- “Imitating, Fast and Slow: Robust Learning from Demonstrations via Decision-time Planning”, Qi et al 2022
- “Inferring Rewards from Language in Context”, Lin et al 2022
- “SURF: Semi-supervised Reward Learning With Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning”, Park et al 2022
- “InstructGPT: Training Language Models to Follow Instructions With Human Feedback”, Ouyang et al 2022
- “Safe Deep RL in 3D Environments Using Human Feedback”, Rahtz et al 2022
- “A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models”, Zhang et al 2022
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Hilton et al 2021
- “WebGPT: Browser-assisted Question-answering With Human Feedback”, Nakano et al 2021
- “Modeling Strong and Human-Like Gameplay With KL-Regularized Search”, Jacob et al 2021
- “A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
- “Cut the CARP: Fishing for Zero-shot Story Evaluation”, Matiana et al 2021
- “Recursively Summarizing Books With Human Feedback”, Wu et al 2021
- “B-Pref: Benchmarking Preference-Based Reinforcement Learning”, Lee et al 2021
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Janner et al 2021
- “Embracing New Techniques in Deep Learning for Estimating Image Memorability”, Needell & Bainbridge 2021
- “A Survey of Preference-Based Reinforcement Learning Methods”, Wirth et al 2021
- “Learning What To Do by Simulating the Past”, Lindner et al 2021
- “Language Models Have a Moral Dimension”, Schramowski et al 2021
- “Brain-computer Interface for Generating Personally Attractive Images”, Spapé et al 2021
- “Process for Adapting Language Models to Society (PALMS) With Values-Targeted Datasets”, Solaiman & Dennison 2021
- “Human-centric Dialog Training via Offline Reinforcement Learning”, Jaques et al 2020
- “Learning to Summarize from Human Feedback”, Stiennon et al 2020
- “Learning Personalized Models of Human Behavior in Chess”, McIlroy-Young et al 2020
- “Aligning Superhuman AI With Human Behavior: Chess As a Model System”, McIlroy-Young et al 2020
- “Active Preference-Based Gaussian Process Regression for Reward Learning”, Bıyık et al 2020
- “Bayesian REX: Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences”, Brown et al 2020
- “RL Agents Implicitly Learning Human Preferences”, Wichers 2020
- “Reward-rational (implicit) Choice: A Unifying Formalism for Reward Learning”, Jeon et al 2020
- “What Does BERT Dream Of? A Visual Investigation of Nightmares in Sesame Street”, Bäuerle & Wexler 2020
- “Deep Bayesian Reward Learning from Preferences”, Brown & Niekum 2019
- “Learning Norms from Stories: A Prior for Value Aligned Agents”, Frazier et al 2019
- “Learning Human Objectives by Evaluating Hypothetical Behavior”, Reddy et al 2019
- “Reinforcement Learning Upside Down: Don’t Predict Rewards—Just Map Them to Actions”, Schmidhuber 2019
- “Preference-Based Learning for Exoskeleton Gait Optimization”, Tucker et al 2019
- “Do Massively Pretrained Language Models Make Better Storytellers?”, See et al 2019
- “Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
- “Fine-Tuning GPT-2 from Human Preferences”, Ziegler et al 2019
- “Fine-Tuning Language Models from Human Preferences”, Ziegler et al 2019
- “lm-human-preferences”, Ziegler et al 2019
- “Better Rewards Yield Better Summaries: Learning to Summarise Without References”, Böhm et al 2019
- “Dueling Posterior Sampling for Preference-Based Reinforcement Learning”, Novoseller et al 2019
- “Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog”, Jaques et al 2019
- “Reward Learning from Human Preferences and Demonstrations in Atari”, Ibarz et al 2018
- “StreetNet: Preference Learning With Convolutional Neural Network on Urban Crime Perception”, Fu et al 2018
- “Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making”, Zintgraf et al 2018
- “Convergence of Value Aggregation for Imitation Learning”, Cheng & Boots 2018
- “A Low-Cost Ethics Shaping Approach for Designing Reinforcement Learning Agents”, Wu & Lin 2017
- “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”, Warnell et al 2017
- “DropoutDAgger: A Bayesian Approach to Safe Imitation Learning”, Menda et al 2017
- “Towards Personalized Human AI Interaction—adapting the Behavior of AI Agents Using Neural Signatures of Subjective Interest”, Shih et al 2017
- “Learning Human Behaviors from Motion Capture by Adversarial Imitation”, Merel et al 2017
- “Learning from Human Preferences”, Amodei et al 2017
- “Learning through Human Feedback [blog]”, Leike et al 2017
- “Deep Reinforcement Learning from Human Preferences”, Christiano et al 2017
- “An Invitation to Imitation”, Bagnell 2015
- “Just Sort It! A Simple and Effective Approach to Active Preference Learning”, Maystre & Grossglauser 2015
- “Algorithmic and Human Teaching of Sequential Decision Tasks”, Cakmak & Lopes 2012
- “Bayesian Active Learning for Classification and Preference Learning”, Houlsby et al 2011
- “DAGGER: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”, Ross et al 2010
- “Transformers As Variational Autoencoders”
- “Transformer-VAE for Program Synthesis”
Sort By Magic
Annotations sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.
Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
- dialog-training
- behavior-adaptation
- bayesian-reward-learning
- interactive-agent-shaping
- deep-learning
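As a rough illustration of the procedure described above, here is a minimal sketch (not the site’s actual implementation) of an embedding-based “sort by magic”: starting from the newest item, greedily append the nearest unvisited neighbor, then cut the resulting ordering into contiguous sections. The random embeddings and the equal-sized splits are stand-ins; a real pipeline would embed each annotation with a text-embedding model and auto-label the clusters.

```python
# Sketch of embedding-based annotation ordering, assuming embeddings are given.
import numpy as np

def greedy_neighbor_order(embeddings: np.ndarray, start: int = 0) -> list[int]:
    """Order items so each is the nearest unvisited neighbor of the previous one."""
    n = len(embeddings)
    # Normalize rows so dot products are cosine similarities.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order, visited = [start], {start}
    while len(order) < n:
        sims = unit @ unit[order[-1]]     # similarity of every item to the current one
        sims[list(visited)] = -np.inf     # exclude items already placed
        nxt = int(np.argmax(sims))
        order.append(nxt)
        visited.add(nxt)
    return order

def split_into_sections(order: list[int], n_sections: int) -> list[list[int]]:
    """Cut the ordering into contiguous sections (stand-in for clustering & auto-labeling)."""
    return [chunk.tolist() for chunk in np.array_split(np.array(order), n_sections)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(40, 64))   # 40 annotations, 64-dim embeddings
    order = greedy_neighbor_order(fake_embeddings, start=0)  # index 0 = newest annotation
    for i, section in enumerate(split_into_sections(order, n_sections=5)):
        print(f"section {i}: items {section}")
```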
Miscellaneous
- https://twitter.com/BlancheMinerva/status/1662521904727756801
- https://www.astralcodexten.com/p/constitutional-ai-rlhf-on-steroids
- https://www.frontiersin.org/articles/10.3389/frobt.2017.00071/full
- https://www.lesswrong.com/posts/LpjjWDBXr88gzcYK2/learning-and-manipulating-learning
- https://www.lesswrong.com/posts/hcrFxeYYfbFrkKQEJ/full-toy-model-for-preference-learning
- https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf
- https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research
- https://www.wired.com/story/confessions-viral-ai-writer-chatgpt/
Link Bibliography
- “AI Is a Lot of Work: As the Technology Becomes Ubiquitous, a Vast Tasker Underclass Is Emerging—and Not Going Anywhere”, Josh Dzieza: https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
- “Microsoft and OpenAI Forge Awkward Partnership As Tech’s New Power Couple: As the Companies Lead the AI Boom, Their Unconventional Arrangement Sometimes Causes Conflict”, Tom Dotan, Deepa Seetharaman: https://www.wsj.com/articles/microsoft-and-openai-forge-awkward-partnership-as-techs-new-power-couple-3092de51
- “Large Language Models Sometimes Generate Purely Negatively-Reinforced Text”, Fabien Roger: https://arxiv.org/abs/2306.07567
- “The False Promise of Imitating Proprietary LLMs”, Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song: https://arxiv.org/abs/2305.15717
- “Bits of Grass: Does GPT Already Know How to Write like Whitman?”, Piotr Sawicki, Marek Grzes, Fabricio Goes, Dan Brown, Max Peeperkorn, Aisha Khatun: https://arxiv.org/abs/2305.11064
- “Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation”, Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy: https://arxiv.org/abs/2305.01569
- “A Radical Plan to Make AI Good, Not Evil”, Will Knight: https://www.wired.com/story/anthropic-ai-chatbots-ethics/
- “OpenAI’s Sam Altman Talks ChatGPT And How Artificial General Intelligence Can ‘Break Capitalism’”, Alex Konrad, Kenrick Cai: https://www.forbes.com/sites/alexkonrad/2023/02/03/exclusive-openai-sam-altman-chatgpt-agi-google-search/
- “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus: https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf#Inescapable_wedding_parties
- “Scaling Laws for Reward Model Overoptimization”, Leo Gao, John Schulman, Jacob Hilton: https://arxiv.org/abs/2210.10760#openai
- “CARP: Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning”: https://arxiv.org/abs/2210.07792#eleutherai
- “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”: https://www.anthropic.com/red_teaming.pdf
- “RL With KL Penalties Is Better Viewed As Bayesian Inference”, Tomasz Korbak, Ethan Perez, Christopher L. Buckley: https://arxiv.org/abs/2205.11275
- “WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing”, Jacob Hilton, Suchir Balaji, Reiichiro Nakano, John Schulman: https://openai.com/research/webgpt
- “WebGPT: Browser-assisted Question-answering With Human Feedback”: https://arxiv.org/abs/2112.09332#openai
- “A General Language Assistant As a Laboratory for Alignment”: https://arxiv.org/abs/2112.00861#anthropic
- “Recursively Summarizing Books With Human Feedback”, Jeff Wu, Long Ouyang, Daniel M. Ziegler, Nisan Stiennon, Ryan Lowe, Jan Leike, Paul Christiano: https://arxiv.org/abs/2109.10862#openai
- “Trajectory Transformer: Reinforcement Learning As One Big Sequence Modeling Problem”, Michael Janner, Qiyang Colin Li, Sergey Levine: https://trajectory-transformer.github.io/
- “StreetNet: Preference Learning With Convolutional Neural Network on Urban Crime Perception”, Kaiqun Fu, Zhiqian Chen, Chang-Tien Lu: 2018-fu.pdf