See Also
Links
“Sam Altman on What Makes Him ‘Super Nervous’ About AI: The OpenAI Co-founder Thinks Tools like GPT-4 Will Be Revolutionary. But He’s Wary of Downsides”, Swisher 2023
“Sam Altman on What Makes Him ‘Super Nervous’ About AI: The OpenAI co-founder thinks tools like GPT-4 will be revolutionary. But he’s wary of downsides”, 2023-03-23 (similar; bibliography)
“As A.I. Booms, Lawmakers Struggle to Understand the Technology: Tech Innovations Are Again Racing ahead of Washington’s Ability to Regulate Them, Lawmakers and A.I. Experts Said”, Kang & Satariano 2023
“As A.I. Booms, Lawmakers Struggle to Understand the Technology: Tech innovations are again racing ahead of Washington’s ability to regulate them, lawmakers and A.I. experts said”, 2023-03-03 (similar; bibliography)
“Tracr: Compiled Transformers As a Laboratory for Interpretability”, Lindner et al 2023
“Tracr: Compiled Transformers as a Laboratory for Interpretability”, 2023-01-12 (similar)
“Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula”, Et Al 2022
“Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula”, 2022-12-02 (similar)
“Interpreting Neural Networks through the Polytope Lens”, Black et al 2022
“Interpreting Neural Networks through the Polytope Lens”, 2022-11-22 (similar)
“Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus 2022
“Mysteries of mode collapse § Inescapable wedding parties”, 2022-11-08 (similar; bibliography)
“Measuring Progress on Scalable Oversight for Large Language Models”, Bowman et al 2022
“Measuring Progress on Scalable Oversight for Large Language Models”, 2022-11-04 (similar)
“The Alignment Problem from a Deep Learning Perspective”, Ngo 2022
“The alignment problem from a deep learning perspective”, 2022-08-30 (similar)
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Ganguli et al 2022
“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, 2022-08-25 (similar; bibliography)
“Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, Et Al 2022
“Modeling Transformative AI Risks (MTAIR) Project—Summary Report”, 2022-06-19 (similar)
“Researching Alignment Research: Unsupervised Analysis”, Kirchner et al 2022
“Researching Alignment Research: Unsupervised Analysis”, 2022-06-06 (similar; bibliography)
“Ethan Caballero on Private Scaling Progress”, Caballero & Trazzi 2022
“Ethan Caballero on Private Scaling Progress”, 2022-05-05 (similar; bibliography)
“DeepMind: The Podcast—Excerpts on AGI”, Kiely 2022
“DeepMind: The Podcast—Excerpts on AGI”, 2022-04-07 (similar; bibliography)
“Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, Ahn et al 2022
“Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”, 2022-04-04 (similar; bibliography)
“It Looks Like You’re Trying To Take Over The World”, Branwen 2022
“It Looks Like You’re Trying To Take Over The World”, 2022-03-06 (backlinks; similar; bibliography)
“Predictability and Surprise in Large Generative Models”, Ganguli et al 2022
“Predictability and Surprise in Large Generative Models”, 2022-02-15 (similar; bibliography)
“Uncalibrated Models Can Improve Human-AI Collaboration”, Et Al 2022
“Uncalibrated Models Can Improve Human-AI Collaboration”, 2022-02-12 (similar)
“DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers”, Cho et al 2022
“DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers”, 2022-02-08 (similar)
“LaMDA: Language Models for Dialog Applications”, Thoppilan et al 2022
“LaMDA: Language Models for Dialog Applications”, 2022-01-20 (similar)
“Safe Deep RL in 3D Environments Using Human Feedback”, Et Al 2022
“Safe Deep RL in 3D Environments using Human Feedback”, 2022-01-20 (similar)
“The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Pan et al 2022
“The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, 2022-01-10 (backlinks; similar; bibliography)
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, Rae et al 2021
“Scaling Language Models: Methods, Analysis & Insights from Training Gopher”, 2021-12-08 (similar; bibliography)
“A General Language Assistant As a Laboratory for Alignment”, Askell et al 2021
“A General Language Assistant as a Laboratory for Alignment”, 2021-12-01 (similar; bibliography)
“Unsolved Problems in ML Safety”, Hendrycks et al 2021
“Unsolved Problems in ML Safety”, 2021-09-28 (backlinks; similar)
“SafetyNet: Safe Planning for Real-world Self-driving Vehicles Using Machine-learned Policies”, Et Al 2021
“SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies”, 2021-09-28 (similar)
“An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions”, Pearce et al 2021
“An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions”, 2021-08-20 (similar)
“On the Opportunities and Risks of Foundation Models”, Bommasani et al 2021
“On the Opportunities and Risks of Foundation Models”, 2021-08-16 (backlinks; similar; bibliography)
“Evaluating Large Language Models Trained on Code”, Chen et al 2021
“Evaluating Large Language Models Trained on Code”, 2021-07-07 (similar)
“Randomness In Neural Network Training: Characterizing The Impact of Tooling”, Zhuang et al 2021
“Randomness In Neural Network Training: Characterizing The Impact of Tooling”, 2021-06-22 (similar)
“Anthropic Raises $124 Million to Build More Reliable, General AI Systems”, Anthropic 2021
“Anthropic raises $124 million to build more reliable, general AI systems”, 2021-05-28 (similar)
“Goal Misgeneralization in Deep Reinforcement Learning”, Langosco et al 2021
“Goal Misgeneralization in Deep Reinforcement Learning”, 2021-05-28 (backlinks; similar)
“Artificial Intelligence in China’s Revolution in Military Affairs”, Kania 2021
“Artificial intelligence in China’s revolution in military affairs”, 2021-05-25 (similar)
“Reward Is Enough”, Silver et al 2021
“Reward is enough”, 2021-05-24 (similar; bibliography)
“Intelligence and Unambitiousness Using Algorithmic Information Theory”, Cohen et al 2021
“Intelligence and Unambitiousness Using Algorithmic Information Theory”, 2021-05-13 (similar)
“AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak”, AetherDevSec 2021
“AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak”, 2021-04-28 (backlinks; similar)
“Universal Off-Policy Evaluation”, Chandak et al 2021
“Universal Off-Policy Evaluation”, 2021-04-26 (similar)
“Multitasking Inhibits Semantic Drift”, Jacob et al 2021
“Multitasking Inhibits Semantic Drift”, 2021-04-15 (backlinks; similar)
“Replaying Real Life: How the Waymo Driver Avoids Fatal Human Crashes”, Waymo 2021
“Replaying real life: how the Waymo Driver avoids fatal human crashes”, 2021-03-08 (backlinks; similar; bibliography)
“Language Models Have a Moral Dimension”, Schramowski et al 2021
“Language Models have a Moral Dimension”, 2021-03-08 (backlinks; similar)
“Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, Scanlon et al 2021
“Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain”, 2021-03-08 (backlinks; similar)
“Agent Incentives: A Causal Perspective”, Everitt et al 2021
“Agent Incentives: A Causal Perspective”, 2021-02-02 (similar)
“Organizational Update from OpenAI”, OpenAI 2020
“Organizational Update from OpenAI”, 2020-12-29 (similar)
“Emergent Road Rules In Multi-Agent Driving Environments”, Et Al 2020
“Emergent Road Rules In Multi-Agent Driving Environments”, 2020-11-21 (similar)
“Recipes for Safety in Open-domain Chatbots”, Xu et al 2020
“Recipes for Safety in Open-domain Chatbots”, 2020-10-14 (similar)
“The Radicalization Risks of GPT-3 and Advanced Neural Language Models”, McGuffie & Newhouse 2020
“The Radicalization Risks of GPT-3 and Advanced Neural Language Models”, 2020-09-15 (backlinks; similar)
“Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Scholl 2020
“Matt Botvinick on the spontaneous emergence of learning algorithms”, 2020-08-12 (backlinks; similar; bibliography)
“Aligning AI With Shared Human Values”, Hendrycks et al 2020
“Aligning AI With Shared Human Values”, 2020-08-05 (backlinks; similar)
“The Scaling Hypothesis”, Branwen 2020
“The Scaling Hypothesis”, 2020-05-28 (backlinks; similar; bibliography)
“Reward-rational (implicit) Choice: A Unifying Formalism for Reward Learning”, Jeon et al 2020
“Reward-rational (implicit) choice: A unifying formalism for reward learning”, 2020-02-12 (similar)
“The Incentives That Shape Behaviour”, Carey et al 2020
“The Incentives that Shape Behaviour”, 2020-01-20
“2019 AI Alignment Literature Review and Charity Comparison”, Larks 2019
“2019 AI Alignment Literature Review and Charity Comparison”, 2019-12-18 (similar; bibliography)
“Learning Norms from Stories: A Prior for Value Aligned Agents”, Frazier et al 2019
“Learning Norms from Stories: A Prior for Value Aligned Agents”, 2019-12-07 (backlinks; similar)
“Optimal Policies Tend to Seek Power”, Turner et al 2019
“Optimal Policies Tend to Seek Power”, 2019-12-03 (backlinks; similar)
“Taxonomy of Real Faults in Deep Learning Systems”, Humbatova et al 2019
“Taxonomy of Real Faults in Deep Learning Systems”, 2019-11-07 (backlinks; similar)
“Release Strategies and the Social Impacts of Language Models”, Solaiman et al 2019
“Release Strategies and the Social Impacts of Language Models”, 2019-11-05 (similar)
“Scaling Data-driven Robotics With Reward Sketching and Batch Reinforcement Learning”, Cabi et al 2019
“Scaling data-driven robotics with reward sketching and batch reinforcement learning”, 2019-09-26 (similar)
“Fine-Tuning GPT-2 from Human Preferences § Bugs Can Optimize for Bad Behavior”, Ziegler et al 2019
“Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior”, 2019-09-19 (similar)
“Designing Agent Incentives to Avoid Reward Tampering”, Everitt et al 2019
“Designing agent incentives to avoid reward tampering”, 2019-08-14 (backlinks; similar)
“Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective”, Everitt et al 2019
“Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective”, 2019-08-13 (similar)
“Characterizing Attacks on Deep Reinforcement Learning”, Et Al 2019
“Characterizing Attacks on Deep Reinforcement Learning”, 2019-07-21 (similar)
“Categorizing Wireheading in Partially Embedded Agents”, Et Al 2019
“Categorizing Wireheading in Partially Embedded Agents”, 2019-06-21 (backlinks; similar)
“Risks from Learned Optimization in Advanced Machine Learning Systems”, Hubinger et al 2019
“Risks from Learned Optimization in Advanced Machine Learning Systems”, 2019-06-05 (backlinks; similar)
“GROVER: Defending Against Neural Fake News”, Zellers et al 2019
“GROVER: Defending Against Neural Fake News”, 2019-05-29 (similar)
“AI-GAs: AI-generating Algorithms, an Alternate Paradigm for Producing General Artificial Intelligence”, Clune 2019
“AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence”, 2019-05-27 (similar)
“Challenges of Real-World Reinforcement Learning”, Dulac-Arnold et al 2019
“Challenges of Real-World Reinforcement Learning”, 2019-04-29 (similar)
“DeepMind and Google: the Battle to Control Artificial Intelligence. Demis Hassabis Founded a Company to Build the World’s Most Powerful AI. Then Google Bought Him Out. Hal Hodson Asks Who Is in Charge”, Hodson 2019
“DeepMind and Google: the battle to control artificial intelligence. Demis Hassabis founded a company to build the world’s most powerful AI. Then Google bought him out. Hal Hodson asks who is in charge”, 2019-03-01 (backlinks; similar; bibliography)
“Forecasting Transformative AI: An Expert Survey”, Gruetzemacher et al 2019
“Forecasting Transformative AI: An Expert Survey”, 2019-01-24 (backlinks; similar)
“Evolution As Backstop for Reinforcement Learning”, Branwen 2018
“Evolution as Backstop for Reinforcement Learning”, 2018-12-06 (backlinks; similar; bibliography)
“Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures”, Uesato et al 2018
“Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures”, 2018-12-04 (similar)
“There Is Plenty of Time at the Bottom: the Economics, Risk and Ethics of Time Compression”, Sandberg 2018
“There is plenty of time at the bottom: the economics, risk and ethics of time compression”, 2018-10-30 (backlinks; similar)
“Better Safe Than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning”, Et Al 2018
“Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning”, 2018-09-24 (similar)
“The Alignment Problem for Bayesian History-Based Reinforcement Learners”, Everitt & Hutter 2018
“The Alignment Problem for Bayesian History-Based Reinforcement Learners”, 2018-06-22 (similar; bibliography)
“Adaptive Mechanism Design: Learning to Promote Cooperation”, Baumann et al 2018
“Adaptive Mechanism Design: Learning to Promote Cooperation”, 2018-06-11 (similar)
“Visceral Machines: Risk-Aversion in Reinforcement Learning With Intrinsic Physiological Rewards”, McDuff & Kapoor 2018
“Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards”, 2018-05-25 (similar)
“Programmatically Interpretable Reinforcement Learning”, Verma et al 2018
“Programmatically Interpretable Reinforcement Learning”, 2018-04-06 (similar)
“The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities”, Lehman et al 2018
“The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities”, 2018-03-09 (backlinks; similar)
“Machine Theory of Mind”, Rabinowitz et al 2018
“Machine Theory of Mind”, 2018-02-21 (similar)
“Safe Exploration in Continuous Action Spaces”, Dalal et al 2018
“Safe Exploration in Continuous Action Spaces”, 2018-01-26 (similar)
“CycleGAN, a Master of Steganography”, Chu et al 2017
“CycleGAN, a Master of Steganography”, 2017-12-08 (backlinks; similar)
“AI Safety Gridworlds”, Leike et al 2017
“AI Safety Gridworlds”, 2017-11-27 (similar)
“There’s No Fire Alarm for Artificial General Intelligence”, Yudkowsky 2017
“There’s No Fire Alarm for Artificial General Intelligence”, 2017-10-13 (backlinks; similar)
“Safe Reinforcement Learning via Shielding”, Alshiekh et al 2017
“Safe Reinforcement Learning via Shielding”, 2017-08-29 (similar)
“CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms”, Elgammal et al 2017
“CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms”, 2017-06-21 (backlinks; similar)
“DeepXplore: Automated Whitebox Testing of Deep Learning Systems”, Pei et al 2017
“DeepXplore: Automated Whitebox Testing of Deep Learning Systems”, 2017-05-18 (similar)
“On the Impossibility of Supersized Machines”, Garfinkel et al 2017
“On the Impossibility of Supersized Machines”, 2017-03-31 (backlinks; similar)
“Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks”, Katz et al 2017
“Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks”, 2017-02-03 (similar)
“AI Risk Demos”, Branwen 2016
“AI Risk Demos”, 2016-12-23 (bibliography)
“The Off-Switch Game”, Hadfield-Menell et al 2016
“The Off-Switch Game”, 2016-11-24 (similar)
“Combating Reinforcement Learning’s Sisyphean Curse With Intrinsic Fear”, Lipton et al 2016
“Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear”, 2016-11-03 (similar)
“Why Tool AIs Want to Be Agent AIs”, Branwen 2016
“Why Tool AIs Want to Be Agent AIs”, 2016-09-07 (backlinks; similar; bibliography)
“Concrete Problems in AI Safety”, Amodei et al 2016
“Concrete Problems in AI Safety”, 2016-06-21 (backlinks; similar)
“Complexity No Bar to AI”, Branwen 2014
“Complexity no Bar to AI”, 2014-06-01 (backlinks; similar; bibliography)
“Intelligence Explosion Microeconomics”, Yudkowsky 2013
“Intelligence Explosion Microeconomics”, 2013-09-13 (similar)
“Surprisingly Turing-Complete”, Branwen 2012
“Surprisingly Turing-Complete”, 2012-12-09 (backlinks; similar; bibliography)
“Advantages of Artificial Intelligences, Uploads, and Digital Minds”, Sotala 2012
“Advantages of Artificial Intelligences, Uploads, and Digital Minds”, 2012 (similar)
“The Neural Net Tank Urban Legend”, Branwen 2011
“The Neural Net Tank Urban Legend”, 2011-09-20 (backlinks; similar; bibliography)
“Ontological Crises in Artificial Agents’ Value Systems”, de Blanc 2011
“Ontological Crises in Artificial Agents’ Value Systems”, 2011-05-19 (backlinks; similar)
“The Basic AI Drives”, Omohundro 2008
“The Basic AI Drives”, 2008-06-01 (backlinks; similar)
“Homepage of Paul F. Christiano”, Christiano 2023
“Homepage of Paul F. Christiano” (backlinks; similar; bibliography)
Miscellaneous
- https://80000hours.org/podcast/episodes/brian-christian-the-alignment-problem/
- https://blog.acolyer.org/2020/01/13/challenges-of-real-world-rl/
- https://forum.effectivealtruism.org/posts/TMbPEhdAAJZsSYx2L/the-limited-upside-of-interpretability
- https://mailchi.mp/938a7eed18c3/an-71avoiding-reward-tamperi
- https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1
- https://vkrakovna.wordpress.com/2022/06/02/paradigms-of-ai-alignment-components-and-enablers/
- https://www.deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
- https://www.newyorker.com/magazine/2022/01/24/the-rise-of-ai-fighter-pilots
- https://www.wired.com/story/when-bots-teach-themselves-to-cheat/
Link Bibliography
- https://nymag.com/intelligencer/2023/03/on-with-kara-swisher-sam-altman-on-the-ai-revolution.html: “Sam Altman on What Makes Him ‘Super Nervous’ About AI: The OpenAI Co-founder Thinks Tools like GPT-4 Will Be Revolutionary. But He’s Wary of Downsides”, Kara Swisher
- https://www.nytimes.com/2023/03/03/technology/artificial-intelligence-regulation-congress.html: “As A.I. Booms, Lawmakers Struggle to Understand the Technology: Tech Innovations Are Again Racing ahead of Washington’s Ability to Regulate Them, Lawmakers and A.I. Experts Said”, Cecilia Kang, Adam Satariano
- https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse-due-to-rlhf#Inescapable_wedding_parties: “Mysteries of Mode Collapse § Inescapable Wedding Parties”, Janus
- https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”
- https://arxiv.org/abs/2206.02841: “Researching Alignment Research: Unsupervised Analysis”, Jan H. Kirchner, Logan Smith, Jacques Thibodeau, Kyle McDonell, Laria Reynolds
- http://theinsideview.ai/ethan: “Ethan Caballero on Private Scaling Progress”, Ethan Caballero, Michaël Trazzi
- https://www.lesswrong.com/posts/SbAgRYo8tkHwhd9Qx/deepmind-the-podcast-excerpts-on-agi: “DeepMind: The Podcast—Excerpts on AGI”, William Kiely
- https://arxiv.org/abs/2204.01691#google: “Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances”
- clippy: “It Looks Like You’re Trying To Take Over The World”, Gwern Branwen
- https://arxiv.org/abs/2202.07785#anthropic: “Predictability and Surprise in Large Generative Models”
- https://arxiv.org/abs/2201.03544: “The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models”, Alexander Pan, Kush Bhatia, Jacob Steinhardt
- https://arxiv.org/abs/2112.11446#deepmind: “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
- https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”
- https://arxiv.org/abs/2108.07258: “On the Opportunities and Risks of Foundation Models”
- https://www.sciencedirect.com/science/article/pii/S0004370221000862#deepmind: “Reward Is Enough”, David Silver, Satinder Singh, Doina Precup, Richard S. Sutton
- https://blog.waymo.com/2021/03/replaying-real-life.html: “Replaying Real Life: How the Waymo Driver Avoids Fatal Human Crashes”, Waymo
- https://www.lesswrong.com/posts/Wnqua6eQkewL3bqsF/matt-botvinick-on-the-spontaneous-emergence-of-learning: “Matt Botvinick on the Spontaneous Emergence of Learning Algorithms”, Adam Scholl
- scaling-hypothesis: “The Scaling Hypothesis”, Gwern Branwen
- https://www.lesswrong.com/posts/SmDziGM9hBjW9DKmf/2019-ai-alignment-literature-review-and-charity-comparison: “2019 AI Alignment Literature Review and Charity Comparison”, Larks
- https://www.economist.com/1843/2019/03/01/deepmind-and-google-the-battle-to-control-artificial-intelligence: “DeepMind and Google: the Battle to Control Artificial Intelligence. Demis Hassabis Founded a Company to Build the World’s Most Powerful AI. Then Google Bought Him Out. Hal Hodson Asks Who Is in Charge”, Hal Hodson
- backstop: “Evolution As Backstop for Reinforcement Learning”, Gwern Branwen
- 2018-everitt.pdf: “The Alignment Problem for Bayesian History-Based Reinforcement Learners”, Tom Everitt, Marcus Hutter
- mcts-ai: “AI Risk Demos”, Gwern Branwen
- tool-ai: “Why Tool AIs Want to Be Agent AIs”, Gwern Branwen
- complexity: “Complexity No Bar to AI”, Gwern Branwen
- turing-complete: “Surprisingly Turing-Complete”, Gwern Branwen
- tank: “The Neural Net Tank Urban Legend”, Gwern Branwen
- https://paulfchristiano.com/: “Homepage of Paul F. Christiano”, Paul F. Christiano