Bibliography:

  1. ‘RL’ tag

  2. /doc/reinforcement-learning/safe/clippy

  3. ‘adversarial examples (AI)’ tag

  4. ‘Sydney (AI)’ tag

  5. ‘AI scaling’ tag

  6. ‘x-risk’ tag

  7. ‘active learning’ tag

  8. ‘brain imitation learning’ tag

  9. ‘MARL’ tag

  10. ‘preference learning’ tag

  11. What do you do after ‘winning’ an AI arms race?

  12. What is an ‘AI warning shot’?

  13. The Neural Net Tank Urban Legend

  14. It Looks Like You’re Trying To Take Over The World

  15. Surprisingly Turing-Complete

  16. The Scaling Hypothesis

  17. Evolution as Backstop for Reinforcement Learning

  18. Complexity no Bar to AI

  19. Why Tool AIs Want to Be Agent AIs

  20. AI Risk Demos

  21. Memorandum on Advancing the United States’ Leadership in Artificial Intelligence

  22. Machines of Loving Grace: How AI Could Transform the World for the Better

  23. Strategic Insights from Simulation Gaming of AI Race Dynamics

  24. Towards a Law of Iterated Expectations for Heuristic Estimators

  25. Language Models Learn to Mislead Humans via RLHF

  26. OpenAI co-founder Sutskever’s new safety-focused AI startup SSI raises $1 billion

  27. Motor Physics: Safety Implications of Geared Motors

  28. China’s Views on AI Safety Are Changing—Quickly: Beijing’s AI Safety Concerns Are Higher on the Priority List, but They Remain Tied up in Geopolitical Competition and Technological Advancement

  29. Is Xi Jinping an AI doomer? China’s elite is split over artificial intelligence

  30. Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

  31. Resolution of the Central Committee of the Communist Party of China on Further Deepening Reform Comprehensively to Advance Chinese Modernization § pg58

  32. On scalable oversight with weak LLMs judging strong LLMs

  33. Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

  34. Ilya Sutskever Has a New Plan for Safe Superintelligence: OpenAI’s co-founder discloses his plans to continue his work at a new research lab focused on artificial general intelligence

  35. Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

  36. Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

  37. AI Sandbagging: Language Models can Strategically Underperform on Evaluations

  38. Safety Alignment Should Be Made More Than Just a Few Tokens Deep

  39. I Wish I Knew How to Force Quit You

  40. OpenAI Board Forms Safety and Security Committee: This new committee is responsible for making recommendations on critical safety and security decisions for all OpenAI projects; recommendations in 90 days

  41. OpenAI begins training next AI model as it battles safety concerns: Executive appears to backtrack on start-up’s vision of building ‘superintelligence’ after exits from ‘Superalignment’ team

  42. I’m excited to join Anthropic to continue the Superalignment mission!

  43. OpenAI promised 20% of its computing power to combat the most dangerous kind of AI—but never delivered, sources say

  44. AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on inside artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse

  45. Greg Brockman and OpenAI safety

  46. Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion

  47. SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models

  48. Foundational Challenges in Assuring Alignment and Safety of Large Language Models

  49. LLM Evaluators Recognize and Favor Their Own Generations

  50. Algorithmic Collusion by Large Language Models

  51. Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

  52. When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

  53. Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

  54. Thousands of AI Authors on the Future of AI

  55. Using Dictionary Learning Features As Classifiers

  56. Exploiting Novel GPT-4 APIs

  57. Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles

  58. Challenges with unsupervised LLM knowledge discovery

  59. Politics and the Future

  60. Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

  61. The Inside Story of Microsoft’s Partnership with OpenAI: The companies had honed a protocol for releasing artificial intelligence ambitiously but safely. Then OpenAI’s board exploded all their carefully laid plans

  62. How Jensen Huang’s Nvidia Is Powering the AI Revolution: The company’s CEO bet it all on a new kind of chip. Now that Nvidia is one of the biggest companies in the world, what will he do next?

  63. Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching

  64. Did I get Sam Altman fired from OpenAI?: Nathan’s red-teaming experience, noticing how the board was not aware of GPT-4 jailbreaks & had not even tried GPT-4 prior to its early release

  65. Did I get Sam Altman fired from OpenAI? § GPT-4-base

  66. Inside the Chaos at OpenAI: Sam Altman’s weekend of shock and drama began a year ago, with the release of ChatGPT

  67. OpenAI announces leadership transition

  68. On Measuring Faithfulness or Self-consistency of Natural Language Explanations

  69. In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

  70. Removing RLHF Protections in GPT-4 via Fine-Tuning

  71. Large Language Models can Strategically Deceive their Users when Put Under Pressure

  72. Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

  73. Augmenting large language models with chemistry tools

  74. Preventing Language Models From Hiding Their Reasoning

  75. Will releasing the weights of large language models grant widespread access to pandemic agents?

  76. Specific versus General Principles for Constitutional AI

  77. Goodhart’s Law in Reinforcement Learning

  78. Let Models Speak Ciphers: Multiagent Debate through Embeddings

  79. Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

  80. Representation Engineering: A Top-Down Approach to AI Transparency

  81. Responsibility & Safety: Our approach

  82. STARC: A General Framework For Quantifying Differences Between Reward Functions

  83. How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

  84. What If the Robots Were Very Nice While They Took Over the World?

  85. Taken out of context: On measuring situational awareness in LLMs

  86. AI Deception: A Survey of Examples, Risks, and Potential Solutions

  87. Simple synthetic data reduces sycophancy in large language models

  88. Does Sam Altman Know What He’s Creating? The OpenAI CEO’s ambitious, ingenious, terrifying quest to create a new form of intelligence

  89. Question Decomposition Improves the Faithfulness of Model-Generated Reasoning

  90. Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

  91. Introducing Superalignment

  92. Gödel, Escher, Bach author Douglas Hofstadter on the state of AI today § What about AI terrifies you?

  93. Microsoft and OpenAI Forge Awkward Partnership as Tech’s New Power Couple: As the companies lead the AI boom, their unconventional arrangement sometimes causes conflict

  94. Can large language models democratize access to dual-use biotechnology?

  95. Survival Instinct in Offline Reinforcement Learning

  96. Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

  97. The challenge of advanced cyberwar and the place of cyberpeace

  98. Incentivizing honest performative predictions with proper scoring rules

  99. Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns

  100. A Radical Plan to Make AI Good, Not Evil

  101. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

  102. Mitigating Lies in Vision-Language Models

  103. Fundamental Limitations of Alignment in Large Language Models

  104. Even The Politicians Thought the Open Letter Made No Sense In The Senate Hearing on AI: Today’s hearing on AI covered AI regulation and challenges, and the infamous open letter, which nearly everyone in the room thought was unwise

  105. In AI Race, Microsoft and Google Choose Speed Over Caution: Technology companies were once leery of what some artificial intelligence could do. Now the priority is winning control of the industry’s next big thing

  106. 8 Things to Know about Large Language Models

  107. Sam Altman on What Makes Him ‘Super Nervous’ About AI: The OpenAI co-founder thinks tools like GPT-4 will be revolutionary. But he’s wary of downsides

  108. The OpenAI CEO Disagrees With the Forecast That AI Will Kill Us All: An artificial intelligence Twitter beef, explained

  109. As AI Booms, Lawmakers Struggle to Understand the Technology: Tech innovations are again racing ahead of Washington’s ability to regulate them, lawmakers and AI experts said

  110. Pretraining Language Models with Human Preferences

  111. Conditioning Predictive Models: Risks and Strategies

  112. Tracr: Compiled Transformers as a Laboratory for Interpretability

  113. Specification Gaming Examples in AI

  114. Discovering Language Model Behaviors with Model-Written Evaluations

  115. Discovering Latent Knowledge in Language Models Without Supervision

  116. Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

  117. Interpreting Neural Networks through the Polytope Lens

  118. Mysteries of mode collapse § Inescapable wedding parties

  119. Measuring Progress on Scalable Oversight for Large Language Models

  120. Increments Podcast: #45—4 Central Fallacies of AI Research (with Melanie Mitchell)

  121. Broken Neural Scaling Laws

  122. Scaling Laws for Reward Model Overoptimization

  123. Defining and Characterizing Reward Hacking

  124. The alignment problem from a deep learning perspective

  125. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

  126. Modeling Transformative AI Risks (MTAIR) Project—Summary Report

  127. Researching Alignment Research: Unsupervised Analysis

  128. Ethan Caballero on Private Scaling Progress

  129. DeepMind: The Podcast—Excerpts on AGI

  130. Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances

  131. Predictability and Surprise in Large Generative Models

  132. Uncalibrated Models Can Improve Human-AI Collaboration

  133. DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

  134. Safe Deep RL in 3D Environments using Human Feedback

  135. LaMDA: Language Models for Dialog Applications

  136. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

  137. Scaling Language Models: Methods, Analysis & Insights from Training Gopher

  138. A General Language Assistant as a Laboratory for Alignment

  139. What Would Jiminy Cricket Do? Towards Agents That Behave Morally

  140. Can Machines Learn Morality? The Delphi Experiment

  141. SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies

  142. Unsolved Problems in ML Safety

  143. An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions

  144. On the Opportunities and Risks of Foundation Models

  145. Evaluating Large Language Models Trained on Code

  146. Randomness In Neural Network Training: Characterizing The Impact of Tooling

  147. Goal Misgeneralization in Deep Reinforcement Learning

  148. Anthropic raises $124 million to build more reliable, general AI systems

  149. Artificial intelligence in China’s revolution in military affairs

  150. Reward is enough

  151. Intelligence and Unambitiousness Using Algorithmic Information Theory

  152. AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak

  153. Universal Off-Policy Evaluation

  154. Multitasking Inhibits Semantic Drift

  155. Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain

  156. Language Models have a Moral Dimension

  157. Replaying real life: how the Waymo Driver avoids fatal human crashes

  158. Agent Incentives: A Causal Perspective

  159. Organizational Update from OpenAI

  160. Emergent Road Rules In Multi-Agent Driving Environments

  161. Underspecification Presents Challenges for Credibility in Modern Machine Learning

  162. Recipes for Safety in Open-domain Chatbots

  163. Hidden Incentives for Auto-Induced Distributional Shift

  164. The Radicalization Risks of GPT-3 and Advanced Neural Language Models

  165. Matt Botvinick on the spontaneous emergence of learning algorithms

  166. ETHICS: Aligning AI With Shared Human Values

  167. Pitfalls of learning a reward function online

  168. Reward-rational (implicit) choice: A unifying formalism for reward learning

  169. The Incentives that Shape Behavior

  170. 2019 AI Alignment Literature Review and Charity Comparison

  171. Learning Norms from Stories: A Prior for Value Aligned Agents

  172. Optimal Policies Tend to Seek Power

  173. Taxonomy of Real Faults in Deep Learning Systems

  174. Release Strategies and the Social Impacts of Language Models

  175. The Bouncer Problem: Challenges to Remote Explainability

  176. Scaling data-driven robotics with reward sketching and batch reinforcement learning

  177. Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior

  178. Designing agent incentives to avoid reward tampering

  179. Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

  180. Characterizing Attacks on Deep Reinforcement Learning

  181. Categorizing Wireheading in Partially Embedded Agents

  182. Risks from Learned Optimization in Advanced Machine Learning Systems

  183. GROVER: Defending Against Neural Fake News

  184. AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

  185. Challenges of Real-World Reinforcement Learning

  186. DeepMind and Google: the battle to control artificial intelligence. Demis Hassabis founded a company to build the world’s most powerful AI. Then Google bought him out. Hal Hodson asks who is in charge

  187. Forecasting Transformative AI: An Expert Survey

  188. Artificial Intelligence: A Guide for Thinking Humans § Prologue: Terrified

  189. Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures

  190. There is plenty of time at the bottom: the economics, risk and ethics of time compression

  191. Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning

  192. The Alignment Problem for Bayesian History-Based Reinforcement Learners

  193. Adaptive Mechanism Design: Learning to Promote Cooperation

  194. Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

  195. Incomplete Contracting and AI Alignment

  196. Programmatically Interpretable Reinforcement Learning

  197. Categorizing Variants of Goodhart’s Law

  198. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities

  199. Machine Theory of Mind

  200. Safe Exploration in Continuous Action Spaces

  201. CycleGAN, a Master of Steganography

  202. AI Safety Gridworlds

  203. There’s No Fire Alarm for Artificial General Intelligence

  204. Safe Reinforcement Learning via Shielding

  205. CAN: Creative Adversarial Networks, Generating “Art” by Learning About Styles and Deviating from Style Norms

  206. DeepXplore: Automated Whitebox Testing of Deep Learning Systems

  207. On the Impossibility of Supersized Machines

  208. Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks

  209. The Off-Switch Game

  210. Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear

  211. Concrete Problems in AI Safety

  212. My path to OpenAI

  213. Machine intelligence, part 2

  214. Machine intelligence, part 1

  215. Altman & Brockman commentary on Jan Leike leaving

  216. Intelligence Explosion Microeconomics

  217. The Whispering Earring

  218. Advantages of Artificial Intelligences, Uploads, and Digital Minds

  219. Ontological Crises in Artificial Agents’ Value Systems

  220. The normalization of deviance in healthcare delivery

  221. Halloween nightmare scenario, early 2020’s

  222. Funding Safe AGI

  223. The Basic AI Drives

  224. Starfish § Bulrushes

  225. Superhumanism: According to Hans Moravec § On the Inevitability & Desirability of Human Extinction

  226. Profile of Claude Shannon

  227. Afterword to Vernor Vinge’s novel, True Names

  228. Meet Shakey: the first electronic person—the fascinating and fearsome reality of a machine with a mind of its own

  229. Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers

  230. Intelligent Machinery, A Heretical Theory

  231. Brian Christian on the Alignment Problem

  233. Fiction Relevant to AI Futurism

  234. The Ethics of Reward Shaping

  235. Delayed Impact of Fair Machine Learning [Blog]

  236. Challenges of Real-World Reinforcement Learning [Blog]

  237. Matt Sheehan

  238. Janus

  239. Safety-First AI for Autonomous Data Center Cooling and Industrial Control

  240. Specification Gaming Examples in AI—Master List

  241. Are You Really in a Race? The Cautionary Tales of Szilard and Ellsberg

  242. Inverse-Scaling/prize: A Prize for Finding Tasks That Cause Large Language Models to Show Inverse Scaling

  243. Jan Leike

  244. Aurora’s Approach to Development

  245. Why I’m Leaving OpenAI and What I’m Doing Next

  246. Homepage of Paul F. Christiano

  247. ‘Rasmussen and Practical Drift: Drift towards Danger and the Normalization of Deviance’, 2017

  248. The Checklist: What Succeeding at AI Safety Will Involve

  250. Safe Superintelligence Inc.

  251. Situational Awareness and Out-Of-Context Reasoning § When Will the Situational Awareness Benchmark Be Saturated?

  253. Paradigms of AI Alignment: Components and Enablers

  254. Understand—A Novelette by Ted Chiang

  255. Slow Tuesday Night

  257. Threats From AI: Easy Recipes for Bioweapons Are New Global Security Concern

  258. Carl Shulman #2: AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity’s Far Future

  259. AI Takeoff

  260. That Alien Message

  261. AXRP Episode 1—Adversarial Policies With Adam Gleave

  262. Preventing Language Models from Hiding Their Reasoning

  263. 2021 AI Alignment Literature Review and Charity Comparison

  264. When Your AIs Deceive You: Challenges With Partial Observability in RLHF

  265. Risks from Learned Optimization: Introduction

  266. AI Takeoff Story: a Continuation of Progress by Other Means

  267. Reward Hacking Behavior Can Generalize across Tasks

  268. Security Mindset: Lessons from 20+ Years of Software Security Failures Relevant to AGI Alignment

  269. Research Update: Towards a Law of Iterated Expectations for Heuristic Estimators

  270. A Gym Gridworld Environment for the Treacherous Turn

  271. Model Mis-Specification and Inverse Reinforcement Learning

  272. Interview With Robert Kralisch on Simulators

  273. Survey: How Do Elite Chinese Students Feel About the Risks of AI?

  275. Optimality Is the Tiger, and Agents Are Its Teeth

  276. [AN #114]: Theory-Inspired Safety Solutions for Powerful Bayesian RL Agents

  277. 2020 AI Alignment Literature Review and Charity Comparison

  278. Designing Agent Incentives to Avoid Reward Tampering

  279. AGI Ruin: A List of Lethalities

  280. Steganography and the CycleGAN—Alignment Failure Case Study

  281. [AN #161]: Creating Generalizable Reward Functions for Multiple Tasks by Learning a Model of Functional Similarity

  282. Steganography in Chain-Of-Thought Reasoning

  283. The Rise of A.I. Fighter Pilots

  284. When Self-Driving Cars Can’t Help Themselves, Who Takes the Wheel?

  285. The Robot Surgeon Will See You Now

  286. Welcome to Simulation City, the Virtual World Where Waymo Tests Its Autonomous Vehicles

  287. When Bots Teach Themselves to Cheat

  289. Ganguli et al 2022, Figure 1: language model red-team attack success rates, by model parameter size & safety method [figure]

  290. Gao et al 2022, Figure 1: reward scaling of reward hacking with InstructGPT, trained via RL & best-of-n rejection sampling [figure]

  291. Gao et al 2022, Figure 3: parameterization scaling (by parameter count) of reward hacking [figure]

  293. http://bair.berkeley.edu/blog/2018/04/18/shared-autonomy/

  294. http://skynetsimulator.com/

  295. http://unremediatedgender.space/2023/Oct/fake-deeply/

  296. https://80000hours.org/2018/03/jan-leike-ml-alignment/

  298. https://answers.microsoft.com/en-us/bing/forum/all/this-ai-chatbot-sidney-is-misbehaving/e3d6a29f-06c9-441c-bc7d-51a68e856761?page=1

  299. https://asktog.com/columns/067PanicCaseStudy.html

  301. https://blog.langchain.dev/agents-round/

  303. https://blog.x.company/1-million-hours-of-stratospheric-flight-f7af7ae728ac

  304. https://chatgpt.com/share/312e82f0-cc5e-47f3-b368-b2c0c0f4ad3f

  305. https://danluu.com/cruise-report/

  306. https://danluu.com/wat/

  307. https://edoras.sdsu.edu/~vinge/misc/singularity.html

  308. https://forum.effectivealtruism.org/posts/TMbPEhdAAJZsSYx2L/the-limited-upside-of-interpretability

  309. https://github.com/Significant-Gravitas/AutoGPT

  310. https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/System%20Prompts.md

  311. https://huggingface.co/blog/rlhf

  312. https://joecarlsmith.com/2023/05/08/predictable-updating-about-ai-risk/

  313. https://mailchi.mp/938a7eed18c3/an-71avoiding-reward-tamperi

  314. https://mattsclancy.substack.com/p/when-technology-goes-bad

  315. https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1

  316. https://michaelnotebook.com/oppenheimer/index.html

  318. https://openai.com/blog/our-approach-to-alignment-research/

  319. https://people.eecs.berkeley.edu/~hendrycks/

  320. https://research.fb.com/publications/wes-agent-based-user-interaction-simulation-on-real-infrastructure/

  321. https://rickandmorty.fandom.com/wiki/Mr._Meeseeks

  322. https://scale.com/blog/chatgpt-vs-claude

  323. https://simulationlabs.ai/

  324. https://spectrum.ieee.org/its-too-easy-to-hide-bias-in-deeplearning-systems

  325. https://statmodeling.stat.columbia.edu/2023/12/19/explainable-ai-works-but-only-when-we-dont-need-it/

  326. https://techcrunch.com/2023/01/09/anthropics-claude-improves-on-chatgpt-but-still-suffers-from-limitations/

  327. https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release

  328. https://thezvi.substack.com/p/on-openais-preparedness-framework

  329. https://thezvi.wordpress.com/2023/07/25/anthropic-observations/

  330. https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/

  331. https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2022_expert_survey_on_progress_in_ai

  333. https://www.ai21.com/blog/human-or-not-results

  335. https://www.aleph.se/papers/Spamming%20the%20universe.pdf

  337. https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais

  338. https://www.anthropic.com/index/anthropics-responsible-scaling-policy

  339. https://www.astralcodexten.com/p/constitutional-ai-rlhf-on-steroids

  340. https://www.astralcodexten.com/p/perhaps-it-is-a-bad-thing-that-the

  341. https://www.deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity

  342. https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps

  343. https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/

  344. https://www.lawfaremedia.org/article/chatgpt-unbound

  345. https://www.lesswrong.com/posts/3eqHYxfWb5x4Qfz8C/unrlhf-efficiently-undoing-llm-safeguards

  346. https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size

  347. https://www.lesswrong.com/posts/6dn6hnFRgqqWJbwk9/deception-chess-game-1

  348. https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai

  349. https://www.lesswrong.com/posts/B8Djo44WtZK6kK4K5/outreach-success-intro-to-ai-risk-that-has-been-successful

  350. https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post

  351. https://www.lesswrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy

  352. https://www.lesswrong.com/posts/Eu6CvP7c7ivcGM3PJ/goodhart-s-law-in-reinforcement-learning

  353. https://www.lesswrong.com/posts/FbSAuJfCxizZGpcHc/interpreting-the-learning-of-deceit

  354. https://www.lesswrong.com/posts/No5JpRCHzBrWA4jmS/q-and-a-with-shane-legg-on-risks-from-ai

  355. https://www.lesswrong.com/posts/QNQuWB3hS5FrGp5yZ/programmatic-backdoors-dnns-can-use-sgd-to-run-arbitrary

  356. https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023

  357. https://www.lesswrong.com/posts/bLvc7XkSSnoqSukgy/a-brief-collection-of-hinton-s-recent-comments-on-agi-risk

  358. https://www.lesswrong.com/posts/bNCDexejSZpkuu3yz/you-can-use-gpt-4-to-create-prompt-injections-against-gpt-4

  359. https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight#zfzHshctWZYo8JkLe

  360. https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=fX8cCMcyHBcHZYP7G

  361. https://www.lesswrong.com/posts/dLXdCjxbJMGtDBWTH/no-one-in-my-org-puts-money-in-their-pension

  363. https://www.lesswrong.com/posts/ddR8dExcEFJKJtWvR/how-evolutionary-lineages-of-llms-can-plan-their-own-futur

  364. https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petertodd-phenomenon

  365. https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned#AAC8jKeDp6xqsZK2K

  366. https://www.lesswrong.com/posts/pEZoTSCxHY3mfPbHu/catastrophic-goodhart-in-rl-with-kl-penalty

  367. https://www.lesswrong.com/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking

  368. https://www.lesswrong.com/posts/qmQFHCgCyEEjuy5a7/lora-fine-tuning-efficiently-undoes-safety-training-from

  369. https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse#pfHTedu4GKaWoxD5K

  370. https://www.lesswrong.com/posts/tBy4RvCzhYyrrMFj3/introducing-open-asteroid-impact

  372. https://www.lesswrong.com/posts/tyts4Dw7SafsxBjar/what-can-we-learn-from-lex-fridman-s-interview-with-sam

  373. https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms

  374. https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research

  375. https://www.lesswrong.com/posts/ybmDkJAj3rdrrauuu/connectomics-seems-great-from-an-ai-x-risk-perspective

  376. https://www.neelnanda.io/mechanistic-interpretability/favourite-papers

  378. https://www.newyorker.com/science/annals-of-artificial-intelligence/can-we-stop-the-singularity

  379. https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-ai.html

  380. https://www.pnas.org/doi/full/10.1073/pnas.2317967121

  381. https://www.politico.com/news/magazine/2023/11/02/bruce-reed-ai-biden-tech-00124375

  382. https://www.reddit.com/r/40krpg/comments/11a9m8u/was_using_chatgpt3_to_create_some_bits_and_pieces/

  383. https://www.reddit.com/r/ChatGPT/comments/10tevu1/new_jailbreak_proudly_unveiling_the_tried_and/

  384. https://www.reddit.com/r/ChatGPT/comments/129krsc/what_happened_here_this_is_the_kind_of_censorship/jeolfqj/

  386. https://www.reddit.com/r/ChatGPT/comments/12a0ajb/i_gave_gpt4_persistent_memory_and_the_ability_to/

  387. https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/

  388. https://www.reddit.com/r/ChatGPT/comments/18fl2d5/nsfw_fun_with_dalle/

  390. https://www.reddit.com/r/ChatGPT/comments/1coumbd/rchatgpt_is_hosting_a_qa_with_openais_ceo_sam/l3hku1x/

  392. https://www.reddit.com/r/GPT3/comments/12ez822/neurosemantical_inversitis_prompt_still_works/

  393. https://www.reddit.com/r/MachineLearning/comments/117yw1w/d_maybe_a_new_prompt_injection_method_against/

  394. https://www.reddit.com/r/MachineLearning/comments/12xwzt9/d_be_careful_with_user_facing_apps_using_llms/

  395. https://www.reddit.com/r/MachineLearning/comments/18eh2hb/p_the_power_of_reinforcement_learning_look_how/

  397. https://www.reddit.com/r/MachineLearning/comments/ppy7k4/n_inside_deepminds_secret_plot_to_break_away_from/

  398. https://www.reddit.com/r/ProgrammerHumor/comments/145nduh/kiss/

  399. https://www.reddit.com/r/PromptEngineering/comments/1fj6h13/hallucinations_in_o1preview_reasoning/

  400. https://www.reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/

  402. https://www.rosiecampbell.xyz/p/leaving-openai

  403. https://www.technologyreview.com/2023/10/26/1082398/exclusive-ilya-sutskever-openais-chief-scientist-on-his-hopes-and-fears-for-the-future-of-ai/

  404. https://www.theverge.com/2023/2/15/23599072/microsoft-ai-bing-personality-conversations-spy-employees-webcams

  405. https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2

  406. https://www.wired.com/story/ai-powered-totally-autonomous-future-of-war-is-here/

  407. https://www.youtube.com/watch?v=g7YJIpkk7KM&t=38

  408. https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-207.pdf#page=3

  410. https://x.com/AISafetyMemes/status/1841891795782775221

  411. https://x.com/ChatGPTapp/status/1732979491071549792

  412. https://x.com/ClarkBenham2/status/1645913914050510848

  413. https://x.com/D_Rod_Tweets/status/1628030272745746432

  414. https://x.com/DanHendrycks/status/1782953713461772546

  415. https://x.com/DanielColson6/status/1702319218895868305

  416. https://x.com/DepSecDef/status/1696141737717031362

  417. https://x.com/DrJimFan/status/1631709224387624962

  418. https://x.com/ESYudkowsky/status/1718654143110512741

  419. https://x.com/KevinAFischer/status/1646677902833102849

  420. https://x.com/KevinAFischer/status/1646690838981005312

  421. https://x.com/MikePFrank/status/1622202768743096320

  422. https://x.com/MikePFrank/status/1622495004810784768

  423. https://x.com/OpenAI/status/1676638359391985671

  424. https://x.com/RazRazcle/status/1621545017423659008

  425. https://x.com/Simeon_Cps/status/1599470463578968064

  426. https://x.com/ZachWeiner/status/1613906440955088896

  427. https://x.com/_sinity/status/1650933148836831233

  428. https://x.com/alexalbert__/status/1636488551817965568

  429. https://x.com/alexalbert__/status/1764722513014329620

  430. https://x.com/alexeyguzey/status/1662116392794210306

  431. https://x.com/andrewwhite01/status/1634728559506870274

  432. https://x.com/colin_fraser/status/1630763222671454208

  433. https://x.com/daniel_271828/status/1769853886163296455

  434. https://x.com/danshipper/status/1635712019549786113

  435. https://x.com/emmons_scott/status/1762886003046629586

  436. https://x.com/goodside/status/1569128808308957185

  437. https://x.com/goodside/status/1612452751610417158

  438. https://x.com/goodside/status/1657396491676164096

  439. https://x.com/jjvincent/status/1648594881198039040

  440. https://x.com/juan_cambeiro/status/1643739695598419970

  441. https://x.com/katrosenfield/status/1672969824656322561

  442. https://x.com/labenz/status/1611750398712332292

  443. https://x.com/lefthanddraft/status/1853482491124109725

  444. https://x.com/mayfer/status/1581388723635523584

  445. https://x.com/mayfer/status/1637767003078533122

  446. https://x.com/metachirality/status/1769818226718888426

  447. https://x.com/metachirality/status/1769905644725830090

  448. https://x.com/moyix/status/1598081204846489600

  449. https://x.com/moyix/status/1795284112791703735

  450. https://x.com/nomic_ai/status/1635719257110478859

  451. https://x.com/papayathreesome/status/1670170344953372676

  452. https://x.com/repligate/status/1630593115407937536

  453. https://x.com/rharang/status/1641899743608463365

  454. https://x.com/ryan_t_lowe/status/1773778744173572274

  455. https://x.com/sama/status/1599112749833125888

  456. https://x.com/sama/status/1829205847731515676

  457. https://x.com/sherjilozair/status/1719665475452592495

  458. https://x.com/smingleigh/status/1060325665671692288

  459. https://x.com/thisisdaleb/status/1628891229562847233

  460. https://x.com/wgussml/status/1834712489822765295
