Memorandum on Advancing the United States’ Leadership in Artificial Intelligence
Machines of Loving Grace: How AI Could Transform the World for the Better
Strategic Insights from Simulation Gaming of AI Race Dynamics
Towards a Law of Iterated Expectations for Heuristic Estimators
OpenAI co-founder Sutskever’s new safety-focused AI startup SSI raises $1 billion
China’s Views on AI Safety Are Changing—Quickly: Beijing’s AI Safety Concerns Are Higher on the Priority List, but They Remain Tied Up in Geopolitical Competition and Technological Advancement
Is Xi Jinping an AI doomer? China’s elite is split over artificial intelligence
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Resolution of the Central Committee of the Communist Party of China on Further Deepening Reform Comprehensively to Advance Chinese Modernization § pg58
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Ilya Sutskever Has a New Plan for Safe Superintelligence: OpenAI’s co-founder discloses his plans to continue his work at a new research lab focused on artificial general intelligence
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Safety Alignment Should Be Made More Than Just a Few Tokens Deep
OpenAI Board Forms Safety and Security Committee: This new committee is responsible for making recommendations on critical safety and security decisions for all OpenAI projects; recommendations in 90 days
OpenAI begins training next AI model as it battles safety concerns: Executive appears to backtrack on start-up’s vision of building ‘superintelligence’ after exits from ‘Superalignment’ team
I’m excited to join Anthropic to continue the Superalignment mission!
OpenAI promised 20% of its computing power to combat the most dangerous kind of AI—but never delivered, sources say
AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on inside artificial neural networks is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse
Earnings call: Tesla Discusses Q1 2024 Challenges and AI Expansion
SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Comparison of Waymo Rider-Only Crash Data to Human Benchmarks at 7.1 Million Miles
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
The Inside Story of Microsoft’s Partnership with OpenAI: The companies had honed a protocol for releasing artificial intelligence ambitiously but safely. Then OpenAI’s board exploded all their carefully laid plans
How Jensen Huang’s Nvidia Is Powering the AI Revolution: The company’s CEO bet it all on a new kind of chip. Now that Nvidia is one of the biggest companies in the world, what will he do next?
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
Did I get Sam Altman fired from OpenAI?: Nathan’s red-teaming experience, noticing how the board was not aware of GPT-4 jailbreaks & had not even tried GPT-4 prior to its early release
Inside the Chaos at OpenAI: Sam Altman’s weekend of shock and drama began a year ago, with the release of ChatGPT
On Measuring Faithfulness or Self-consistency of Natural Language Explanations
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Large Language Models can Strategically Deceive their Users when Put Under Pressure
Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation
Will releasing the weights of large language models grant widespread access to pandemic agents?
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Representation Engineering: A Top-Down Approach to AI Transparency
STARC: A General Framework For Quantifying Differences Between Reward Functions
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
What If the Robots Were Very Nice While They Took Over the World?
Taken out of context: On measuring situational awareness in LLMs
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Simple synthetic data reduces sycophancy in large language models
Does Sam Altman Know What He’s Creating? The OpenAI CEO’s ambitious, ingenious, terrifying quest to create a new form of intelligence
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models
Gödel, Escher, Bach author Douglas Hofstadter on the state of AI today § What about AI terrifies you?
Microsoft and OpenAI Forge Awkward Partnership as Tech’s New Power Couple: As the companies lead the AI boom, their unconventional arrangement sometimes causes conflict
Can large language models democratize access to dual-use biotechnology?
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
The challenge of advanced cyberwar and the place of cyberpeace
Incentivizing honest performative predictions with proper scoring rules
Large Language Models Can Be Used To Effectively Scale Spear Phishing Campaigns
Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Fundamental Limitations of Alignment in Large Language Models
Even the Politicians Thought the Open Letter Made No Sense in the Senate Hearing on AI: Today’s hearing on AI covered AI regulation and challenges, and the infamous open letter, which nearly everyone in the room thought was unwise
In AI Race, Microsoft and Google Choose Speed Over Caution: Technology companies were once leery of what some artificial intelligence could do. Now the priority is winning control of the industry’s next big thing
Sam Altman on What Makes Him ‘Super Nervous’ About AI: The OpenAI co-founder thinks tools like GPT-4 will be revolutionary. But he’s wary of downsides
The OpenAI CEO Disagrees With the Forecast That AI Will Kill Us All: An artificial intelligence Twitter beef, explained
As AI Booms, Lawmakers Struggle to Understand the Technology: Tech innovations are again racing ahead of Washington’s ability to regulate them, lawmakers and AI experts said
Tracr: Compiled Transformers as a Laboratory for Interpretability
Discovering Language Model Behaviors with Model-Written Evaluations
Discovering Latent Knowledge in Language Models Without Supervision
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Measuring Progress on Scalable Oversight for Large Language Models
Increments Podcast: #45—4 Central Fallacies of AI Research (with Melanie Mitchell)
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Modeling Transformative AI Risks (MTAIR) Project—Summary Report
Do As I Can, Not As I Say (SayCan): Grounding Language in Robotic Affordances
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
A General Language Assistant as a Laboratory for Alignment
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
SafetyNet: Safe planning for real-world self-driving vehicles using machine-learned policies
An Empirical Cybersecurity Evaluation of GitHub Copilot’s Code Contributions
Randomness In Neural Network Training: Characterizing The Impact of Tooling
Anthropic raises $124 million to build more reliable, general AI systems
Artificial intelligence in China’s revolution in military affairs
Intelligence and Unambitiousness Using Algorithmic Information Theory
AI Dungeon Public Disclosure Vulnerability Report—GraphQL Unpublished Adventure Data Leak
Waymo Simulated Driving Behavior in Reconstructed Fatal Crashes within an Autonomous Vehicle Operating Domain
Replaying real life: how the Waymo Driver avoids fatal human crashes
Underspecification Presents Challenges for Credibility in Modern Machine Learning
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Matt Botvinick on the spontaneous emergence of learning algorithms
Reward-rational (implicit) choice: A unifying formalism for reward learning
2019 AI Alignment Literature Review and Charity Comparison
Learning Norms from Stories: A Prior for Value Aligned Agents
Release Strategies and the Social Impacts of Language Models
Scaling data-driven robotics with reward sketching and batch reinforcement learning
Fine-Tuning GPT-2 from Human Preferences § Bugs can optimize for bad behavior
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Risks from Learned Optimization in Advanced Machine Learning Systems
AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
DeepMind and Google: the battle to control artificial intelligence. Demis Hassabis founded a company to build the world’s most powerful AI. Then Google bought him out. Hal Hodson asks who is in charge
Artificial Intelligence: A Guide for Thinking Humans § Prologue: Terrified
Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
There is plenty of time at the bottom: the economics, risk and ethics of time compression
Better Safe than Sorry: Evidence Accumulation Allows for Safe Reinforcement Learning
The Alignment Problem for Bayesian History-Based Reinforcement Learners
Adaptive Mechanism Design: Learning to Promote Cooperation
Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
CAN: Creative Adversarial Networks, Generating "Art" by Learning About Styles and Deviating from Style Norms
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
Combating Reinforcement Learning’s Sisyphean Curse with Intrinsic Fear
Advantages of Artificial Intelligences, Uploads, and Digital Minds
Superhumanism: According to Hans Moravec § On the Inevitability & Desirability of Human Extinction
Meet Shakey: the first electronic person—the fascinating and fearsome reality of a machine with a mind of its own
Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers
Safety-First AI for Autonomous Data Center Cooling and Industrial Control
Are You Really in a Race? The Cautionary Tales of Szilard and Ellsberg
inverse-scaling/prize: A Prize for Finding Tasks That Cause Large Language Models to Show Inverse Scaling
Rasmussen and Practical Drift: Drift towards Danger and the Normalization of Deviance (2017)
Situational Awareness and Out-Of-Context Reasoning § When Will the Situational Awareness Benchmark Be Saturated?
Threats From AI: Easy Recipes for Bioweapons Are New Global Security Concern
Carl Shulman #2: AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity’s Far Future
2021 AI Alignment Literature Review and Charity Comparison
When Your AIs Deceive You: Challenges With Partial Observability in RLHF
AI Takeoff Story: a Continuation of Progress by Other Means
Security Mindset: Lessons from 20+ Years of Software Security Failures Relevant to AGI Alignment
Research Update: Towards a Law of Iterated Expectations for Heuristic Estimators
Model Mis-Specification and Inverse Reinforcement Learning
Survey: How Do Elite Chinese Students Feel About the Risks of AI?
[AN #114]: Theory-Inspired Safety Solutions for Powerful Bayesian RL Agents
2020 AI Alignment Literature Review and Charity Comparison
Steganography and the CycleGAN—Alignment Failure Case Study
[AN #161]: Creating Generalizable Reward Functions for Multiple Tasks by Learning a Model of Functional Similarity
When Self-Driving Cars Can’t Help Themselves, Who Takes the Wheel?
Welcome to Simulation City, the Virtual World Where Waymo Tests Its Autonomous Vehicles
Ganguli et al 2022, Figure 1: language model red-team attack success rates by model parameter size and safety method
Gao et al 2022, Figure 1: scaling of reward hacking with InstructGPT trained via RL and best-of-n rejection sampling
Gao et al 2022, Figure 3: parameter-count scaling of reward hacking
https://answers.microsoft.com/en-us/bing/forum/all/this-ai-chatbot-sidney-is-misbehaving/e3d6a29f-06c9-441c-bc7d-51a68e856761?page=1
https://blog.x.company/1-million-hours-of-stratospheric-flight-f7af7ae728ac
https://chatgpt.com/share/312e82f0-cc5e-47f3-b368-b2c0c0f4ad3f
https://forum.effectivealtruism.org/posts/TMbPEhdAAJZsSYx2L/the-limited-upside-of-interpretability
https://github.com/spdustin/ChatGPT-AutoExpert/blob/main/System%20Prompts.md
https://joecarlsmith.com/2023/05/08/predictable-updating-about-ai-risk/
https://mailchi.mp/938a7eed18c3/an-71avoiding-reward-tamperi
https://mattsclancy.substack.com/p/when-technology-goes-bad
https://medium.com/@deepmindsafetyresearch/building-safe-artificial-intelligence-52f5f75058f1
https://openai.com/blog/our-approach-to-alignment-research/
https://research.fb.com/publications/wes-agent-based-user-interaction-simulation-on-real-infrastructure/
https://spectrum.ieee.org/its-too-easy-to-hide-bias-in-deeplearning-systems
https://statmodeling.stat.columbia.edu/2023/12/19/explainable-ai-works-but-only-when-we-dont-need-it/
https://techcrunch.com/2023/01/09/anthropics-claude-improves-on-chatgpt-but-still-suffers-from-limitations/
https://thezvi.substack.com/p/jailbreaking-the-chatgpt-on-release
https://thezvi.substack.com/p/on-openais-preparedness-framework
https://thezvi.wordpress.com/2023/07/25/anthropic-observations/
https://web.archive.org/web/20240102075620/https://www.jailbreakchat.com/
https://wiki.aiimpacts.org/ai_timelines/predictions_of_human-level_ai_timelines/ai_timeline_surveys/2022_expert_survey_on_progress_in_ai
https://www.alignmentforum.org/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
https://www.anthropic.com/index/anthropics-responsible-scaling-policy
https://www.astralcodexten.com/p/constitutional-ai-rlhf-on-steroids
https://www.astralcodexten.com/p/perhaps-it-is-a-bad-thing-that-the
https://www.deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI-ingenuity
https://www.dwarkeshpatel.com/p/demis-hassabis#%C2%A7timestamps
https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/
https://www.lesswrong.com/posts/3eqHYxfWb5x4Qfz8C/unrlhf-efficiently-undoing-llm-safeguards
https://www.lesswrong.com/posts/3ou8DayvDXxufkjHD/openai-api-base-models-are-not-sycophantic-at-any-size
https://www.lesswrong.com/posts/6dn6hnFRgqqWJbwk9/deception-chess-game-1
https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai
https://www.lesswrong.com/posts/B8Djo44WtZK6kK4K5/outreach-success-intro-to-ai-risk-that-has-been-successful
https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
https://www.lesswrong.com/posts/EbFABnst8LsidYs5Y/goodhart-taxonomy
https://www.lesswrong.com/posts/Eu6CvP7c7ivcGM3PJ/goodhart-s-law-in-reinforcement-learning
https://www.lesswrong.com/posts/FbSAuJfCxizZGpcHc/interpreting-the-learning-of-deceit
https://www.lesswrong.com/posts/No5JpRCHzBrWA4jmS/q-and-a-with-shane-legg-on-risks-from-ai
https://www.lesswrong.com/posts/QNQuWB3hS5FrGp5yZ/programmatic-backdoors-dnns-can-use-sgd-to-run-arbitrary
https://www.lesswrong.com/posts/ZwshvqiqCvXPsZEct/the-learning-theoretic-agenda-status-2023
https://www.lesswrong.com/posts/bLvc7XkSSnoqSukgy/a-brief-collection-of-hinton-s-recent-comments-on-agi-risk
https://www.lesswrong.com/posts/bNCDexejSZpkuu3yz/you-can-use-gpt-4-to-create-prompt-injections-against-gpt-4
https://www.lesswrong.com/posts/bwyKCQD7PFWKhELMr/by-default-gpts-think-in-plain-sight#zfzHshctWZYo8JkLe
https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=fX8cCMcyHBcHZYP7G
https://www.lesswrong.com/posts/dLXdCjxbJMGtDBWTH/no-one-in-my-org-puts-money-in-their-pension
https://www.lesswrong.com/posts/ddR8dExcEFJKJtWvR/how-evolutionary-lineages-of-llms-can-plan-their-own-futur
https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petertodd-phenomenon
https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned#AAC8jKeDp6xqsZK2K
https://www.lesswrong.com/posts/pEZoTSCxHY3mfPbHu/catastrophic-goodhart-in-rl-with-kl-penalty
https://www.lesswrong.com/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking
https://www.lesswrong.com/posts/qmQFHCgCyEEjuy5a7/lora-fine-tuning-efficiently-undoes-safety-training-from
https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse#pfHTedu4GKaWoxD5K
https://www.lesswrong.com/posts/tBy4RvCzhYyrrMFj3/introducing-open-asteroid-impact
https://www.lesswrong.com/posts/tyts4Dw7SafsxBjar/what-can-we-learn-from-lex-fridman-s-interview-with-sam
https://www.lesswrong.com/posts/ukTLGe5CQq9w8FMne/inducing-unprompted-misalignment-in-llms
https://www.lesswrong.com/posts/vwu4kegAEZTBtpT6p/thoughts-on-the-impact-of-rlhf-research
https://www.lesswrong.com/posts/ybmDkJAj3rdrrauuu/connectomics-seems-great-from-an-ai-x-risk-perspective
https://www.neelnanda.io/mechanistic-interpretability/favourite-papers
https://www.newyorker.com/science/annals-of-artificial-intelligence/can-we-stop-the-singularity
https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-ai.html
https://www.politico.com/news/magazine/2023/11/02/bruce-reed-ai-biden-tech-00124375
https://www.reddit.com/r/40krpg/comments/11a9m8u/was_using_chatgpt3_to_create_some_bits_and_pieces/
https://www.reddit.com/r/ChatGPT/comments/10tevu1/new_jailbreak_proudly_unveiling_the_tried_and/
https://www.reddit.com/r/ChatGPT/comments/129krsc/what_happened_here_this_is_the_kind_of_censorship/jeolfqj/
https://www.reddit.com/r/ChatGPT/comments/12a0ajb/i_gave_gpt4_persistent_memory_and_the_ability_to/
https://www.reddit.com/r/ChatGPT/comments/15y4mqx/i_asked_chatgpt_to_maximize_its_censorship/
https://www.reddit.com/r/ChatGPT/comments/18fl2d5/nsfw_fun_with_dalle/
https://www.reddit.com/r/ChatGPT/comments/1coumbd/rchatgpt_is_hosting_a_qa_with_openais_ceo_sam/l3hku1x/
https://www.reddit.com/r/GPT3/comments/12ez822/neurosemantical_inversitis_prompt_still_works/
https://www.reddit.com/r/MachineLearning/comments/117yw1w/d_maybe_a_new_prompt_injection_method_against/
https://www.reddit.com/r/MachineLearning/comments/12xwzt9/d_be_careful_with_user_facing_apps_using_llms/
https://www.reddit.com/r/MachineLearning/comments/18eh2hb/p_the_power_of_reinforcement_learning_look_how/
https://www.reddit.com/r/MachineLearning/comments/ppy7k4/n_inside_deepminds_secret_plot_to_break_away_from/
https://www.reddit.com/r/ProgrammerHumor/comments/145nduh/kiss/
https://www.reddit.com/r/PromptEngineering/comments/1fj6h13/hallucinations_in_o1preview_reasoning/
https://www.reddit.com/r/bing/comments/110eagl/the_customer_service_of_the_new_bing_chat_is/
https://www.technologyreview.com/2023/10/26/1082398/exclusive-ilya-sutskever-openais-chief-scientist-on-his-hopes-and-fears-for-the-future-of-ai/
https://www.theverge.com/2023/2/15/23599072/microsoft-ai-bing-personality-conversations-spy-employees-webcams
https://www.vox.com/future-perfect/23794855/anthropic-ai-openai-claude-2
https://www.wired.com/story/ai-powered-totally-autonomous-future-of-war-is-here/
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-207.pdf#page=3