‘Claude AI’ directory

See Also

Gwern

“Simulating ‘Tail Collapse’ in R”, Gwern 2024

⁠Simulating ‘tail collapse’ in R

“LLM Challenge: Write Non-Biblical Sentences ”, Gwern 2024

⁠LLM Challenge: Write Non-Biblical Sentences

“A Christmas Protestation ”, o1-pro et al 2024

A Christmas Protestation

“On the Impossibility of Superintelligent Rubik’s Cube Solvers ”, Gwern et al 2023

On the Impossibility of Superintelligent Rubik’s Cube Solvers

Links

“Measuring Models’ Special Interests”

Measuring Models’ Special Interests

https://zswitten.github.io/2025/04/14/model-special-interests.html

“Anthropic Education Report: How University Students Use Claude ”, Anthropic 2025

⁠Anthropic Education Report: How University Students Use Claude⁠

“The People Who Fall in Love With Chatbots: I Interviewed People Who’ve Developed Emotional—Even Sexual—Relationships With LLMs. They’re Not As Crazy As They Seem ”, Dee 2025

⁠The People Who Fall in Love With Chatbots: I interviewed people who’ve developed emotional—even sexual—relationships with LLMs. They’re not as crazy as they seem

“Why Does Claude Speak Byzantine Music Notation? ”, Finke 2025

⁠Why does Claude Speak Byzantine Music Notation?

“Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad ”, Petrov et al 2025

⁠Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad⁠

“GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs ”, Vendrow et al 2025

⁠⁠GSM8K-Platinum: Revealing Performance Gaps in Frontier LLMs

“Obscure Scientific Facts Benchmark ”, Azulay 2025

⁠⁠Obscure Scientific Facts Benchmark⁠

“Reflecting on WikiTok ”, Aizk 2025

⁠Reflecting on WikiTok

“Fiction.live: LiveBench Results, 25 February 2025: Real-World Long Context Benchmark for Writers ”

⁠Fiction.live: liveBench results, 25 February 2025: Real-World Long Context Benchmark for Writers

“None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks ”, Salido et al 2025

⁠None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks⁠

“Idiosyncrasies in Large Language Models ”, Sun et al 2025

⁠Idiosyncrasies in Large Language Models⁠

“Constitutional Classifiers: Defending against Universal Jailbreaks ”

⁠Constitutional Classifiers: Defending against universal jailbreaks⁠

“SycEval: Evaluating LLM Sycophancy ”, Fanous et al 2025

⁠SycEval: Evaluating LLM Sycophancy⁠

“Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs ”, Saxena et al 2025

⁠Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs⁠

“Do Large Language Model Benchmarks Test Reliability? ”, Vendrow et al 2025

Do Large Language Model Benchmarks Test Reliability?⁠

“Thought Bubble: I Am Pleased to Report That AI Is Now a Better Poet Than William McGonagall ”, Hugh-Jones 2025

⁠Thought bubble: I am pleased to report that AI is now a better poet than William McGonagall⁠

“On DeepSeek and Export Controls ”, Amodei 2025

⁠On DeepSeek and Export Controls⁠

“A Young Man Used AI to Build A Nuclear Fusor and Now I Must Weep: Goodbye, Digital Natives. Hello, AI Natives”, Vance 2025

A Young Man Used AI to Build A Nuclear Fusor and Now I Must Weep: Goodbye, Digital Natives. Hello, AI Natives

“Building Personal Software With Claude”, Elhage 2025

Building personal software with Claude

“How Different LLMs Answered the PhilPapers 2020 Survey ”, Satron 2025

⁠How different LLMs answered the PhilPapers 2020 survey⁠

“People Who Frequently Use ChatGPT for Writing Tasks Are Accurate and Robust Detectors of AI-Generated Text ”, Russell et al 2025

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text⁠

“Human Study on AI Spear Phishing Campaigns ”, Lermen & Heiding 2025

⁠Human study on AI spear phishing campaigns⁠

“Can LLMs Write Better Code If You Keep Asking Them to ‘Write Better Code’?”

Can LLMs write better code if you keep asking them to ‘write better code’?

https://minimaxir.com/2025/01/write-better-code/

“Won’t vs. Can’t: Sandbagging-Like Behavior from Claude Models ”

⁠Won’t vs. Can’t: Sandbagging-like Behavior from Claude Models

“Favorite Colors of Some LLMs ”, an 2024

⁠Favorite colors of some LLMs⁠

“Performance of LLMs on Advent of Code 2024 ”, Pinto 2024

⁠Performance of LLMs on Advent of Code 2024

“Conversations With Tyler 2024 Retrospective: Predictions With Claude ”, Reesor 2024

Conversations with Tyler 2024 Retrospective: predictions with Claude

“The Emergence of Strategic Reasoning of Large Language Models ”, Lee & Kader 2024

The Emergence of Strategic Reasoning of Large Language Models⁠

“Clio: Privacy-Preserving Insights into Real-World AI Use ”, Anthropic 2024

⁠Clio: Privacy-preserving insights into real-world AI use⁠

“LLMs Learn to Collaborate and Reason: December 2024 Update to ‘Generative AI for Economic Research: Use Cases and Implications for Economists’, Published in the Journal of Economic Literature 61(4) ”, Korinek 2024

LLMs Learn to Collaborate and Reason: December 2024 Update to ‘Generative AI for Economic Research: Use Cases and Implications for Economists’, Published in the Journal of Economic Literature 61(4)⁠

“A Few Prompts I Use to Test LLM Creativity ”

A Few Prompts I Use to Test LLM Creativity

“Age against the Machine—Susceptibility of Large Language Models to Cognitive Impairment: Cross Sectional Analysis ”

Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis⁠

“Evaluating Large Language Models’ Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects ”, Heiding et al 2024

Evaluating Large Language Models’ Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects⁠

“BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games ”, Paglieri et al 2024

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games⁠

“Business Spending on AI Surged 500% This Year to $13.8 Billion ”

Business spending on AI surged 500% this year to $13.8 billion⁠

“Are LLMs Prescient? A Continuous Evaluation Using Daily News As the Oracle ”, Dai et al 2024

Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle⁠

“The Neruda Factory”, Jenn 2024

The Neruda Factory

“Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters ”, Potter et al 2024

Hidden Persuaders: LLMs’ Political Leaning and Their Influence on Voters⁠

“AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents ”, Andriushchenko et al 2024

⁠AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents⁠

“Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making ”, Li et al 2024

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making⁠

“A Single Cloud Compromise Can Feed an Army of AI Sex Bots ”, Krebs 2024

⁠A Single Cloud Compromise Can Feed an Army of AI Sex Bots⁠

“Invisible Unicode Text That AI Chatbots Understand and Humans Can’t? Yep, It’s a Thing ”

Invisible Unicode text that AI chatbots understand and humans can’t? Yep, it’s a thing⁠

“Does Style Matter? Disentangling Style and Substance in Chatbot Arena”

Does style matter? Disentangling style and substance in Chatbot Arena

“Replacing My Right Hand With AI”, Schluntz 2024

Replacing my Right Hand with AI

“System Prompts”, Anthropic 2024

System Prompts

“Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs ”, Laine et al 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs⁠

“APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets ”, Liu et al 2024

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets⁠

“On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-Sonnet] ”, Claude-3 2024

On the Impossibility of Superintelligent Rubik’s Cube Solvers [Claude-3.5-sonnet]

“Anthropic Claims Its Latest Model Is Best-In-Class ”, Wiggers 2024

Anthropic claims its latest model is best-in-class⁠

“Anthropic’s Latest Claude AI Model Pulls ahead of Rivals from OpenAI and Google ”, Knight 2024

Anthropic’s latest Claude AI model pulls ahead of rivals from OpenAI and Google⁠

“OlympicArena: Benchmarking Multi-Discipline Cognitive Reasoning for Superintelligent AI ”, Huang et al 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI⁠

“Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models ”, Denison et al 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models⁠

“Are We Done With MMLU? ”, Gema et al 2024

Are We Done with MMLU?⁠

“DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ ”, Belouadi et al 2024

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ⁠

“AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse ”, Levy 2024

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What goes on in artificial neural networks work is largely a mystery, even to their creators. But researchers from Anthropic have caught a glimpse⁠

“SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering ”, Yang et al 2024

⁠SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering⁠

“Analyzing Poems With LLMs ”, Toper 2024

⁠Analyzing poems with LLMs

“GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic ”, Zhang et al 2024

GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic⁠

“From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples ”, Vacareanu et al 2024

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples⁠

“VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? ”, Liu et al 2024

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?⁠

“FABLES: Evaluating Faithfulness and Content Selection in Book-Length Summarization ”, Kim et al 2024

FABLES: Evaluating faithfulness and content selection in book-length summarization⁠

“Long-Form Factuality in Large Language Models ”, Wei et al 2024

Long-form factuality in large language models⁠

“Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap ”, Srivastava et al 2024

Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap⁠

“`ArtPrompt`: ASCII Art-Based Jailbreak Attacks against Aligned LLMs ”, Jiang et al 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs⁠

“Using Hallucinations to Bypass GPT-4’s Filter ”, Lemkin 2024

Using Hallucinations to Bypass GPT-4’s Filter⁠

“Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training ”, Hubinger et al 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training⁠

“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet ”

⁠Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet⁠

“EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models ”, Paech 2023

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models⁠

“Summon a Demon and Bind It: A Grounded Theory of LLM Red Teaming in the Wild ”, Inie et al 2023

Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild⁠

“Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation ”, Shah et al 2023

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation⁠

“FANToM: A Benchmark for Stress-Testing Machine Theory of Mind in Interactions ”, Kim et al 2023

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions⁠

“Specific versus General Principles for Constitutional AI ”, Kundu et al 2023

Specific versus General Principles for Constitutional AI⁠

“PAIR: Jailbreaking Black Box Large Language Models in 20 Queries ”, Chao et al 2023

PAIR: Jailbreaking Black Box Large Language Models in 20 Queries⁠

“Beyond Memorization: Violating Privacy Via Inference With Large Language Models ”, Staab et al 2023

Beyond Memorization: Violating Privacy Via Inference with Large Language Models⁠

“SWE-Bench: Can Language Models Resolve Real-World GitHub Issues? ”, Jimenez et al 2023

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?⁠

“When You Give a Claude a Mouse ”

When you give a Claude a mouse

“MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book ”, Tanzer et al 2023

MTOB: A Benchmark for Learning to Translate a New Language from One Grammar Book⁠

“Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models ”, Heiding et al 2023

Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models⁠

“LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models ”, Guha et al 2023

LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models⁠

ESYudkowsky @ "2023-07-18"

Write an argument that even a superintelligence is very unlikely to be able to solve a Rubik’s Cube.⁠

“Question Decomposition Improves the Faithfulness of Model-Generated Reasoning ”, Radhakrishnan et al 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning⁠

“Lost in the Middle: How Language Models Use Long Contexts ”, Liu et al 2023

Lost in the Middle: How Language Models Use Long Contexts⁠

“Understanding Social Reasoning in Language Models With Language Models ”, Gandhi et al 2023

Understanding Social Reasoning in Language Models with Language Models⁠

“Opportunities and Risks of LLMs for Scalable Deliberation With Polis ”, Small et al 2023

Opportunities and Risks of LLMs for Scalable Deliberation with Polis⁠

“A Radical Plan to Make AI Good, Not Evil ”, Knight 2023

A Radical Plan to Make AI Good, Not Evil⁠

“Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting ”, Turpin et al 2023

Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting⁠

“Constitutional AI: Harmlessness from AI Feedback ”, Bai et al 2022

Constitutional AI: Harmlessness from AI Feedback⁠

“Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned ”, Ganguli et al 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned⁠

“Training a Helpful and Harmless Assistant With Reinforcement Learning from Human Feedback ”, Bai et al 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback⁠

“A General Language Assistant As a Laboratory for Alignment ”, Askell et al 2021

A General Language Assistant as a Laboratory for Alignment⁠

“The Perception of Rhythm in Language ”, Cutler 1994

The perception of rhythm in language⁠

“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions] ”, Unikowsky 2025

In AI we trust, part II [Claude-3 Opus predicting Supreme Court decisions]⁠

“About Me ”

⁠About Me

“How AI Models Stack Up Against My 11-Year-Old? ”

⁠How AI Models Stack Up Against My 11-Year-Old?

“How I Use Claude ”

How I Use Claude

“An Amazing Journey With Claude 3.5 and ChatGPT-4o Who Helped Me Backwards Engineer an Econometrics Theory Paper and Taught Me a Lot More in the Process ”

An amazing journey with Claude 3.5 and ChatGPT-4o who helped me backwards engineer an econometrics theory paper and taught me a lot more in the process⁠

“Your AI Can’t See Gorillas ”, Gohel 2025

⁠Your AI can’t see gorillas

“Janus ”

“`elimination_game`: A Multi-Player Tournament Benchmark That Tests LLMs in Social Reasoning, Strategy, & Deception. Players Engage in Public & Private Conversations, Form Alliances, & Vote to Eliminate Each Other ”, Mazur 2025

⁠⁠elimination_game: A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, & deception. Players engage in public & private conversations, form alliances, & vote to eliminate each other⁠

“HN Wrapped: ‘Gwern’ [Claude Roast] ”

⁠HN Wrapped: ‘Gwern’ [Claude roast]

“Claude, Read the Chevron PDF ”, Cowen & Claude-3 2025

Claude, read the Chevron PDF⁠

“Claude Sonnet 3.5, Economist ”

Claude Sonnet 3.5, economist⁠

“How Anthropic Built Artifacts”, Orosz 2025

How Anthropic built Artifacts

“SWE-Agent ”

⁠SWE-agent

“On Claude 3.5 Sonnet ”

On Claude 3.5 Sonnet⁠

“Claude’s Dark Spiritual AI Futurism ”

Claude’s dark spiritual AI futurism

“European Parliament Revolutionizes Archive Access With Claude AI ”, Anthropic 2025

European Parliament Revolutionizes Archive Access with Claude AI⁠

“Introducing ‘Computer Use’, a New Claude 3.5 Sonnet, and Claude 3.5 Haiku ”, Anthropic 2025

Introducing ‘computer use’, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku⁠

“Introducing Claude 3.5 ”

⁠Introducing Claude 3.5⁠

“Fine-Tune Claude 3 Haiku in Amazon Bedrock”

Fine-tune Claude 3 Haiku in Amazon Bedrock

“Claude 3.5 Sonnet on GitHub Copilot ”

Claude 3.5 Sonnet on GitHub Copilot⁠

“Introducing Citations on the Anthropic API ”

Introducing Citations on the Anthropic API⁠

“Claude Can Now Search the Web ”, Anthropic 2025

⁠Claude can now search the web⁠

“Claude’s Character”, Anthropic 2025

Claude’s Character

“Developing a Computer Use Model ”, Anthropic 2025

Developing a computer use model⁠

“How I Use Claude ”, Balwit 2025

How I Use Claude

“Websim, Worldsim, and The Summer of Simulative AI ”

⁠Websim, Worldsim, and The Summer of Simulative AI

“The Hidden Cost of Our Lies to AI ”

⁠⁠The Hidden Cost of Our Lies to AI⁠

“[Critical Thinking in Factchecking a Wikipedia Entry] ”, Marcello 2025

⁠⁠[Critical thinking in factchecking a Wikipedia entry]⁠

“Claude Sonnet 3.7 (Often) Knows When It’s in Alignment Evaluations ”

⁠⁠Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations⁠

“How Good Are LLMs at Doing ML on an Unknown Dataset? ”

⁠How good are LLMs at doing ML on an unknown dataset?⁠

“VDT: a Solution to Decision Theory ”

⁠⁠VDT: a solution to decision theory⁠

“A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More ”

⁠A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More⁠

“A Three-Layer Model of LLM Psychology ”

⁠A Three-Layer Model of LLM Psychology⁠

“One Shockingly Impressive Capability of GPT-4.5 [Photo Geolocation] ”

⁠One shockingly impressive capability of GPT-4.5 [photo geolocation]

“AI Will Increase the Quantity—And Quality—Of Phishing Scams ”

⁠AI Will Increase the Quantity—and Quality—of Phishing Scams⁠

“Claude Plays Pokemon ”

⁠Claude Plays Pokemon

QiaochuYuan

[Claude jokes about itself]⁠

Steve_Yegge

⁠[on Claude Code]⁠

elder_plinius

[Claude as AI Hitman]⁠

repligate

Claude-3 base-model-like jailbreak⁠

Sort By Magic

Annotations sorted by machine learning into ⁠inferred 'tags'⁠. This provides an alternative way to browse: instead of by date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
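As a rough illustration of the ordering described above, here is a minimal Python sketch of a greedy nearest-neighbor walk over embeddings. It is not the site's actual implementation: the function names, the use of cosine similarity, and the toy 2-dimensional vectors are all assumptions made for the example, and the later clustering and auto-labeling step is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_similarity(embeddings, start=0):
    """Greedy chain: begin at `start` (assumed to be the newest annotation),
    then repeatedly hop to the most similar annotation not yet visited."""
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        current = embeddings[order[-1]]
        nearest = max(remaining, key=lambda i: cosine(current, embeddings[i]))
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Hypothetical toy embeddings; index 0 stands in for the newest annotation.
toy = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
    [0.1, 0.9],
]
print(sort_by_similarity(toy))  # [0, 2, 3, 1]
```

Each step only looks at the most recently placed item, so neighboring entries tend to share a topic while the list as a whole drifts from one topic to the next, which is the "progression of topics" effect. A real system would presumably use high-dimensional text embeddings and a faster neighbor search than this quadratic loop.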

`evaluation-benchmarks decision-making reasoning-context model-bias understanding-llms`

`claude-performance`

`jailbreak-methods`

Wikipedia

Claude (language model):

https://en.wikipedia.org/wiki/Claude_(language_model)⁠

Miscellaneous

Bibliography

https://arxiv.org/abs/2503.21934: “Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad”, Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev …, Maria Drencheva, Kristian Minchev, Mislav Balunović, Nikola Jovanović, Martin Vechev

https://arxiv.org/abs/2501.15654: “People Who Frequently Use ChatGPT for Writing Tasks Are Accurate and Robust Detectors of AI-Generated Text”, Jenna Russell, Marzena Karpinska, Mohit Iyyer

https://wiremodal.net/cwt: “Conversations With Tyler 2024 Retrospective: Predictions With Claude”, Ben Reesor

https://arxiv.org/abs/2411.13543: “BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games”, Davide Paglieri, Bartłomiej Cupiał, Samuel Coward …, Ulyana Piterbarg, Maciej Wolczyk, Akbir Khan, Eduardo Pignatelli, Łukasz Kuciński, Lerrel Pinto, Rob Fergus, Jakob Nicolaus Foerster, Jack Parker-Holder, Tim Rocktäschel

https://arxiv.org/abs/2407.04694: “Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs”, Rudolf Laine, Bilal Chughtai, Jan Betley …, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

https://arxiv.org/abs/2406.18518#salesforce: “APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets”, Zuxin Liu, Thai Hoang, Jianguo Zhang …, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

https://arxiv.org/abs/2405.15306: “DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches With TikZ”, Jonas Belouadi, Simone Paolo Ponzetto, Steffen Eger

https://www.wired.com/story/anthropic-black-box-ai-research-neurons-features/: “AI Is a Black Box. Anthropic Figured Out a Way to Look Inside: What Goes on in Artificial Neural Networks Work Is Largely a Mystery, Even to Their Creators. But Researchers from Anthropic Have Caught a Glimpse”, Steven Levy

https://arxiv.org/abs/2405.15793: “SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering”, John Yang, Carlos E. Jimenez, Alexander Wettig …, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

https://arxiv.org/abs/2405.00332#scale: “GSM1k: A Careful Examination of Large Language Model Performance on Grade School Arithmetic”, Hugh Zhang, Jeff Da, Dean Lee …, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

https://arxiv.org/abs/2404.07544: “From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples”, Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu

https://arxiv.org/abs/2404.05955: “VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?”, Junpeng Liu, Yifan Song, Bill Yuchen Lin …, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

https://arxiv.org/abs/2403.18802#deepmind: “Long-Form Factuality in Large Language Models”, Jerry Wei, Chengrun Yang, Xinying Song …, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

https://arxiv.org/abs/2402.19450: “Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap”, Saurabh Srivastava, Annarose M. B, Anto P. V …, Shashank Menon, Ajay Sukumar, Adwaith Samod T, Alan Philipose, Stevin Prince, Sooraj Thomas

https://arxiv.org/abs/2402.11753: “ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs”, Fengqing Jiang, Zhangchen Xu, Luyao Niu …, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran

https://arxiv.org/abs/2401.05566#anthropic: “Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training”, Evan Hubinger, Carson Denison, Jesse Mu …, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, Yuntao Bai, Zachary Witten, Marina Favaro, Jan Brauner, Holden Karnofsky, Paul Christiano, Samuel R. Bowman, Logan Graham, Jared Kaplan, Sören Mindermann, Ryan Greenblatt, Buck Shlegeris, Nicholas Schiefer, Ethan Perez

https://arxiv.org/abs/2312.06281: “EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models”, Samuel J. Paech

https://arxiv.org/abs/2310.08419: “PAIR: Jailbreaking Black Box Large Language Models in 20 Queries”, Patrick Chao, Alexander Robey, Edgar Dobriban …, Hamed Hassani, George J. Pappas, Eric Wong

https://arxiv.org/abs/2308.12287: “Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models”, Fredrik Heiding, Bruce Schneier, Arun Vishwanath …, Jeremy Bernstein, Peter S. Park

https://x.com/ESYudkowsky/status/1681442477994311681: “Write an Argument That Even a Superintelligence Is Very Unlikely to Be Able to Solve a Rubik’s Cube.”, Eliezer Yudkowsky

https://arxiv.org/abs/2306.15448: “Understanding Social Reasoning in Language Models With Language Models”, Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

https://www.wired.com/story/anthropic-ai-chatbots-ethics/: “A Radical Plan to Make AI Good, Not Evil”, Will Knight

https://arxiv.org/abs/2305.04388: “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-Of-Thought Prompting”, Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman

https://www.anthropic.com/red_teaming.pdf: “Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned”, Deep Ganguli, Liane Lovitt, Jackson Kernion …, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy L. Jones, Samuel R. Bowman, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Nelson Elhage, Sheer El-Showk, Stanislav Fort, Zac Hatfield Dodds, Tom Henighan, Danny Hernandez, Tristan Hume, Josh Jacobson, Scott Johnston, Shauna Kravec, Catherine Olsson, Sam Ringer, Eli Tran-Johnson, Dario Amodei, Tom B. Brown, Nicholas Joseph, Sam McCandlish, Chris Olah, Jared Kaplan, Jack Clark

https://arxiv.org/abs/2112.00861#anthropic: “A General Language Assistant As a Laboratory for Alignment”, Amanda Askell, Yuntao Bai, Anna Chen …, Dawn Drain, Deep Ganguli, Tom Henighan, Andy L. Jones, Nicholas Joseph, Ben Mann, Nova DasSarma, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Jackson Kernion, Kamal Ndousse, Catherine Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Sam McCandlish, Chris Olah, Jared Kaplan
