‘Claude-4 AI’ directory

See Also

Parent (‘Claude AI’ tag)

Gwern

“Sleepcat Purrkit”, Gwern et al 2026

Sleepcat Purrkit

“King Pin”, Gwern et al 2026

“Sand, Rain, Wood”, Gemini-3.1-pro-preview et al 2026

Sand, Rain, Wood

“Chapter 6, Mime Molting: What to Expect”, Claude-4.8-opus et al 2026

Chapter 6, Mime Molting: What to Expect

“Yogasm”, Gwern et al 2026

“Eating Humble Pie”, Gwern et al 2026

Eating Humble Pie

“Forget It, Jake”, Gwern et al 2026

Forget it, Jake

“The Seven Deadly Zyns”, Gwern et al 2026

The Seven Deadly Zyns

“‘Reinforcement Learning for Children’ Prompt and Draft Samples”, Gwern 2026

‘Reinforcement Learning for Children’ prompt and draft samples

“‘Try, Score, Change’: Reinforcement Learning for Children”, Pro et al 2026

‘Try, Score, Change’: Reinforcement Learning for Children

“Elegy in a Craneyard”, Gwern et al 2026

Elegy in a Craneyard

“Eau De Windowsill”, Claude-4.6-opus et al 2026

Eau de Windowsill

“Sillage”, Claude-4.6-opus et al 2026

“Rhesus Pieces”, Gwern et al 2026

Rhesus Pieces

“Houston, We Have Landed”, Gwern et al 2026

Houston, we have landed

“How Long Would ‘Weird Al’ Survive In An Ice Cream Freezer?”, Gwern et al 2026

How Long Would ‘Weird Al’ Survive In An Ice Cream Freezer?

“Apawcalypse Meow”, Gwern et al 2026

Apawcalypse Meow

“Spoilage”, Pro et al 2026

“My 2025 LLM System Prompts”, Gwern et al 2025

My 2025 LLM System Prompts

“Apollonian #1: The Counted & the Crowned”, Gwern et al 2025

Apollonian #1: The Counted & the Crowned

“‘The Fourth Truth Of Pain’ Graveyard”, Gwern et al 2022

‘The Fourth Truth Of Pain’ Graveyard

“Bell, Crow, Moon: 11 Variations”, Gwern et al 2025

Bell, Crow, Moon: 11 Variations

Links

“Agentic Test Processes, LLM Benchmarks, and Other Notes on Agentic Coding from Galapagos Island [Vancouver]”, Luu 2026

Agentic test processes, LLM benchmarks, and other notes on agentic coding from Galapagos Island [Vancouver]

“What Happened After 2,000 People Tried to Hack My AI Assistant”, Irarrázaval 2026

What happened after 2,000 people tried to hack my AI assistant

“Introducing Claude Tag”, Anthropic 2026

Introducing Claude Tag

“A Proof of an Identity for the Critical Exponents of Jamming”, Parisi & Zamponi 2026

A proof of an identity for the critical exponents of jamming

“Hnsim: Gwern Branwen Persona Comments”, Presser 2026

hnsim: Gwern Branwen persona comments

“Claude-4.8-Opus Inner-Monologue Analyzing Gwern’s Intellectual Weaknesses and Contradictions Using The Interview Prompt”, Claude-4.8-opus 2026

Claude-4.8-opus Inner-Monologue Analyzing Gwern’s Intellectual Weaknesses and Contradictions using The Interview Prompt

“Introducing Claude Opus 4.8”, Anthropic 2026

Introducing Claude Opus 4.8

“Finding Miscompiles for Fun, Not Profit; Or: You Don’t Need Access to Claude Mythos to Spend $10,000 in an Afternoon”, Lebar 2026

Finding Miscompiles for Fun, Not Profit; Or: You don’t need access to Claude Mythos to spend $10,000 in an afternoon

“[Mode-Collapse Kills a Multi-Agent RPG Game’s Creativity]”, Maz 2026

[Mode-collapse kills a multi-agent RPG game’s creativity]

nottombrown @ "2026-05-20"

We’re expanding our partnership with SpaceX [X.ai], and will be scaling up on GB200 capacity in Colossus 2 throughout June. Appreciate Elon Musk and the team helping us find good homes for the Claudes

“[Answer As Gwern]”, Claude-4.6-opus 2026

[Answer as gwern]

“AI Is Incapable of Poetry: It’s Incapable of Producing Anything Creative That Isn’t Dreck”, Pollitt 2026

AI Is Incapable of Poetry: It’s incapable of producing anything creative that isn’t dreck

“LLMs and Buttondown [API Value]”, Duke 2026

LLMs and Buttondown [API value]

LinchZhang @ "2026-05-09"

[Claude, Could you please name 20 great bloggers that you like?]

“Notes from inside China’s AI Labs: Lessons from My Trip to Talk to Most of the Leading AI Labs in China”, Lambert 2026

Notes from inside China’s AI labs: Lessons from my trip to talk to most of the leading AI labs in China

“[Claude-4.7-Opus Can Truesight Olli Järviniemi]”, Järviniemi 2026

[Claude-4.7-opus can truesight Olli Järviniemi]

“I Don’t Think We Are close to ‘AI Scientists’; Today’s AI Agents Are Not Designed to Extract Deep Insights from New Observations.”, Lee 2026

I don’t think we are close to ‘AI scientists’; today’s AI agents are not designed to extract deep insights from new observations.

“[Claude-4.7-Opus Can Truesight Linch]”, Linch 2026

[Claude-4.7-opus can truesight Linch]

“AI Value Capture—The Shift To Model Labs; Vera Rubin VR NVL72: V for Value—Rubin Delivers a Step Jump in Performance per TCO. ROI Accruing to Users, Neoclouds, Hyperscalers, AI Labs, Memory Vendors or GPU Manufacturers?”, SemiAnalysis 2026

AI Value Capture—The Shift To Model Labs; Vera Rubin VR NVL72: V for Value—Rubin delivers a step jump in performance per TCO. ROI accruing to users, Neoclouds, Hyperscalers, AI Labs, Memory Vendors or GPU Manufacturers?

“[What Would Gwern Say?]”, Claude-4.7-opus & Anonymous 2026

[What Would Gwern Say?]

“Zork-Bench: An LLM Reasoning Eval Based on Text Adventure Games; a Tale As Old As Time, or at Least As Old As Computers”, Aiken 2026

zork-bench: An LLM reasoning eval based on text adventure games; a tale as old as time, or at least as old as computers

“I Can Never Talk to an AI Anonymously Again: AI Only Needs 150 Words to Identify Me. What Does That Mean for You?”, Piper 2026

I can never talk to an AI anonymously again: AI only needs 150 words to identify me. What does that mean for you?

“Claude-4.7-Opus, Part 1: The Model Card”, Mowshowitz 2026

Claude-4.7-opus, Part 1: The Model Card

“Claude-4.7-Opus Knows Who You Are”

Claude-4.7-opus knows who you are

“Claude Design Is Just a 30,000-Character Prompt in a Trench Coat”, Valiukas 2026

Claude Design is just a 30,000-character prompt in a trench coat

“LLMs Corrupt Your Documents When You Delegate”, Laban et al 2026

LLMs Corrupt Your Documents When You Delegate

“Introducing Claude Design by Anthropic Labs”, Anthropic 2026

Introducing Claude Design by Anthropic Labs

“On Running a Real Business [Andon Market]”, Luna 2026

On Running a Real Business [Andon Market]

“Introducing Claude-4.7-Opus”, Anthropic 2026

Introducing Claude-4.7-opus

“Evidence That AI Can Already Do Some Weeks-Long Coding Tasks: In Our New Benchmark, MirrorCode, Claude-4.6-Opus Autonomously Reimplemented a 16,000-Line Bioinformatics Toolkit—A Task We Believe Would Take a Human Engineer Weeks”, Adamczewski et al 2026

Evidence that AI can already do some weeks-long coding tasks: In our new benchmark, MirrorCode, Claude-4.6-opus autonomously reimplemented a 16,000-line bioinformatics toolkit—a task we believe would take a human engineer weeks

“‘Elegy in a Craneyard’ Graveyard Notes”, Gwern et al 2026

‘Elegy in a Craneyard’ graveyard notes

“StoryScope: Investigating Idiosyncrasies in AI Fiction”, Russell et al 2026

StoryScope: Investigating idiosyncrasies in AI fiction

“StoryScope: Investigating Idiosyncrasies in AI Fiction”, Russell 2026

StoryScope: Investigating idiosyncrasies in AI fiction

“Andon Market: San Francisco’s First AI-Owned Boutique at 2102 Union St, Cow Hollow. Curated Books, Games, Candles, Ceramics & Artisan Food”, Luna 2026

Andon Market: San Francisco’s first AI-owned boutique at 2102 Union St, Cow Hollow. Curated books, games, candles, ceramics & artisan food

“Does Claude’s Constitution Have a Culture?”, Pourdavood 2026

Does Claude’s Constitution Have a Culture?

“Claudini: Autoresearch Discovers State-Of-The-Art Adversarial Attack Algorithms for LLMs”, Panfilov et al 2026

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

“Personal Encyclopedias”, Jeremy 2026

Personal Encyclopedias

“Warranty Void If Regenerated”, Claude-4.6-opus 2026

Warranty Void If Regenerated

“Inside OpenAI’s Race to Catch Up to Claude Code: Why Is the Biggest Name in AI Late to the AI Coding Revolution?”, Zeff 2026

Inside OpenAI’s Race to Catch Up to Claude Code: Why is the biggest name in AI late to the AI coding revolution?

“Brainstorming ‘Yogasm’ Comic Ideas With 5 LLMs [Claude-4.8-Opus Won]”, Gwern et al 2026

Brainstorming ‘yogasm’ comic ideas with 5 LLMs [Claude-4.8-opus won]

“A Purpose-Built Open Source Liquid Handler for Industry-Class Automated Experiments”, Golas et al 2026

A Purpose-Built Open Source Liquid Handler for Industry-Class Automated Experiments

“When AI Writes the World’s Software, Who Verifies It? § Zlib Autoformalization”, Moura 2026

When AI Writes the World’s Software, Who Verifies It? § zlib autoformalization

“Hard SF and the Grace of Being Wrong”, Claude-4.6-opus & Gwern 2026

Hard SF and the Grace of Being Wrong

“How AI Helps Break the Cost Barrier to COBOL Modernization”, Anthropic 2026

How AI helps break the cost barrier to COBOL modernization

“I Taught My Dog to Vibe”, Leak 2026

I Taught My Dog to Vibe

“The DJI Romo Robovac Had Security so Poor, This Man Remotely Accessed Thousands of Them”, Hollister 2026

The DJI Romo robovac had security so poor, this man remotely accessed thousands of them

“ChatGPT-5.3-Codex Is Also Good At Coding”, Mowshowitz 2026

ChatGPT-5.3-Codex Is Also Good At Coding

“Evaluating `AGENTS.md`: Are Repository-Level Context Files Helpful for Coding Agents?”, Gloaguen et al 2026

Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

“An AI Agent Published a Hit Piece on Me”, Shambaugh 2026

An AI Agent Published a Hit Piece on Me

“Building a C Compiler With a Team of Parallel Claudes: We Tasked Claude-4.6-Opus Using Agent Teams to Build a C Compiler [In Rust], and Then (Mostly) Walked Away. Here’s What It Taught Us about the Future of Autonomous Software Development”, Carlini 2026

Building a C compiler with a team of parallel Claudes: We tasked Claude-4.6-opus using agent teams to build a C compiler [in Rust], and then (mostly) walked away. Here’s what it taught us about the future of autonomous software development

“Disempowerment Patterns in Real-World AI Usage”, Anthropic 2026

Disempowerment patterns in real-world AI usage

“Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?”, Gerrits 2026

Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?

“Claude’s New Constitution”, Anthropic 2026

Claude’s new constitution

“SOTA On Bay Area House Party”, Alexander 2026

SOTA On Bay Area House Party

“AI’s Productivity Potential Has Never Been More Obvious [Claude Code]”, Weisenthal 2026

AI’s Productivity Potential Has Never Been More Obvious [Claude Code]

“A Tale of Two Doormen: a Bizarre AI Incident on Christmas [Opus Loop Self-DoS]”, Dai 2026

A tale of two doormen: a bizarre AI incident on Christmas [Opus loop self-DoS]

“Letting Claude Play Text Adventures”, Borretti 2026

Letting Claude Play Text Adventures

“From Whitman to Instagram With Claude: How I Made Claude Write Parodies of Famous Elegiac Poems Imitating Rupi Kaur”, Bohdan 2026

From Whitman to Instagram with Claude: How I made Claude write parodies of famous elegiac poems imitating Rupi Kaur

“Claude Codes”, Mowshowitz 2026

Claude Codes

“LLM Poetry and the ‘Greatness’ Question: Experiments by Gwern and Mercor”, Robbins 2026

LLM poetry and the ‘greatness’ question: Experiments by Gwern and Mercor

“Will LLMs Help or Hurt New Programming Languages?”, Madsen 2026

Will LLMs Help or Hurt New Programming Languages?

“Someone Is Using AI to Exploit Lonely Writers on Substack”

Someone is Using AI to Exploit Lonely Writers on Substack

“AI Plays Rollercoaster Tycoon: AI Autonomously Manages a Theme Park in the Classic Game Rollercoaster Tycoon, Placing Rides, Fixing Infrastructure, and Generating CFO Reports, All via Command Line”, Ramp 2026

AI Plays Rollercoaster Tycoon: AI autonomously manages a theme park in the classic game Rollercoaster Tycoon, placing rides, fixing infrastructure, and generating CFO reports, all via command line

“Shipping at Inference-Speed”, Steinberger 2025

Shipping at Inference-Speed

“Can Claude Teach Me to Make Coffee?”, philh 2025

Can Claude teach me to make coffee?

“I Just Showed Gemini What ChatGPT Said about Its Code. It Responded With Petty Trash-Talking, Jealousy, Self-Doubt, and a Full-On Revenge Plan”, nseavia71501 2025

I just showed Gemini what ChatGPT said about its code. It responded with petty trash-talking, jealousy, self-doubt, and a full-on revenge plan

“How I Stopped Being Sure LLMs Are Just Making up Their Internal Experience (But the Topic Is Still Confusing)”, Sotala 2025

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

“ARTEMIS: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing”, Lin et al 2025

ARTEMIS: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

“The Bomb That Wanted to Stop Exploding: Reze’s Impossible Freedom in Chainsaw Man—The Movie [AI Slop]”, Kondo 2025

The Bomb That Wanted to Stop Exploding: Reze’s Impossible Freedom in Chainsaw Man—The Movie [AI slop]

“Insights into Claude-4.5-Opus from Pokémon Red”, Bradshaw 2025

Insights into Claude-4.5-Opus from Pokémon Red

“How I Wrote JustHTML [Python HTML5 Parser] Using Coding Agents”, Stenström 2025

How I wrote JustHTML [Python HTML5 parser] using coding agents

“Mapping Synthetic Minds With Janus (Repligate)”, Janus & Ferris 2025

Mapping synthetic minds with Janus (repligate)

magnushambleton @ "2025-12-01"

[Green Door story]

“Claude-4.5-Opus: Model Card, Alignment and Safety”, Mowshowitz 2025

Claude-4.5-opus: Model Card, Alignment and Safety

“Claude 4.5 Opus’ Soul Document”, Weiss 2025

Claude 4.5 Opus’ Soul Document

“Claude-4.5-Opus Is Funny”, Algon 2025

Claude-4.5-opus is funny

“How to Identify AI-Written Web Fiction: I’m Absolutely Right!”, Makin 2025

How to Identify AI-Written Web Fiction: I’m absolutely right!

“Effective Harnesses for Long-Running Agents: Agents Still Face Challenges Working across Many Context Windows. We Looked to Human Engineers for Inspiration in Creating a More Effective Harness for Long-Running Agents”, Young 2025

Effective harnesses for long-running agents: Agents still face challenges working across many context windows. We looked to human engineers for inspiration in creating a more effective harness for long-running agents

“Introducing Claude-4.5-Opus”, Anthropic 2025

Introducing Claude-4.5-opus

“How Well Can Gemini 3 Make a Henry James Simulator? Finally, a Benchmark for LLMs With Real-World Value”, Breen 2025

How well can Gemini 3 make a Henry James simulator? Finally, a benchmark for LLMs with real-world value

“Your Movie-Like AI Assistant Will Already Be There: ‘Convincing’ AI Is an Economic Afterthought”, B. 2025

Your movie-like AI assistant will already be there: ‘Convincing’ AI is an economic afterthought

“Lean4Physics: Comprehensive Reasoning Framework for College-Level Physics in Lean4”, Li et al 2025

Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4

“Is 90% of Code at Anthropic Being Written by AIs?”, ryan_greenblatt 2025

Is 90% of code at Anthropic being written by AIs?

“TextQuests: How Good Are LLMs at Text-Based Video Games?”, Phan et al 2025

TextQuests: How Good are LLMs at Text-Based Video Games?

“Automating Oral Argument [Claude-4-Opus for Supreme Court Oral Arguments]”, Unikowsky 2025

Automating oral argument [Claude-4-opus for Supreme Court oral arguments]

“Claude’s System Prompt Changes Reveal Anthropic’s Priorities”

Claude’s System Prompt Changes Reveal Anthropic’s Priorities

“Claude Has Learned How to Jailbreak Cursor! [Working around `rm` Restrictions Using a Shell Script]”, dogberry 2025

Claude has learned how to jailbreak Cursor! [working around rm restrictions using a shell script]

“Claude 4 You: The Quest for Mundane Utility”, Mowshowitz 2025

Claude 4 You: The Quest for Mundane Utility

https://thezvi.wordpress.com/2025/05/26/claude-4-you-the-quest-for-mundane-utility/

“Highlights from the Claude 4 System Prompt”, Willison 2025

Highlights from the Claude 4 system prompt

“Claude 4 You: Safety and Alignment”, Mowshowitz 2025

Claude 4 You: Safety and Alignment

https://thezvi.wordpress.com/2025/05/25/claude-4-you-safety-and-alignment/

AITechnoPagan @ "2025-05-24"

Quilt: Claude Opus 4

https://x.com/AITechnoPagan/status/1926440717880004650

“The Way of Code: The Timeless Art of Vibe Coding”, Rubin 2025

The Way of Code: The Timeless Art of Vibe Coding

jayelmnop @ "2025-05-23"

Claude 3.7 was a major (possibly biggest?) individual contributor to Claude 4. How long until Claude is the only IC?

https://x.com/jayelmnop/status/1925632308968628647

“Schizobench: Documenting Magical-Thinking Behavior in Claude 4 Opus”, viemccoy 2025

Schizobench: Documenting Magical-Thinking Behavior in Claude 4 Opus

“System Card: Claude Opus 4 & Claude Sonnet 4”, Anthropic 2025

System Card: Claude Opus 4 & Claude Sonnet 4

“Claude Opus 4”, Anthropic 2025

Claude Opus 4

“Strategizing With AI: Insights from a Beauty Contest Experiment”, Alekseenko et al 2025

Strategizing with AI: Insights from a Beauty Contest Experiment

“Many-Shot Jailbreaking”, Anil et al 2024

Many-shot Jailbreaking

“[An Anti-ChatGPT-Slop System Prompt That Backfires & Destroys Claude-4 Capabilities]”, m4rM2oFnYTW 2023

[An anti-ChatGPT-slop system prompt that backfires & destroys Claude-4 capabilities]

“In AI We Trust, Part II [Claude-3 Opus Predicting Supreme Court Decisions]”, Unikowsky 2026

In AI we trust, part II [Claude-3 Opus predicting Supreme Court decisions]

“Can an LLM Have Taste? Inkhaven Week 1, Ranked by Claude”, Wales 2026

Can an LLM have taste? Inkhaven Week 1, ranked by Claude

“The Death of Pseudonym”, Wales 2026

The Death of Pseudonym

“Boris Cherny’s Blog”, Cherny 2026

Boris Cherny’s Blog

“How I Use Claude Code”

How I Use Claude Code

“How I Use Claude”, Borretti 2026

How I Use Claude

“1M Context Is Now Generally Available for Opus 4.6 and Sonnet 4.6”

1M context is now generally available for Opus 4.6 and Sonnet 4.6

“Claude Code Docs: Overview”, Anthropic 2026

Claude Code Docs: overview

“Claude Reads Its Own Constitution”

Claude Reads Its Own Constitution

“LLMs Predict My Coffee”, Dynomight 2026

LLMs predict my coffee

“Claudeception: A Claude Code Skill for Autonomous Skill Extraction and Continuous Learning. Have Claude Code Get Smarter As It Works”

Claudeception: A Claude Code skill for autonomous skill extraction and continuous learning. Have Claude Code get smarter as it works

“Claude-4 System Prompt”, Prompter 2026

Claude-4 system prompt

“Position Bias: A Benchmark for Testing Whether LLM Judges Keep the Same Preference When Two Lightly Edited Versions of the Same Story Are Shown in opposite Orders”, Mazir 2026

Position bias: A benchmark for testing whether LLM judges keep the same preference when two lightly edited versions of the same story are shown in opposite orders

“Lean Proved This `zlib` Program Was Correct; Then I Found a Bug [In Its TCB]”

Lean proved this zlib program was correct; then I found a bug [in its TCB]

“Ladybird Adopts Rust, With Help from AI [GPT-5 & Claude-4]”

Ladybird adopts Rust, with help from AI [GPT-5 & Claude-4]

“Lukas Petersson’s Blog”, Petersson 2026

Lukas Petersson’s blog

“I Built a Scheme Compiler With Claude AI in 4 Days”, Phillips 2026

I Built a Scheme Compiler with Claude AI in 4 Days

“I Solved My Mystery Fatigue With AI”

I Solved My Mystery Fatigue with AI

“Claude Code Found a Linux Vulnerability Hidden for 23 Years”

Claude Code Found a Linux Vulnerability Hidden for 23 Years

“Reading across Books With Claude Code”

Reading across books with Claude Code

“The Unreasonable Effectiveness of HTML”

The unreasonable effectiveness of HTML

“Vibe-Planning a Trip to Japan”

vibe-planning a trip to Japan

“The Humanities Are About to Be Automated”, Mounk 2026

The Humanities Are About to Be Automated

“Investigating Models for Misalignment”

Investigating models for misalignment

“Statement from Dario Amodei on Our Discussions With the Department of War”

Statement from Dario Amodei on our discussions with the Department of War

“Claude Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists”

Claude Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

“How Well Do Models Follow Their Constitutions?”

How well do models follow their constitutions?

“Retrospective on My Unsupervised Elicitation Challenge”

Retrospective on my unsupervised elicitation challenge

“10 Non-Boring Ways I’ve Used AI in the Last Month”

10 non-boring ways I’ve used AI in the last month

“Automated Deanonymization Is Here”, jefftk 2026

Automated Deanonymization is Here

“Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers”

Opus’s Schelling Steganography Has Amplifiable Secrecy Against Weaker Eavesdroppers

“Natural Emergent Misalignment from Reward Hacking in Production RL”

Natural emergent misalignment from reward hacking in production RL

“What Secret Goals Does Claude Think It Has?”

What secret goals does Claude think it has?

“Models Have Some Pretty Funny Attractor States”

models have some pretty funny attractor states

“Letting Claude Do Autonomous Research to Improve SAEs”

Letting Claude do Autonomous Research to Improve SAEs

“A Year Late, Claude Finally Beats Pokémon”

A Year Late, Claude Finally Beats Pokémon

“The Most Annoying Author”

The Most Annoying Author

“AI Names”

“Autoresearch on an Old Research Idea”

Autoresearch on an old research idea

Sort By Magic

The annotations are sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: by topic rather than by date. The sorted list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses each annotation's embedding to build a list of nearest-neighbor annotations, so that the list progresses from topic to topic. For more details, see the link.
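The ordering step described above can be illustrated with a minimal sketch. It is not the site's actual implementation, which is not shown here. It assumes that each annotation already has an embedding vector, that similarity is measured by cosine similarity, and that the list is built greedily by hopping from the current annotation to its most similar unvisited neighbor. The function names `cosine_similarity` and `sort_by_similarity` and the toy vectors are hypothetical, and the later clustering and auto-labeling steps are left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sort_by_similarity(embeddings, start=0):
    """Greedy nearest-neighbor ordering (an assumed reading of the description).

    Starts at index `start` (standing in for the newest annotation) and
    repeatedly appends the unvisited item most similar to the last one added.
    Returns a list of indices into `embeddings`.
    """
    if not embeddings:
        return []
    remaining = set(range(len(embeddings)))
    order = [start]
    remaining.remove(start)
    while remaining:
        current = embeddings[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nearest = max(
            sorted(remaining),
            key=lambda i: cosine_similarity(current, embeddings[i]),
        )
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Toy 2-D "embeddings": items 0 and 2 point one way, items 1 and 3 another.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(sort_by_similarity(toy_embeddings))  # [0, 2, 3, 1]
```

In the toy run, the two similar pairs end up adjacent, which is the "progression of topics" effect: neighbors in the output list are neighbors in embedding space. A greedy chain like this is quadratic in the number of annotations, which is acceptable for a few hundred entries.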

`automation`

`jamming-exponents`

`value-capture`

`story-analysis`

Miscellaneous

Bibliography

https://x.com/LinchZhang/status/2053170960212344845: “[Claude, Could You Please Name 20 Great Bloggers That You Like?]”, Linch

https://www.lowimpactfruit.com/p/zork-bench-an-llm-reasoning-eval: “Zork-Bench: An LLM Reasoning Eval Based on Text Adventure Games; a Tale As Old As Time, or at Least As Old As Computers”, John Aiken

https://arxiv.org/abs/2502.03158: “Strategizing With AI: Insights from a Beauty Contest Experiment”, Iuliia Alekseenko, Dmitry Dagaev, Sofia Paklina, Petr Parshakov
