‘GPT-5’ directory

See Also
Gwern
Links
Miscellaneous
Bibliography

See Also

Parent (‘GPT’ tag)

Gwern

“Brainstorming ‘Yogasm’ Comic Ideas With 5 LLMs [Claude-4.8-Opus Won]”, Gwern et al 2026

“Traitor Joe’s”, Gwern et al 2026

“LLM Challenge: Write Non-Biblical Sentences”, Gwern 2024

“‘Elegy in a Craneyard’ Graveyard Notes”, Gwern et al 2026

“Elegy in a Craneyard”, Gwern et al 2026

“Accountant Who Owns a Rabbit”, Pro et al 2026

“ChatGPT—Human Perception at a Red Light”, Gwern & Pro 2026

“Human Perception at a Red Light”, Gwern & Pro 2026

“Oh, This Old Thing?”, Gwern et al 2026

“How Long Would ‘Weird Al’ Survive In An Ice Cream Freezer?”, Gwern et al 2026

“Apawcalypse Meow”, Gwern et al 2026

“Fine Art versus Fiiine Art”, Gwern 2026

“Spoilage”, Gwern & Pro 2026

“Spoilage”, Pro et al 2026

“My 2025 LLM System Prompts”, Gwern et al 2025

“Apollonian #1: The Counted & the Crowned”, Gwern et al 2025

“‘The Fourth Truth Of Pain’ Graveyard”, Gwern et al 2022

“[GPT-5 Free Association Experiment for Autonomous Image Generation]”, Gwern & GPT-5 2025

“Explain Free Energy Minimization Right Now, You Piece of S—T!”, Gwern & GPT-5 2025

“o3 Is Full of Crimes”, Gwern 2025

“Scaling ‘Diminishing Returns’”, Gwern 2024

Links

“Stochastically Evolving Ellipsoids With Symmetries”, Abuya et al 2026

“Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization”, Zhou 2026

“Finding Miscompiles for Fun, Not Profit; Or: You Don’t Need Access to Claude Mythos to Spend $10,000 in an Afternoon”, Lebar 2026

“A Prize-Winning Story Published in Granta Was (Very Likely) Written by AI”

https://lithub.com/a-prize-winning-story-published-in-granta-was-very-likely-written-by-ai/

“An OpenAI Model Has Disproved a Central Conjecture in Discrete Geometry [Planar Unit Distance Problem]”, OpenAI 2026

“AI Is Incapable of Poetry: It’s Incapable of Producing Anything Creative That Isn’t Dreck”, Pollitt 2026

“LLMs Corrupt Your Documents When You Delegate”, Laban et al 2026

“The AI Revolution in Math Has Arrived: AI Is Being Used to Prove New Results at a Rapid Pace. Mathematicians Think This Is Just the Beginning”, Kakaes 2026

“Short Proofs in Combinatorics, Probability and Number Theory II”, Alexeev et al 2026

“StoryScope: Investigating Idiosyncrasies in AI Fiction”, Russell et al 2026

“StoryScope: Investigating Idiosyncrasies in AI Fiction”, Russell 2026

“Short Proofs in Combinatorics and Number Theory”, Alexeev et al 2026

“Cream of Can”, Gwern et al 2026

“[Poetry Typography Design Experiment: Side-By-Side Pindaric Ode]”, Pro 2026

“Introducing GPT-5.4-Mini and GPT-5.4-Nano: Fast and Efficient Models Optimized for Coding and Sub-Agents”, OpenAI 2026

“GPT-5.4 Is A Substantial Upgrade”, Mowshowitz 2026

“Inside OpenAI’s Race to Catch Up to Claude Code: Why Is the Biggest Name in AI Late to the AI Coding Revolution?”, Zeff 2026

“Introducing GPT-5.4 [GPT-5.4 Pro]”, OpenAI 2026

“Introducing GPT-5.4 [GPT-5.4 Thinking]”, OpenAI 2026

“[Unique-8 That Xor to 2¹⁶ − 1]”, Pro 2026

“ChatGPT-5.3-Codex Is Also Good At Coding”, Mowshowitz 2026

“Evaluating `AGENTS.md`: Are Repository-Level Context Files Helpful for Coding Agents?”, Gloaguen et al 2026

“Introducing GPT-5.3-Codex: Expanding Codex across the Full Spectrum of Professional Work on a Computer”, OpenAI 2026

“Playing With AI: How Do State-Of-The-Art Large Language Models Perform in the 1977 Text-Based Adventure Game Zork?”, Gerrits 2026

“LLM Poetry and the ‘Greatness’ Question: Experiments by Gwern and Mercor”, Robbins 2026

“Shipping at Inference-Speed”, Steinberger 2025

“Introducing GPT-5.2-Codex: The Most Advanced Agentic Coding Model for Professional Software Engineering and Defensive Cybersecurity”, OpenAI 2025

“I Ported JustHTML from Python to JavaScript With Codex CLI & GPT-5.2 in 4.5 Hours”, Willison 2025

“GPT-5.2 Is Frontier Only For The Frontier”, Mowshowitz 2025

“GPT-5.2-Thinking-20251213 System Prompt”, Walls & GPT-5.2 2025

“Introducing GPT-5.2: The Most Advanced Frontier Model for Professional Work & Long-Running Agents”, OpenAI 2025

“Introducing GPT-5.2 Pro § Science and Math”, OpenAI 2025

“ARTEMIS: Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing”, Lin et al 2025

“How Well Can Gemini 3 Make a Henry James Simulator? Finally, a Benchmark for LLMs With Real-World Value”, Breen 2025

“GPT-5.1: A Smarter, More Conversational ChatGPT § GPT-5.1 Thinking”, OpenAI 2025

“ImpossibleBench: Measuring LLMs’ Propensity of Exploiting Test Cases”, Zhong et al 2025

“We Tested Claude Sonnet 4.5 for Writing and Editing: 5 Tests across Blind Comparisons, Editorial Standards, and Deadlines—Here’s What Changed Our Setup”, Parrott 2025

“Evaluating Long Context (Reasoning) Ability”

“The QMA Singularity [GPT-5-Thinking Proves a Key Lemma]”, Aaronson 2025

“I Talked to Sam Altman about the GPT-5 Launch Fiasco: Over Dinner, OpenAI’s CEO Addressed Criticism of GPT-5’s Rollout, the AI Bubble, Brain-Computer Interfaces, Buying Google Chrome, and More”, Heath 2025

andonlabs @ "2025-08-13"

[GPT-5 on Vending Machine sales benchmark]

https://x.com/andonlabs/status/1955692437558677529

“GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o”, Mowshowitz 2025

“GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card”, Mowshowitz 2025

https://thezvi.wordpress.com/2025/08/11/gpt-5s-are-alive-basic-facts-benchmarks-and-the-model-card/

“GPT-5 AMA With OpenAI’s Sam Altman and Some of the GPT-5 Team”, Altman 2025

“Details about METR’s Evaluation of OpenAI GPT-5”, METR 2025

“GPT-5 Is Here: Our Smartest, Fastest, and Most Useful Model Yet, With Thinking Built In. Available to Everyone”, OpenAI 2025

“Introducing GPT-5 for Developers: The Best Model for Coding and Agentic Tasks [API]”, OpenAI 2025

“GPT-5 Pro: Scaled but Efficient Parallel Test-Time Compute, to Provide the Highest Quality and Most Comprehensive Answers”, OpenAI 2025

“GPT-5: It Just Does Stuff—Putting the AI in Charge”, Mollick 2025

khoomeik @ "2025-08-07"

[GPT-5 was a <100× GPT-4 scaleup]

“TextQuests: How Good Are LLMs at Text-Based Video Games?”, Phan et al 2025

“ChatGPT Is My Static Site Generator”, Pilkenton 2025

“OpenAI Is Expected to Release a ‘Materially Better’ GPT-5 for Its Chatbot Mid-Year, Sources Say”, Hays & Rafieyan 2024

“Microsoft Swallows OpenAI’s Core Team § Compute Is King”, Patel & Nishball 2023

“Inside the Chaos at OpenAI: Sam Altman’s Weekend of Shock and Drama Began a Year Ago, With the Release of ChatGPT”, Hao & Warzel 2023

“OpenAI Chief Seeks New Microsoft Funds to Build ‘Superintelligence’: Sam Altman Expects Big Tech Group Will Back Start-Up’s Mission to Create Software As Intelligent As Humans”, Murgia 2023

wagieeacc @ "2023-10-17"

GPT-5 hardware rumor

“Altman on Scaling”, Thibs 2023

“In Sudden Alarm, Tech Doyens Call for a Pause on ChatGPT: Tech Luminaries, Renowned Scientists, and Elon Musk Warn of an ‘Out-Of-Control Race’ to Develop and Deploy Ever-More-Powerful AI Systems § GPT-5”, Knight & Dave 2023

“GPT-5 Scheduled To Complete Training December”, Chen 2023

davidtayar5 @ "2023-02-10"

Context on the NVIDIA ChatGPT opportunity—and ramifications of large language model enthusiasm

“Sidestepping Evaluation Awareness and Anticipating Misalignment With Production Evaluations”

“Erdős #659”, Grayzel 2026

“Can AI Do Replications? GPT-5.2 vs GPT-5.4 vs Refine.ink”, Wiebe 2026

“Moretti Replication Published in AER”, Wiebe 2026

“ChatGPT—Poem Review and Critique”

“LLMs Predict My Coffee”, Dynomight 2026

“A Ramsey-Style Problem on Hypergraphs”

“AI Progress Is about to Speed Up”

“Position Bias: A Benchmark for Testing Whether LLM Judges Keep the Same Preference When Two Lightly Edited Versions of the Same Story Are Shown in Opposite Orders”, Mazir 2026

“ImpossibleBench”

“Ladybird Adopts Rust, With Help from AI [GPT-5 & Claude-4]”

“Rohan Pandey Homepage”, Pandey 2026

“Erdős Problem #1196—Discussion Thread”

“Erdős Problem #783”, Tao 2026

“Erdős Problem #858—Discussion Thread”

“The Current SOTA Model Was Released without Safety Evals”

“How Well Do Models Follow Their Constitutions?”

“Did Claude 3 Opus Align Itself via Gradient Hacking?”

“Models Have Some Pretty Funny Attractor States”

“Microsoft Prepares for OpenAI’s GPT-5 Model”

“OpenAI Fires an Employee for Prediction Market Insider Trading”

https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/

sama

[o3-full & o4-mini to launch earlier, GPT-5 delayed for capability improvement, integration polishing, & hardware availability]

Sort By Magic

Annotations sorted by machine learning into inferred ‘tags’. This provides an alternative way to browse: in topic order rather than date order. The ‘sorted’ list has been automatically clustered into multiple sections & auto-labeled for easier browsing.

Beginning with the newest annotation, it uses the embedding of each annotation to attempt to create a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
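The ordering step described above can be illustrated with a minimal sketch. It makes several assumptions that are not stated on this page. Each annotation already has an embedding vector. Similarity is measured by cosine similarity. The chain is built greedily, always hopping to the most similar annotation not yet visited. The function names `cosine` and `nearest_neighbor_order` are hypothetical. The site's actual implementation may differ, for example in its distance measure or in how the resulting list is split into labeled sections.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_order(embeddings, start=0):
    """Hypothetical sketch: greedily chain annotations by embedding similarity.

    Assumes `embeddings` is listed newest-first, so `start=0` is the newest
    annotation, and that cosine similarity is the similarity measure.
    Returns a list of indices forming a progression of topics.
    """
    if not embeddings:
        return []
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    current = start
    while remaining:
        prev = current
        # Hop to the most similar unvisited annotation; iterating in sorted
        # order makes ties resolve to the lowest index.
        current = max(sorted(remaining),
                      key=lambda j: cosine(embeddings[prev], embeddings[j]))
        remaining.remove(current)
        order.append(current)
    return order


embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(nearest_neighbor_order(embs))  # [0, 2, 3, 1]
```

Greedy chaining needs about n²/2 similarity computations for n annotations. That is cheap at the scale of a single tag page, but it only guarantees that each step is locally similar, not that the whole progression is globally optimal.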

`position-bias`

[see previous entry]

`text-gaming`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`higher-order-proof`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

`gpt-ramifications`

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

[see previous entry]

Miscellaneous

Bibliography

https://www.theverge.com/command-line-newsletter/759897/sam-altman-chatgpt-openai-social-media-google-chrome-interview: “I Talked to Sam Altman about the GPT-5 Launch Fiasco: Over Dinner, OpenAI’s CEO Addressed Criticism of GPT-5’s Rollout, the AI Bubble, Brain-Computer Interfaces, Buying Google Chrome, and More”, Alex Heath

https://openai.com/index/introducing-gpt-5/#gpt-5-pro: “GPT-5 Pro: Scaled but Efficient Parallel Test-Time Compute, to Provide the Highest Quality and Most Comprehensive Answers”, OpenAI

https://x.com/khoomeik/status/1953560406381015259: “[GPT-5 Was a <100× GPT-4 Scaleup]”, Rohan Pandey

https://www.semianalysis.com/p/microsoft-swallows-openais-core-team#%C2%A7compute-is-king: “Microsoft Swallows OpenAI’s Core Team § Compute Is King”, Dylan Patel, Daniel Nishball

https://www.theatlantic.com/technology/archive/2023/11/sam-altman-open-ai-chatgpt-chaos/676050/: “Inside the Chaos at OpenAI: Sam Altman’s Weekend of Shock and Drama Began a Year Ago, With the Release of ChatGPT”, Karen Hao, Charlie Warzel

https://www.lesswrong.com/posts/CfpAXccrBvWpQw9xj/algorithmic-improvement-is-probably-faster-than-scaling-now?commentId=LnyB6PDhazjSXQbAY: “Altman on Scaling”, Jacques Thibs

https://x.com/davidtayar5/status/1627690520456691712: “Context on the NVIDIA ChatGPT Opportunity—And Ramifications of Large Language Model Enthusiasm”, Morgan Stanley
