
Human Perception at a Red Light

GPT-5 claims a new poem is weak but easily fixed. I challenge it to do so. It does, sort of, but also shows the flaws in autonomous LLM poetry right now.

This essay tests a simple claim by an LLM: that a particular human poem had a real subject but weak realization.

On 2026-03-15, I asked GPT-5.4 Pro to rewrite it under explicit formal constraints; it chose to produce a sonnet, a split-screen human/machine poem, and an alliterative version. I then iterated on them using scansion and cross-model critique.

The rewrites are better. Constraint improves cadence, imagery, and closure, and the split-screen version in particular turns theme into form rather than merely paraphrasing it.

But the gains are mostly local. None of the rewrites become major poems, because formal optimization cannot supply what is missing: conceptual surplus, allusive depth, and pressure from material outside the poem’s initial premise.

Constraint raises the floor for LLM poetry. It does not raise the ceiling.

I take this as further evidence that LLM poetry benefits from many runs, explicit concept-loading, and large models.

Every task involves constraint,
Solve the thing without complaint;
There are magic links and chains
Forged to loose our rigid brains.
Strictures, structures, though they bind,
Strangely liberate the mind.

James E. Falen

ChatGPT Pulse brought up a recent poem on the theme of AI: “Artificial neurons can be trained to kill humans without humans overseeing”, K. A. Hays (2026-01-29):

I read on my iPhone, stopped at a red light, & next to my car
a child is playing a street piano fast, counting THREE, TWO, ONE,
yelling over traffic, I’m PLAYING, sun on the child’s face

& fingers skittering on keys—
neurons direct these fingers, a consciousness no one shares
that says, high notes now, then low, & laughter,

& an adult urges the child to come-on-let’s-go, but the child plays a crescendo & says
I AM FINISHING MY SONG—

& artificial intelligence can use recurrent neural networks to create
piano music & AI can drive cars, but my eyes tire,
my eyes are animal eyes with animal need to gaze out

at red lights & be given the useless-lovely data of a sparrow skimming
to a telephone wire, a child at a rainbow-painted stringed instrument,
the sparrow hopping on a wire, the child pressing keys like a question:

low, high? Low, high? Perception a note not played again,
& when the light turns green my car drives,
I am finishing my song. The light is yellow now.

I didn’t care for it, and said as much, and GPT agreed:

…Yes—that’s exactly where it thins out.

The poem’s idea is doing nearly all the work; the language is mostly just transporting it. You can feel that in a few places:

  • The lineation is arbitrary. Breaks don’t create pressure (no enjambment tension, no rhythmic pivot), so as you noted, it survives flattening into prose with minimal loss.

  • The sonic field is underdeveloped. There’s little patterned recurrence—no real alliterative spine, vowel harmonics, or metrical expectation being set up and then varied. The diction is competent but not charged.

  • Syntax is default expository. Clauses proceed in a “thought, elaboration, contrast” pipeline that reads like an essay paragraph broken into lines.

The key problem isn’t that it’s free verse; it’s that it’s uncommitted verse. Compare to something like Jorie Graham or Carl Phillips, where even without strict meter, you still get:

  • controlled breath units (line = perceptual event),

  • recurrence structures (lexical, syntactic, sonic),

  • pressure from delay/suspension.

Here, the closing claim—“my eyes are animal eyes”—should land like a volta, but it isn’t prepared musically. It arrives as a conceptual assertion rather than a sonic inevitability.

How to strengthen it (without turning it into a sonnet):

  1. Impose a constraint you can feel.

    Even a loose accentual scheme (say, 3–4 stresses per line, with periodic deviations) would give the reader something to ride.

  2. Exploit the image system sonically.

    The sparrow/piano/fingers cluster is begging for patterned consonants (sk-, tr-, p-) and percussive echo to mirror the tapping.

  3. Make the line breaks do work.

    Break against syntax so that key words are delayed or doubled in meaning. Right now, most breaks are after-comma pauses.

  4. Engineer the volta.

    Seed “animal” earlier—phonically or imagistically—so the final line resolves a pattern rather than introducing a thesis.

  5. Cut abstraction by ~20%.

    Every abstract noun that remains should earn its place via placement or echo, not just semantic necessity.

Put bluntly: the poem knows what it wants to say, but it hasn’t decided how language must behave to make saying it unavoidable.

This was a bold set of claims, and I became curious if it could indeed fix the poem autonomously without degrading the way that “Spoilage” had (although I didn’t want to push the revisions as far as “Spoilage”).

So I told GPT-5.4 Pro to fix it with my usual brainstorm-style prompt (followed by requiring use of my usual poem commentary & scansion markup to help scaffold revision):

Try rewriting it. Brainstorm 10 detailed revisions aiming at different effects. Revise each one repeatedly. Rate them 1–5 stars and print out the final top 3…for the 3 finalists, apply the Gwern.net Scansion MoS format, and revise and iterate. Critique and analyze the resulting versions.
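The brainstorm→revise→rate→shortlist workflow in that prompt can be sketched as a simple loop. This is a hypothetical illustration only: `ask_llm` is a stand-in for whatever chat-model API one uses (it is stubbed here so the control flow runs standalone), and the star-rating parse is elided.

```python
# Hypothetical sketch of the brainstorm/revise/rate prompting loop
# described above. `ask_llm` is a placeholder for a real model call
# (OpenAI, Anthropic, etc.), stubbed so the control flow is runnable.

def ask_llm(prompt: str) -> str:
    """Placeholder for a real chat-model API call."""
    return f"[model output for: {prompt[:40]}...]"

def brainstorm_and_rank(poem: str, n_drafts: int = 10,
                        n_finalists: int = 3,
                        n_revision_rounds: int = 2) -> list[str]:
    # 1. Brainstorm n distinct revisions, each aiming at a different effect.
    drafts = [ask_llm(f"Revision {i+1}, distinct effect, of:\n{poem}")
              for i in range(n_drafts)]
    # 2. Revise each draft repeatedly.
    for _ in range(n_revision_rounds):
        drafts = [ask_llm(f"Revise and improve:\n{d}") for d in drafts]
    # 3. Rate each draft 1-5 stars, then keep the top finalists.
    #    (Stub: a real version would parse the star ratings and sort by them.)
    rated = [(ask_llm(f"Rate 1-5 stars:\n{d}"), d) for d in drafts]
    return [d for _, d in rated][:n_finalists]
```

The cross-model critique step used later (feeding the finalists to other LLMs, then passing their comments back for revision) is just another pass of the same loop with the critiques appended to the revision prompt.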

First Draft

The 3 top GPT-5.4 Pro responses:

  1. Sonnet (first):

    At red, the street lies still beneath a hush,
    The lamps go soft and smear themselves to gold.
    A boy taps wrong notes through the traffic rush,
    Until those wrong notes answer bright and bold.


    It marks the car that moves a breath ahead;
    It holds the lane and warns my foot to wait.
    It reads the painted curb, the stripe, the red;
    But not the grackle riding out the weight.

    My phone reports new systems choose and track;
    A body’s heat through dust without a guide.
    Above, the grackle writes the wind in black.
    And shifts by grams no model can decide.

    Let sensors spare me glass and closing door;
    I live by what no tally can restore.

  2. Split-Screen (first):

    Lane-lock active; distance maintained.
    In rain, a boy combs sparks from out the keys.

    Pedestrian motion plotted.
    One grackle flexes, black against the span.

    Hazard model tuned overnight.
    One sour note opens outward through the stop.

    Target module cleared in field tests.
    Cold pennies wake along the rails in rain.

    Recommendation: apply brake.
    I brake—for one bird leaning on the gust.

    Recommendation: proceed now.
    I go, and keep the weather, not the map.

  3. Alliterative (first):

    Brake-bloom, bus-hush || blackbird on cable,
    Boy by bus-stop || beats bent keys,

    Screen scans street-lines || spots steel danger,
    Feed touts fire-tools || free of hands.

    Toe-grip on pedal || tin-note, thaw-breath:
    By blood they come || by blink, by bone.

    Circuits can count || the crowd below;
    No wire can weigh || weather home.

Revision

I then required:

Analyze and critique the 3 AI poems as a response to and improvement on the human poem.

Putting it into Kimi K2.5 Thinking, Claude-4.6-opus, & Gemini-3.1-pro-preview for critique, and then having GPT-5.4 Pro revise and update based on the feedback:

Please read carefully, agree/disagree with critiques, and revise all 3 iteratively to improve after brainstorming at least 10 possible improvements to each one in light of the valid or invalid comments from your fellow LLMs.

Second Draft

This yielded the final versions:

  1. Sonnet (final):

    At red, the street lies still beneath a hush,
    The lamps go soft and smear themselves to gold.
    A boy taps wrong notes through the traffic rush;
    “I’m finishing my song”, he shouts in bold.

    The screen brackets the car that moves ahead;
    It holds the lane and warns my foot to wait.
    It reads the painted curb, the stripe, the red;
    But not the grackle riding out the weight.

    My phone reports new systems choose and track;
    A body’s heat through dust without a guide.
    Above, the grackle writes the wind in black
    And shifts by grams no model can decide.

    The light goes green. “I’m finishing my song.”
    I hold one yellow beat; one horn grows long.

  2. Split-Screen (final):

    Lane-lock active; interval held.
    In rain, the boy combs sparks out of the keys.

    Pedestrian vector plotted.
    A grackle flexes black along the wire.

    Hazard threshold tuned overnight.
    He shouts, “I’m finishing my song”, through rain.

    Target module cleared in field tests.
    Cold pennies wake along the rails in rain.

    Recommendation: apply brake.
    I brake—for one bird leaning on the gust.

    Recommendation: proceed now.
    I go, and keep the weather, not the map.

  3. Alliterative (final):

    Brake-lights bloom || bus brakes sigh,
    Boy at curb-piano || batters bright keys,
    Shouts, “I’m finishing!” || back through rain.
    Screen scans street-lines || stripe, curb, red.
    Phone-feed flashes || field-test footage:
    Body-heat boxed || body tagged threat.
    Black grackle balances || bends to crosswind.
    Brake-foot follows || that black leaning.
    Tin rails ring || cold in the rain.
    Circuits count bodies || crowding below.
    No wire can weigh || weather home.

Evaluation

Then I asked GPT-5.4 Pro to compare them:

…Final comparative judgment:

After revision, the hierarchy changes a little. As responses to Hays:

  • split-screen > sonnet > alliterative.

As standalone poems:

  • sonnet ≈ split-screen > alliterative.

The sonnet is now far less bloodless than before. The split-screen is still the most intelligent formal answer to the source poem’s actual problem. The alliterative version no longer fights the material, but it works by becoming something narrower and harder than Hays—a compressed urban spell, not a complete rival.

The real lesson from the 3 external critiques is that the valid complaint was never “these poems are too formal”. The valid complaint was “they had gained form by losing interruption”. These revisions fix that.

Did the formal structure and revision help? Yes.

I think that the constraints and iteration/critique did improve the 3 poems; but GPT-5.4 Pro is wrong to say the sonnet is ever as good as the split-screen. I think the split-screen is clearly superior to the other two, although they have some good points (eg. “grackle” is more interesting than Hays’s “sparrow”). And overall, I’d still prefer to read the final split-screen to Hays’s original poem.

But none of them are great. I’d say “good, but not memorable”; I could probably come up with something better if I intervened enough.

What Is Missing?

What is missing compared to my better AI poems like “Silver Bird” or “Apollonian #1”? I think the results are consistent with my emphasis on richness of concept and allusions.

The current poems are all relatively “empty”, as they do not benefit from my usual process of building up databanks and sets of words or phrases or ideas to draw on and prime the LLMs. They are small variations on ideas already in the Hays original (like the species of the bird, which is better but still performs the same literary function).

“Split-screen” isn’t held back as badly, because it has a cute formal conceit which drives its interest; whereas in the sonnet, I feel acutely how routine and boring it is. In particular, the split-screen has an overly tidy and unambiguous ending; Kimi K2.5 Thinking correctly suggests that there are much more interesting endings, like “Recommendation: proceed now. / I—” or “Recommendation: proceed now / Recommendation: proceed now”:

The pat conclusion is a default, not a necessity—the formal conceit doesn’t mandate it, but nothing in the prompt architecture pushes against it either. The machine/human alternation is structurally capable of openness, but GPT defaults to closure because:

  1. The Hays original supplies a narrative arc (red light → green light → departure), and the revision inherits this teleology without questioning it

  2. The “cute” formal conceit is self-sufficient—it doesn’t need to complicate itself to demonstrate its intelligence

  3. No external pressure (your ‘databanks’, a contradictory thematic mandate, a forbidden word or image) forces the ending to earn its resolution or abandon it

What would an unclosed Split-Screen look like? Perhaps:

Recommendation: proceed now.
I—

Or the human line refusing to answer. Or a 7th couplet where the machine line repeats (“Recommendation: proceed now / Recommendation: proceed now”), the human line having already left. The form allows this; the prompt didn’t require it.

Your point about “visible alternatives” is crucial. The revision process critiques execution but not possibility space. It asks “is this line good?” not “should this line exist?” The ‘databank’ method, by contrast, floods the possibility space with incompatible alternatives—medieval bestiary birds against thermal imaging grids—and forces selection under pressure.

The Split-Screen is good within its assumptions. Your method questions the assumptions before the first line.

External pressure. There’s nothing ‘outside’ driving the LLMs out of their comfort zone—nothing like me imposing an out-of-the-blue theme like medieval bestiaries, thermal imaging patents, or Saint Augustine on attention, which would force the chatbot towards radical new possibilities. Something outside the poem’s initial premise has to arrive and start making demands; and the original Hays is too impoverished to do this for the LLM revisions.

The LLMs optimize for all the visible esthetic properties one could desire, assisted by scaffolding, but they still lack some engine with fire—surplus, pressure from outside the poem, voice under stress, moral ambiguity, weird necessity, or inexplicably good lines that feel not merely improved but inevitable. Where Hays works, it shows in lines like “I AM FINISHING MY SONG—” or “my eyes are animal eyes with animal need to gaze out”.

To quote GPT-5.4 Pro:

Constraint helped because it forced decisions. It failed because decision-forcing is not the same thing as imaginative enlargement. The next leap will not come from tighter meter alone.

It will come from importing a foreign archive, preserving awkward agency, and letting the machine/human boundary get messier instead of cleaner.