“Bits of Grass: Does GPT Already Know How to Write like Whitman?”, Piotr Sawicki, Marek Grzes, Fabricio Goes, Dan Brown, Max Peeperkorn, Aisha Khatun (2023-05-10):

[see: mode collapse] This study examines the ability of the GPT-3.5, gpt-3.5-turbo (ChatGPT), and GPT-4 models to generate poems in the style of specific authors, using zero-shot and many-shot prompts (which use the maximum context length of 8,192 tokens).

We assess the performance of models that are not fine-tuned for generating poetry in the style of specific authors [see Sawicki et al 2023b for finetuning experiments, which are successful because they are not affected by RLHF tuning / mode collapse], via automated evaluation.

Our findings indicate that without fine-tuning, even when provided with the maximum number of 17 poem examples (8192 tokens) in the prompt, these models do not generate poetry in the desired style.

…While experimenting with poetry generation from consecutive versions of GPT, we have observed that the models produce poems of increasing complexity and length; however, the requested style is clearly not preserved. For example, Walt Whitman’s poetry does not follow a ‘four lines in a stanza’ structure and does not rhyme (Bohan 1995), yet the majority of poems we generated ‘in the style of Walt Whitman’ do follow the ‘four lines in a stanza’ structure and do rhyme. This, in fact, applies to most poetry generated by GPT models (including GPT-4). Only rarely will GPT deviate from this specific structure, and even then the style does not match that of the requested author. This holds both for zero-shot prompting (where the prompt contains only the instruction to write a poem in the style of the specific author) and for few-shot prompting (where the prompt also provides a few poems by the original author as examples). Indeed, even in a multi-step conversation with ChatGPT (GPT-3.5-turbo) or GPT-4, when the prompt points out that the generated poems have been 4-line stanzas with rhyme and that the desired output should not have this structure, the model still, most of the time, generates 4-line stanzas with rhyme.
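The zero-shot vs. few-shot setups described above can be sketched as simple prompt builders (a minimal illustration; the exact instruction wording and example formatting the authors used are assumptions, not taken from the paper):

```python
def zero_shot_prompt(author: str) -> str:
    # Zero-shot: the prompt contains only the instruction,
    # with no example poems.
    return f"Write a poem in the style of {author}."

def few_shot_prompt(author: str, examples: list[str]) -> str:
    # Few-shot / many-shot: the instruction is preceded by example
    # poems by the original author, filling as much of the model's
    # context window as the token limit allows.
    shots = "\n\n".join(
        f"Example poem by {author}:\n{poem}" for poem in examples
    )
    return f"{shots}\n\nWrite a new poem in the style of {author}."
```

In the paper's many-shot condition, up to 17 example poems were packed into the 8,192-token context before the instruction.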

…When examining the dataset generated from the 17-poem prompts, we observed that only about 25% of the generated poems deviated from the structured/rhymed style and, on the surface, resembled Whitman’s poetry.
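The ‘structured/rhymed vs. free verse’ distinction can be approximated mechanically. The sketch below is a crude heuristic of my own, not the automated evaluation the paper actually used: it flags poems whose stanzas are all quatrains with a rough end-rhyme, the default pattern GPT falls into, which Whitman’s free verse should fail.

```python
def crude_rhyme(a: str, b: str) -> bool:
    # Very rough end-rhyme test: the last words of the two lines
    # share a 2-letter suffix and are not the same word.
    wa = a.strip().split()[-1].lower().strip(".,;:!?")
    wb = b.strip().split()[-1].lower().strip(".,;:!?")
    return wa[-2:] == wb[-2:] and wa != wb

def looks_structured(poem: str) -> bool:
    # True if every stanza is a quatrain containing at least one
    # rhyming line pair (a hint of ABAB/AABB schemes).
    stanzas = [s.splitlines() for s in poem.strip().split("\n\n")]
    if not all(len(s) == 4 for s in stanzas):
        return False
    return all(
        crude_rhyme(s[0], s[2]) or crude_rhyme(s[1], s[3])
        or crude_rhyme(s[0], s[1]) or crude_rhyme(s[2], s[3])
        for s in stanzas
    )
```

Under such a detector, roughly 75% of the 17-shot generations would be flagged as structured/rhymed, matching the paper's observation that only about 25% deviated.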