“Benchmarking LLM Diversity & Creativity”, Gwern 2024-12-08:

Discussion of possible benchmark tasks for measuring LLM capabilities in soft ‘creative’ work like brainstorming or editing, to quantify failures in creative writing domains.