“Benchmarking LLM Diversity & Creativity”, Gwern 2024-12-08 (GPT poetry, RL exploration, AI mode collapse, Fermi problems):
Discussion of possible tasks to measure LLM capabilities in soft ‘creative’ tasks like brainstorming or editing, to quantify failures in creative writing domains.