“Cut the CARP: Fishing for Zero-Shot Story Evaluation”, Shahbuland Matiana, J. R. Smith, Ryan Teehan, Louis Castricato, Stella Biderman, Leo Gao, Spencer Frazier, 2021-10-06:

Recent advances in large-scale language models (Raffel et al 2019; Brown et al 2020) have brought qualitative and quantitative improvements in machine-driven text generation. Despite this, generation and evaluation of machine-generated narrative text remains a challenging problem. Objective evaluation of computationally-generated stories may be prohibitively expensive, require meticulously annotated datasets, or may not adequately measure the logical coherence of a generated story’s narratological structure.

Informed by recent advances in contrastive learning (Radford et al 2021), we present Contrastive Authoring and Reviewing Pairing (CARP): a scalable, efficient method for performing qualitatively superior, zero-shot evaluation of stories. We show a strong correlation between human evaluations of stories and those of CARP. CARP's outputs correlate more strongly with the corresponding human judgments than do those of language-model-based methods that rely on finetuning or prompt engineering. We also present and analyze the Story-Critique Dataset, a new corpus composed of 1.3 million aligned story-critique pairs derived from over 80,000 stories. We expect this corpus to be of interest to NLP researchers.
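
The abstract does not spell out how a contrastive model scores a story zero-shot, so the following is only a minimal sketch of the general idea, assuming a CLIP-style setup in which a story and each candidate critique are embedded into a shared vector space and compared by cosine similarity. The toy vectors, the function names, and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_scores(story_vec, critique_vecs, temperature=0.07):
    """Softmax over temperature-scaled similarities between one story and
    several candidate critiques. The temperature value is an assumption."""
    logits = [cosine(story_vec, c) / temperature for c in critique_vecs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings; a real system would obtain these from trained
# story and critique encoders, which are not shown here.
story = [0.9, 0.1, 0.0]
critiques = {
    "the plot is coherent": [1.0, 0.0, 0.0],
    "the ending is confusing": [0.0, 1.0, 0.0],
}
scores = zero_shot_scores(story, list(critiques.values()))
for text, score in zip(critiques, scores):
    print(f"{text}: {score:.3f}")
```

In this sketch the critique whose embedding lies closest to the story receives most of the probability mass, which is how a contrastively trained model can rank free-text critiques without any task-specific finetuning.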