“GPT-3 Nonfiction § PDF Cleaning”, Gwern, 2020-06-19:

Nonfiction writing by OpenAI’s GPT-3 model, testing logic, commonsense reasoning, anagrams, PDF/OCR cleaning, creative nonfiction, etc.

GPT-3 can clean up OCR errors and miscellaneous formatting problems as a rewrite task given some few-shot examples; I provide a Python script using the openai Python library which can be used on the CLI to fix up paper abstracts.
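
As a rough illustration of the few-shot rewrite framing (not the script from the original page), the prompt can be assembled from a handful of garbled/cleaned pairs followed by the new input. Everything below is an assumption made for the sketch: the example pairs, the `Original:`/`Corrected:` labels, and the `complete` callable, which stands in for whatever function sends a prompt to the model and returns its completion text.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical few-shot pairs (garbled text, cleaned text), invented for
# illustration; they are not taken from the original script.
EXAMPLES: Sequence[Tuple[str, str]] = [
    ("Th e effi cacy of the treat- ment was signifi cant (p < 0.0l).",
     "The efficacy of the treatment was significant (p < 0.01)."),
    ("We ana1yze 3,OOO sub jects from tw0 cohorts .",
     "We analyze 3,000 subjects from two cohorts."),
]


def build_few_shot_prompt(dirty: str,
                          examples: Sequence[Tuple[str, str]] = EXAMPLES) -> str:
    """Lay out the examples as Original/Corrected pairs, ending with the new input."""
    parts = [f"Original: {garbled}\nCorrected: {clean}\n"
             for garbled, clean in examples]
    # Leave the final "Corrected:" open so the model fills in the cleaned text.
    parts.append(f"Original: {dirty}\nCorrected:")
    return "\n".join(parts)


def clean_text(dirty: str, complete: Callable[[str], str]) -> str:
    """Send the few-shot prompt through `complete` and keep only the first answer."""
    completion = complete(build_few_shot_prompt(dirty))
    # The model may keep going and invent another "Original:" pair; cut it off.
    return completion.split("\nOriginal:")[0].strip()
```

In a CLI script, `complete` would wrap a call to the openai library and `dirty` would be read from standard input; keeping the model call behind a plain callable makes the prompt logic easy to check without network access.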

Instruct-GPT-3 models can do this zero-shot simply by prompting “Correct the OCR errors:”, simplifying the prompt & saving tokens.
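
A minimal sketch of the zero-shot variant follows, assuming the instruction is simply placed before the text to be cleaned. Only the instruction string comes from the text above; the blank-line layout and the `complete` callable (a stand-in for whatever sends a prompt to an instruction-following model) are assumptions.

```python
from typing import Callable

# Instruction quoted in the text above; the surrounding layout is an assumption.
INSTRUCTION = "Correct the OCR errors:"


def build_zero_shot_prompt(dirty: str) -> str:
    """Put the instruction first, then the raw text; no examples are needed."""
    return f"{INSTRUCTION}\n\n{dirty.strip()}\n\n"


def clean_text_zero_shot(dirty: str, complete: Callable[[str], str]) -> str:
    """Send the instruction-only prompt through `complete` and trim whitespace."""
    return complete(build_zero_shot_prompt(dirty)).strip()
```

Because the prompt carries no example pairs, its length is just the instruction plus the input, which is where the token saving comes from.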