“Lm-Human-Preferences”, Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving, 2019-09-14:

Code for the paper ‘Fine-Tuning Language Models from Human Preferences’. Status: Archive (code is provided as-is, no updates expected). We provide code for:

The repository does not contain code for generating labels. However, we have released the human labels collected for our experiments at gs://lm-human-preferences/labels. For those interested, the question and label schemas are simple and are documented in label_types.py.

The code has only been tested using the smallest GPT-2 model (124M parameters) and Python 3.7.3. Training has been tested on GCE machines with 8 V100s running Ubuntu 16.04; development also works on Mac OS X.
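
Because the code is reported as tested only on Python 3.7.3, it can help to check the interpreter before running anything. The following is a minimal sketch, not part of the repository: the names TESTED_PYTHON and python_matches_tested are hypothetical, and comparing only the major and minor version is an assumption about what counts as "close enough" to the tested setup.

```python
import sys

# Version the README reports as tested. Other versions are untested,
# not known to be broken. (Hypothetical constant, not from the repository.)
TESTED_PYTHON = (3, 7, 3)


def python_matches_tested(version=None, tested=TESTED_PYTHON):
    """Return True if `version` has the same major.minor as `tested`.

    `version` defaults to the running interpreter's sys.version_info.
    Comparing only major.minor is an assumption made for this sketch.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == tuple(tested[:2])


if __name__ == "__main__":
    if not python_matches_tested():
        current = ".".join(str(part) for part in sys.version_info[:3])
        tested = ".".join(str(part) for part in TESTED_PYTHON)
        print(
            "Warning: running Python {}, but the code was only tested on {}.".format(
                current, tested
            ),
            file=sys.stderr,
        )
```

A mismatch only prints a warning here instead of aborting, since the README states what was tested and does not claim that other versions fail.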