“Lm-Human-Preferences”, 2019-09-14:
Code for the paper ‘Fine-Tuning Language Models from Human Preferences’. Status: Archive (code is provided as-is, no updates expected). We provide code for:
Training reward models from human labels
Fine-tuning language models using those reward models
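As a rough illustration of the first step (this is a sketch, not the repository's actual code): in the paper, the reward model scores each of several candidate completions with a scalar, and is trained with a softmax cross-entropy loss on the completion the human labeler preferred. The function name `preference_loss` below is hypothetical.

```python
# Illustrative sketch only, not code from lm-human-preferences.
# The paper's reward model assigns a scalar reward to each candidate
# completion; training minimizes cross-entropy over the candidates,
# with the human-preferred completion as the target class.
import numpy as np

def preference_loss(rewards, chosen):
    """Cross-entropy loss for one human comparison.

    rewards: scalar reward-model outputs, one per candidate completion.
    chosen:  index of the completion the human labeler preferred.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    shifted = rewards - rewards.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[chosen]

# If the preferred sample already has the highest reward, the loss is small;
# gradient descent on this loss pushes the chosen sample's reward up.
print(preference_loss([2.0, 0.1, -1.0, 0.3], chosen=0))
```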
It does not contain code for generating labels. However, we have released the human labels collected for our experiments at gs://lm-human-preferences/labels. For those interested, the question and label schemas are simple and documented in label_types.py.

The code has only been tested with the smallest GPT-2 model (124M parameters), using Python 3.7.3. Training has been tested on GCE machines with 8 V100s running Ubuntu 16.04, but development also works on Mac OS X.