Less than two weeks ago, EleutherAI announced their latest open source language model, GPT-NeoX-20B. Today, we’re excited to announce that Forefront is the first platform where you can fine-tune GPT-NeoX, enabling our customers to train the largest open source language model on any natural language processing or understanding task.
The same fine-tuning experience our customers have come to know with GPT-J will be offered for GPT-NeoX, including free fine-tuning, JSON Lines and text file support, test prompts, Weights & Biases integration, and control over hyperparameters like epochs and checkpoints. We look forward to seeing all the ways our customers will fine-tune GPT-NeoX models to solve complex NLP problems at scale. Let’s take a closer look at fine-tuning.
What is fine-tuning?
Recent research in Natural Language Processing (NLP) has led to the release of multiple large transformer-based language models (LLMs), such as OpenAI’s GPT-2 and GPT-3, EleutherAI’s GPT-Neo and GPT-J, and, most recently, EleutherAI’s GPT-NeoX-20B, a 20 billion parameter language model. One of the most impactful outcomes of this research has been the finding that LLM performance, measured as test loss, improves predictably as a power law in the number of parameters. The downside of scaling parameters is the increased cost of fine-tuning and running inference. Even if the leap to tens of billions of tunable parameters doesn’t impress you on its own, the performance these models reach on a variety of tasks after fine-tuning for just a few epochs on as few as 100 training examples is where the value becomes clear.
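To make the power-law claim concrete, here is a minimal Python sketch of the parameter scaling law reported by Kaplan et al. (2020), "Scaling Laws for Neural Language Models", where test loss falls as L(N) = (N_c / N)^α_N. The constants are that paper’s fit for its own models and data, and total parameter counts are used here as a rough stand-in for the non-embedding counts the paper uses, so treat the numbers as an illustration of the trend rather than predictions for GPT-J, GPT-NeoX, or GPT-3.

```python
# Parameter scaling law, L(N) = (N_C / N) ** ALPHA_N.
# Constants are the fit published by Kaplan et al. (2020); they are used here
# only to illustrate the trend, not to predict these specific models' losses.
N_C = 8.8e13
ALPHA_N = 0.076

def predicted_loss(n_params: float) -> float:
    """Predicted test loss (nats per token) for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Approximate total parameter counts (a stand-in for non-embedding parameters).
models = [("GPT-J", 6e9), ("GPT-NeoX-20B", 20e9), ("GPT-3 Davinci", 175e9)]
for name, n in models:
    print(f"{name:>14}: {predicted_loss(n):.3f}")
```

Because the exponent is small, each additional reduction in loss requires a multiplicative jump in parameters, which is why the cost side of the trade-off grows so quickly.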
Fine-tuning refers to the practice of further training a pretrained language model on a task-specific dataset to improve its performance on that task. On many tasks, this can let a model outperform an un-fine-tuned model 10x its size. As a result, fine-tuned models make up the majority of models deployed in production on the Forefront platform, and they are where businesses get the most value.
Until now, one had to choose between GPT-J’s 6 billion parameters and GPT-3 Davinci’s 175 billion parameters. The former is small enough to fine-tune and run inference on cost-efficiently, but not big enough to perform well on complex tasks. The latter is big enough to perform well on complex tasks, but incredibly expensive to fine-tune and run inference on. With GPT-NeoX-20B, solving many more complex NLP tasks at scale starts to look doable. Let’s look at how GPT-NeoX fine-tuned on various tasks compares to vanilla GPT-NeoX and GPT-3 Davinci.
Text summarization
Summarize text into a few sentences.
Emotion classification
Classify text as an emotion.
Question answering
Answer natural language questions about provided text.
Chat summarization
Summarize dialogue and transcripts.
Content generation
Write a paragraph based on a topic and bullet point.
Question answering with context
Answer natural language questions based on the provided information and scenario.
Chatbot with personality
Imitate Elon Musk in a conversation.
Blog idea generation
Generate blog ideas based on a company name and product description.
Blog outline
Provide a blog outline based on a topic.
How to fine-tune GPT-NeoX on Forefront
The first (and most important) step in fine-tuning a model is preparing a dataset. A fine-tuning dataset on Forefront can be in one of two formats: JSON Lines or a plain text file (UTF-8 encoding). In this example, we’ll format our dataset as JSON Lines, where each example is a prompt-completion pair. Here are some example dataset formats for the emotion classification, text summarization, question answering, and chat summarization use cases above.
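As a minimal sketch of what a JSON Lines file of prompt-completion pairs looks like, the Python below writes two emotion classification examples, one JSON object per line, UTF-8 encoded. The key names "prompt" and "completion", the "Text: ... Emotion:" prompt template, the example rows, and the file name are all assumptions for illustration; check Forefront’s documentation for the exact keys it expects.

```python
import json

# Hypothetical emotion classification examples. The "prompt"/"completion" keys
# and the prompt template are assumptions, not a confirmed Forefront schema.
examples = [
    {"prompt": "Text: I can't wait for the concert tonight!\nEmotion:", "completion": " joy"},
    {"prompt": "Text: My flight got cancelled again.\nEmotion:", "completion": " anger"},
]

def write_jsonl(path: str, rows: list[dict]) -> None:
    """Write one JSON object per line, UTF-8 encoded (the JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

write_jsonl("emotion_train.jsonl", examples)
```

Ending every prompt with the same separator (here "Emotion:") and starting every completion with a space gives the model a consistent signal for where the answer begins.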
After uploading your dataset, you can set the number of epochs your model will train for. Epochs refer to the number of complete passes through a training dataset, or put another way, how many times a model will “see” each training example in your dataset. A range of 2-4 epochs is typically recommended depending on the size of your dataset.
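To illustrate what an epoch means in practice, this Python sketch shuffles the dataset once per epoch and splits it into batches, then checks that every example is seen exactly once per epoch. The batch size of 8, the 500-example dataset, and the shuffling scheme are hypothetical; the platform may not expose batch size at all.

```python
import random
from collections import Counter

def epoch_batches(num_examples: int, batch_size: int, epochs: int, seed: int = 0):
    """Yield batches of example indices; every epoch visits each example exactly once."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    for _ in range(epochs):
        rng.shuffle(indices)  # new order each epoch, same set of examples
        for start in range(0, num_examples, batch_size):
            yield indices[start:start + batch_size]

# Hypothetical run: 500 examples, batch size 8, 3 epochs.
batches = list(epoch_batches(500, 8, 3))
seen = Counter(i for batch in batches for i in batch)

print(len(batches))        # 189 steps: ceil(500 / 8) = 63 per epoch, times 3
print(set(seen.values()))  # {3}: every example was seen once per epoch
```

Doubling the epochs doubles both the number of training steps and how many times the model sees each example, which is why smaller datasets tend to tolerate (and need) more epochs than larger ones.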
Next, you’ll set a number of checkpoints. Checkpoints are the model versions saved throughout training. Training a model for the optimal amount of time is incredibly important, and checkpoints let you find that optimal point by comparing the performance of models saved at different stages of training. You compare performance by setting test prompts.
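One simple way to place checkpoints is to spread them evenly over the run, with the last one at the final step. The Python sketch below assumes that even spacing; the helper name is hypothetical, and Forefront’s actual scheduling may differ.

```python
def checkpoint_steps(total_steps: int, num_checkpoints: int) -> list[int]:
    """Training steps at which to save a checkpoint, evenly spaced and ending at the last step."""
    if num_checkpoints < 1:
        raise ValueError("need at least one checkpoint")
    return [round(total_steps * k / num_checkpoints) for k in range(1, num_checkpoints + 1)]

# Hypothetical run of 189 steps (500 examples, batch size 8, 3 epochs) with 3 checkpoints.
print(checkpoint_steps(189, 3))  # [63, 126, 189]
```

With this layout, saving as many checkpoints as epochs puts one at the end of each epoch, which makes it easy to see whether the extra passes are helping or starting to overfit.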
Test prompts are a simple way to validate the performance of your model checkpoints. You add prompts, along with generation parameters, and each model checkpoint produces completions for them. After training, you can review the completions from each checkpoint to find the best-performing model.
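For classification-style tasks, reviewing test prompt completions can be partly automated. The Python sketch below scores each checkpoint by exact-match accuracy against an expected label and picks the highest scorer. Exact match is a scoring choice we are swapping in for manual review, and the checkpoint names, prompts, and completions are hypothetical; open-ended tasks like summarization or content generation still call for reading the outputs.

```python
# Hypothetical test prompts with expected labels, and the completions each checkpoint returned.
test_prompts = [
    ("Text: I can't wait for the concert tonight!\nEmotion:", "joy"),
    ("Text: My flight got cancelled again.\nEmotion:", "anger"),
    ("Text: I miss my old friends.\nEmotion:", "sadness"),
]
completions = {
    "checkpoint-1": [" joy", " joy", " joy"],
    "checkpoint-2": [" joy", " anger", " joy"],
    "checkpoint-3": [" joy", " anger", " sadness"],
}

def exact_match_accuracy(outputs: list[str], expected: list[tuple[str, str]]) -> float:
    """Fraction of completions whose stripped, lowercased text equals the expected label."""
    hits = sum(out.strip().lower() == label.lower() for out, (_, label) in zip(outputs, expected))
    return hits / len(expected)

scores = {name: exact_match_accuracy(outs, test_prompts) for name, outs in completions.items()}
best = max(scores, key=scores.get)
print("best:", best)  # best: checkpoint-3
```

If two checkpoints tie, preferring the earlier one is a reasonable default, since it reached the same quality with less training.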
Alternative ways to fine-tune GPT-NeoX
Alternatively, you could fine-tune GPT-NeoX on your own infrastructure. To do this, you'll need at least 8 NVIDIA A100s, A40s, or A6000s, and you'll use the GPT-NeoX GitHub repo to preprocess your dataset and run the training script. The script will need to be configured with the combination of data, tensor, and pipeline parallelism that EleutherAI's repo supports.
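A back-of-envelope memory estimate shows why a single GPU is not enough. The Python below assumes full fine-tuning with mixed-precision Adam at 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance, the accounting used in the ZeRO paper), rounds GPT-NeoX-20B to 20 billion parameters, and ignores activations and buffers, so real usage is higher. The 2 x 4 x 1 parallel layout is a hypothetical example, not a recommended configuration.

```python
# Assumption: mixed-precision Adam keeps 16 bytes of state per parameter:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights, momentum, variance (4 + 4 + 4).
# Activations, buffers and memory fragmentation are ignored.
PARAMS = 20e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
NUM_GPUS = 8

def model_state_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Gigabytes needed for weights, gradients and optimizer state."""
    return params * bytes_per_param / 1e9

total_gb = model_state_gb(PARAMS)  # 320 GB in total
per_gpu_gb = total_gb / NUM_GPUS   # 40 GB per GPU if split evenly across 8 GPUs

# Hypothetical layout: the tensor, pipeline and data parallel degrees
# must multiply to the total number of GPUs.
tensor_parallel, pipeline_parallel, data_parallel = 2, 4, 1
assert tensor_parallel * pipeline_parallel * data_parallel == NUM_GPUS

print(f"{total_gb:.0f} GB total, {per_gpu_gb:.0f} GB per GPU before activations")
```

Since the A40 and A6000 each have 48 GB of memory, roughly 40 GB of model state per GPU leaves limited headroom for activations, which is consistent with 8 GPUs being a practical minimum.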
Helpful tips
These tips are meant as loose guidelines, and experimentation is encouraged.
At Forefront, we believe building a simple, free fine-tuning experience will lower the cost of experimenting with large language models, enabling businesses to solve a variety of complex NLP problems. If you have any ideas on how we can further improve the fine-tuning experience, please get in touch with our team.