Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
GPT-3: Language Models are Few-Shot Learners
Program Synthesis with Large Language Models
MMLU: Measuring Massive Multitask Language Understanding
Who Models the Models That Model Models? An Exploration of GPT-3’s In-Context Model Fitting Ability
The MovieLens Datasets: History and Context
Random_ai_poems.txt
https://tedunderwood.com/2021/02/02/why-sf-hasnt-prepared-us-to-imagine-machine-learning/