Bibliography (14):

https://deepmind.google/discover/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval/
https://x.com/drjwrae/status/1488557776754417664
Attention Is All You Need
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TruthfulQA: Measuring How Models Mimic Human Falsehoods
GPT-J-6B: 6B JAX-Based Transformer
Language Models are Unsupervised Multitask Learners
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
GPT-3: Language Models are Few-Shot Learners
https://arxiv.org/pdf/2112.11446#page=81&org=deepmind
A Multi-Level Attention Model for Evidence-Based Fact Checking
FEVER: a large-scale dataset for Fact Extraction and VERification
BPEs: Neural Machine Translation of Rare Words with Subword Units
GPT-1: Improving Language Understanding by Generative Pre-Training § Model specifications