Chinchilla: Training Compute-Optimal Large Language Models
https://arxiv.org/pdf/2209.14958#page=30&org=deepmind
https://arxiv.org/pdf/2209.14958#page=4&org=deepmind
https://arxiv.org/pdf/2209.14958#page=3&org=deepmind
https://arxiv.org/pdf/2209.14958#page=46&org=deepmind