Bibliography (17):

  1. https://github.com/openai/LHOPT

  2. Chinchilla: Training Compute-Optimal Large Language Models

  3. Proximal Policy Optimization Algorithms

  4. ImageNet Large Scale Visual Recognition Challenge

  5. GPT-3: Language Models are Few-Shot Learners

  6. Decoupled Weight Decay Regularization

  7. SGDR: Stochastic Gradient Descent with Warm Restarts

  8. Cyclical Learning Rates for Training Neural Networks

  9. Human-level performance in 3D multiplayer games with population-based reinforcement learning

  10. Almeida et al. (2021), Figure 3: LHOPT learned hyperparameter optimization on GPT-2 Large / WikiText-103 (2x speedup) [image]

  11. Language Models are Unsupervised Multitask Learners

  12. Pointer Sentinel Mixture Models

  13. Scaling Laws for Neural Language Models