Bibliography (10):

  1. GPT-3: Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165

  2. OpenAI's plans according to Sam Altman (Humanloop blog, archived). https://web.archive.org/web/20230531203946/https://humanloop.com/blog/openai-plans

  3. GPT-4 research (OpenAI). https://openai.com/index/gpt-4-research/

  4. A question on determinism (OpenAI Developer Forum). https://community.openai.com/t/a-question-on-determinism/8185/2

  5. From Sparse to Soft Mixtures of Experts. https://arxiv.org/abs/2308.00951

  6. From Sparse to Soft Mixtures of Experts, p. 4. https://arxiv.org/pdf/2308.00951.pdf#page=4&org=deepmind

  7. Mixture-of-Experts with Expert Choice Routing. https://arxiv.org/abs/2202.09368

  8. From Sparse to Soft Mixtures of Experts, p. 10. https://arxiv.org/pdf/2308.00951.pdf#page=10&org=deepmind

  9. GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE (SemiAnalysis). https://www.semianalysis.com/p/gpt-4-architecture-infrastructure

  10. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. https://arxiv.org/abs/2101.03961