Bibliography (22):

  1. GPT-J-6B: 6B JAX-Based Transformer

  2. Attention Is All You Need

  3. https://github.com/kingoflolz/mesh-transformer-jax

  4. GPT-3: Language Models are Few-Shot Learners

  5. https://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb

  6. https://6b.eleuther.ai/

  7. https://jax.readthedocs.io/en/latest/en/latest/notebooks/xmap_tutorial.html

  8. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  9. A domain-specific supercomputer for training deep neural networks

  10. https://github.com/kingoflolz

  11. https://x.com/arankomatsuzaki

  12. https://blog.novelai.net/data-efficient-language-transfer-with-gpt-j-45daedaaf35a

  13. https://huggingface.co/VietAI/gpt-j-6B-vietnamese-news

  14. https://github.com/kakaobrain/kogpt

  15. https://github.com/coteries/cedille-ai

  16. https://latitude.io/blog/latitude-roadmap

  17. Evaluating Large Language Models Trained on Code

  18. TruthfulQA: Measuring How Models Mimic Human Falsehoods

  19. Cut the CARP: Fishing for zero-shot story evaluation

  20. https://pone.dev/

  21. https://universalprior.substack.com/p/making-of-ian