GPT-J-6B: 6B JAX-Based Transformer
Attention Is All You Need
https://github.com/kingoflolz/mesh-transformer-jax
GPT-3: Language Models are Few-Shot Learners
https://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb
https://6b.eleuther.ai/
https://jax.readthedocs.io/en/latest/notebooks/xmap_tutorial.html
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
A domain-specific supercomputer for training deep neural networks
https://github.com/kingoflolz
https://x.com/arankomatsuzaki
https://blog.novelai.net/data-efficient-language-transfer-with-gpt-j-45daedaaf35a
https://huggingface.co/VietAI/gpt-j-6B-vietnamese-news
https://github.com/kakaobrain/kogpt
https://github.com/coteries/cedille-ai
https://latitude.io/blog/latitude-roadmap
Evaluating Large Language Models Trained on Code
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Cut the CARP: Fishing for zero-shot story evaluation
https://pone.dev/
https://universalprior.substack.com/p/making-of-ian