Bibliography (21):

  1. https://www.youtube.com/watch?v=oqi0QrbdgdI

  2. https://x.com/quocleix/status/1583523186376785921

  3. https://x.com/hwchung27/status/1583529350015565827

  4. PaLM: Scaling Language Modeling with Pathways

  5. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  6. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  7. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

  8. MMLU: Measuring Massive Multitask Language Understanding

  9. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  10. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

  11. Language Models are Multilingual Chain-of-Thought Reasoners

  12. https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints

  13. https://huggingface.co/google/flan-t5-large

  14. Self-Consistency Improves Chain-of-Thought Reasoning in Language Models

  15. https://prod.hypermind.com/ngdp/en/showcase2/showcase.html?sc=JSAI

  16. https://www.metaculus.com/questions/11676/mmlu-sota-in-2023-2025/

  17. FLAN: Finetuned Language Models Are Zero-Shot Learners

  18. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  19. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  20. https://arxiv.org/pdf/2210.11416.pdf#page=47&org=google

  21. ByT5: Towards a token-free future with pre-trained byte-to-byte models