Bibliography (21):

  1. https://www.youtube.com/watch?v=oqi0QrbdgdI

  2. https://x.com/quocleix/status/1583523186376785921

  3. https://x.com/hwchung27/status/1583529350015565827

  4. PaLM: Scaling Language Modeling with Pathways

  5. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  6. U-PaLM: Transcending Scaling Laws with 0.1% Extra Compute

  7. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

  8. MMLU: Measuring Massive Multitask Language Understanding

  9. Challenging BIG-Bench Tasks (BBH) and Whether Chain-of-Thought Can Solve Them

  10. TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

  11. Language Models are Multilingual Chain-of-Thought Reasoners

  12. https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints

  13. https://huggingface.co/google/flan-t5-large

  14. Self-Consistency Improves Chain-of-Thought Reasoning in Language Models

  15. https://prod.hypermind.com/ngdp/en/showcase2/showcase.html?sc=JSAI

  16. https://www.metaculus.com/questions/11676/mmlu-sota-in-2023-2025/

  17. FLAN: Finetuned Language Models Are Zero-Shot Learners

  18. T0: Multitask Prompted Training Enables Zero-Shot Task Generalization

  19. Tk-Instruct: Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks

  20. https://arxiv.org/pdf/2210.11416.pdf#page=47&org=google

  21. ByT5: Towards a token-free future with pre-trained byte-to-byte models