https://github.com/google/BIG-bench
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Self-Consistency Improves Chain-of-Thought Reasoning in Language Models
PaLM: Scaling Language Modeling with Pathways
Evaluating Large Language Models Trained on Code