In-Context Learning and Induction Heads
Universal Transformers
Attention Is All You Need
D.5: Context Dependence
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Scaling Laws for Acoustic Models
Scaling Laws for Autoregressive Generative Modeling