-
In-Context Learning and Induction Heads
-
Universal Transformers
-
Attention Is All You Need
-
D.5: Context Dependence
-
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
-
index#ssm
-
Scaling Laws for Acoustic Models
-
Scaling Laws for Autoregressive Generative Modeling
-