Bibliography (15):

  1. In-Context Learning and Induction Heads

  2. Universal Transformers

  3. Attention Is All You Need

  4. D.5: Context Dependence

  5. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context

  6. index#ssm

  7. Scaling Laws for Acoustic Models

  8. Scaling Laws for Autoregressive Generative Modeling