Bibliography (15):

  1. What learning algorithm is in-context learning? Investigations with linear models

  2. Transformers learn in-context by gradient descent

  3. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

  4. CausalLM is not optimal for in-context learning

  5. An Explanation of In-context Learning as Implicit Bayesian Inference

  6. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

  7. Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

  8. Schema-learning and rebinding as mechanisms of in-context learning and emergence