‘attention ≈ SGD’ directory
- See Also
- Links
- “Transformers Represent Belief State Geometry in Their Residual Stream”, Shai 2024
- “How Well Can Transformers Emulate In-Context Newton’s Method?”, Giannou et al 2024
- “Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study With Linear Models”, Fu et al 2023
- “CausalLM Is Not Optimal for In-Context Learning”, Ding et al 2023
- “One Step of Gradient Descent Is Provably the Optimal In-Context Learner With One Layer of Linear Self-Attention”, Mahankali et al 2023
- “Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
- “What Learning Algorithm Is In-Context Learning? Investigations With Linear Models”, Akyürek et al 2022
- “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
- “An Explanation of In-Context Learning As Implicit Bayesian Inference”, Xie et al 2021
- “Reverse Citations of ‘Transformers Learn In-Context by Gradient Descent’ (Google Scholar)”
- Miscellaneous
- Bibliography
See Also
Links
“Transformers Represent Belief State Geometry in Their Residual Stream”, Shai 2024
“How Well Can Transformers Emulate In-Context Newton’s Method?”, Giannou et al 2024
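Giannou et al ask how well transformers can emulate Newton’s method in-context; for least-squares problems the relevant higher-order iteration is Newton’s method for matrix inversion (the Newton–Schulz iteration). As a reference point, here is a minimal NumPy sketch of that iteration under a toy setup of my own choosing (an illustration, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
B = rng.normal(size=(d, d))
A = B @ B.T + d * np.eye(d)        # a well-conditioned positive-definite matrix

# Newton-Schulz iteration for matrix inversion: M_{k+1} = M_k (2I - A M_k),
# which converges quadratically to A^{-1} whenever the spectral radius of
# (I - M_0 A) is below 1. The standard initialization
# M_0 = A^T / (||A||_1 ||A||_inf) guarantees this.
M = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
for _ in range(15):
    M = M @ (2 * np.eye(d) - A @ M)

assert np.allclose(M, np.linalg.inv(A))
```

Each step needs only matrix multiplications, which is why a fixed stack of attention layers can in principle unroll several such iterations.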
“Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study With Linear Models”, Fu et al 2023
“CausalLM Is Not Optimal for In-Context Learning”, Ding et al 2023
“One Step of Gradient Descent Is Provably the Optimal In-Context Learner With One Layer of Linear Self-Attention”, Mahankali et al 2023
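Mahankali et al’s title states an exact correspondence: on linear-regression data, the optimal one-layer linear self-attention model computes one step of gradient descent. A minimal NumPy sketch of the underlying identity, under a toy setup of my own (zero initialization, scalar labels, unnormalized linear attention):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4                       # number of in-context examples, input dimension
X = rng.normal(size=(n, d))        # in-context inputs x_i
y = X @ rng.normal(size=d)         # labels y_i = <w*, x_i> for a hidden w*
x_q = rng.normal(size=d)           # query input
eta = 0.5                          # learning rate

# One step of gradient descent on L(w) = (1/2n) sum_i (<w, x_i> - y_i)^2,
# starting from w_0 = 0: w_1 = -eta * grad L(0) = (eta/n) sum_i y_i x_i.
w_1 = (eta / n) * X.T @ y
pred_gd = w_1 @ x_q

# Unnormalized linear self-attention over the context, with keys given by
# the x_i, values by the y_i, and query x_q: (eta/n) sum_i y_i <x_i, x_q>.
pred_attn = (eta / n) * y @ (X @ x_q)

assert np.isclose(pred_gd, pred_attn)  # identical predictions
```

The gradient step from w₀ = 0 and the attention readout are the same sum Σᵢ yᵢ⟨xᵢ, x_q⟩ up to the factor η/n, which is the sense in which one linear-attention layer “is” one step of gradient descent.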
“Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers”, Dai et al 2022
“What Learning Algorithm Is In-Context Learning? Investigations With Linear Models”, Akyürek et al 2022
“What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022
“An Explanation of In-Context Learning As Implicit Bayesian Inference”, Xie et al 2021
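Xie et al model pretraining data as generated from latent concepts θ; in that framing (paraphrased informally here), next-token prediction on a prompt of examples implicitly computes a posterior predictive:

$$p(y \mid x_{\text{query}}, \text{prompt}) = \int p(y \mid x_{\text{query}}, \theta)\, p(\theta \mid \text{prompt})\, d\theta$$

In-context learning then succeeds to the extent that the prompt examples concentrate the posterior $p(\theta \mid \text{prompt})$ on the concept that generated them, even though prompts are out-of-distribution relative to pretraining documents.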
“Reverse Citations of ‘Transformers Learn In-Context by Gradient Descent’ (Google Scholar)”
Miscellaneous
Bibliography
https://arxiv.org/abs/2211.15661#google : “What Learning Algorithm Is In-Context Learning? Investigations With Linear Models”, Akyürek et al 2022
https://arxiv.org/abs/2208.01066 : “What Can Transformers Learn In-Context? A Case Study of Simple Function Classes”, Garg et al 2022