https://github.com/google/automl/tree/master/lion
https://github.com/lucidrains/lion-pytorch
https://fastxtend.benjaminwarner.dev/optimizer.lion.html
https://x.com/dvruette/status/1625997942703198209
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
ImageNet Large Scale Visual Recognition Challenge
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Contrastive Representation Learning: A Framework and Review
1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed
Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
https://x.com/theshawwn/status/1625681629074137088
signSGD: Compressed Optimization for Non-Convex Problems