- https://github.com/google/automl/tree/master/lion
- https://github.com/lucidrains/lion-pytorch
- https://fastxtend.benjaminwarner.dev/optimizer.lion.html
- https://x.com/dvruette/status/1625997942703198209
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
- Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- ImageNet Large Scale Visual Recognition Challenge
- Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
- Contrastive Representation Learning: A Framework and Review
- 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed
- Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization
- https://x.com/theshawwn/status/1625681629074137088
- signSGD: Compressed Optimization for Non-Convex Problems
-