Bibliography (15):

  1. https://github.com/google/automl/tree/master/lion

  2. https://github.com/lucidrains/lion-pytorch

  3. https://fastxtend.benjaminwarner.dev/optimizer.lion.html

  4. https://x.com/dvruette/status/1625997942703198209

  5. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

  6. An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  7. ImageNet Large Scale Visual Recognition Challenge

  8. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era

  9. Contrastive Representation Learning: A Framework and Review

  10. 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed

  11. Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

  12. https://x.com/theshawwn/status/1625681629074137088

  13. signSGD: Compressed Optimization for Non-Convex Problems