Bibliography (6):

  1. https://huggingface.co/google/gemma-2-2b-it

  2. https://blog.google/technology/developers/google-gemma-2/

  3. Attention Is All You Need

  4. Longformer: The Long-Document Transformer

  5. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

  6. Distilling the Knowledge in a Neural Network