https://huggingface.co/google/gemma-2-2b-it
https://blog.google/technology/developers/google-gemma-2/
Attention Is All You Need
Longformer: The Long-Document Transformer
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Distilling the Knowledge in a Neural Network