- https://huggingface.co/google/gemma-2-2b-it
- https://blog.google/technology/developers/google-gemma-2/
- Attention Is All You Need
- Longformer: The Long-Document Transformer
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- Distilling the Knowledge in a Neural Network