"Mistral-7B", 2023-10-10:
We introduce Mistral-7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.
Mistral-7B outperforms LLaMA-2 13B across all evaluated benchmarks, and LLaMA-1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length at reduced inference cost.
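To illustrate the mechanism behind SWA, the minimal sketch below constructs the attention mask it implies: each token attends only to itself and the previous W-1 positions, so per-token attention cost is bounded by the window size rather than the full sequence length. This is an illustrative sketch only; the `sliding_window_causal_mask` helper is our own naming, not code from the paper or any Mistral release.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: True where query position i may attend to key position j."""
    i = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1) query positions
    j = torch.arange(seq_len).unsqueeze(0)  # (1, seq_len) key positions
    # Causal (j <= i) and within the last `window` tokens (j > i - window),
    # so each row of the mask has at most `window` True entries.
    return (j <= i) & (j > i - window)

# Example: 8 tokens with a window of 4. Each token sees itself plus
# at most the 3 preceding tokens, regardless of total sequence length.
print(sliding_window_causal_mask(8, 4).int())
```

Because each query attends to at most `window` keys, the per-layer attention cost grows linearly in sequence length rather than quadratically, which is the source of the reduced inference cost claimed above.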
We also provide a model fine-tuned to follow instructions, Mistral-7B-Instruct, that surpasses the LLaMA-2 13B-Chat model on both human and automated benchmarks.
Our models are released under the Apache 2.0 license.