“Mistral-7B”, Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, LĂ©lio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, TimothĂ©e Lacroix, William El Sayed, 2023-10-10:

We introduce Mistral-7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency.

Mistral-7B outperforms LLaMA-2 13B across all evaluated benchmarks, and LLaMA-1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length at reduced inference cost.
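The sliding window mechanism can be illustrated with a small sketch: each token attends only to itself and the previous `window - 1` positions, rather than the full causal prefix. The function name and the tiny window size below are illustrative; Mistral-7B uses a window of 4096 in practice.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # Boolean attention mask: position i may attend to position j
    # iff j <= i (causal) and j > i - window (within the sliding window).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=6, window=3)
# Row i has min(i + 1, window) allowed positions, so memory and compute
# per token are bounded by the window size instead of the sequence length.
```

Stacking such layers lets information propagate beyond the window: after k layers, a token can indirectly draw on roughly k × window positions of context.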

We also provide a model fine-tuned to follow instructions, Mistral-7B-Instruct, that surpasses the LLaMA-2 13B-Chat model on both human and automated benchmarks.

Our models are released under the Apache 2.0 license.