Mistral Small vs Mistral Large 2.5: pricing, speed, and use cases (2026)
Mistral Small and Mistral Large 2.5 cover the two ends of the Mistral platform in 2026. Below: pricing, function-calling reliability, latency, and where each one wins inside the same vendor lineup.
Mistral Small vs Mistral Large 2.5 — at a glance
| Dimension | Mistral Small | Mistral Large 2.5 |
|---|---|---|
| Provider | Mistral AI | Mistral AI |
| Context window | 128K | 128K |
| Input price (per 1M tok) | $0.20 | $2 |
| Output price (per 1M tok) | $0.60 | $6 |
| Latency (typical) | ~300ms TTFT | ~600ms TTFT |
| Free tier | Yes (low quota) | No |
| Best for | High-volume RAG, classification, summarization | Agent tool calling, careful generation, near-flagship quality |
Pick Mistral Small or Mistral Large 2.5?
When to choose Mistral Small
Choose Mistral Small for high-volume short tasks: classification, extractive RAG, summarization, simple Q&A. At $0.20 input / $0.60 output per 1M tokens it is roughly 10x cheaper than Mistral Large 2.5 ($2 / $6) and is served through the same OpenAI-compatible API on la Plateforme. Time to first token is also lower (~300ms versus ~600ms).
When to choose Mistral Large 2.5
Choose Mistral Large 2.5 when reliability on multi-step tool chains matters or when longer generations need to approach flagship-tier quality. It is materially more reliable on JSON-schema output and function calling, at roughly 10x the per-token price of Small.
Run Mistral Small and Mistral Large 2.5 side-by-side
VerticalAPI lets you switch between Mistral Small and Mistral Large 2.5 per request through a single OpenAI-compatible endpoint. Same SDK, same API key, zero markup on tokens: under BYOK you pay Mistral directly.
```python
from openai import OpenAI

client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

# Mistral Small
resp_a = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},
)

# Mistral Large 2.5: same SDK and same Mistral key, different model
resp_b = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},
)
```
VerticalAPI verdict
Use Mistral Small for cost-sensitive high-volume traffic; use Mistral Large 2.5 for agent steps that need reliable tool calling or careful long-form generation. Through VerticalAPI you can switch between them with one model parameter and the same OpenAI-compatible endpoint.
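The verdict above can be turned into a small routing helper. This is a minimal sketch, not a VerticalAPI feature: the task labels, the `pick_model` function, and the two-step threshold for tool chains are all assumptions made for illustration, and the right cutoff depends on your own evals.

```python
SMALL = "mistral-small-latest"
LARGE = "mistral-large-latest"

# Hypothetical task labels; adapt to however your application tags requests.
SMALL_TASKS = {"classification", "extraction", "summarization", "simple_qa"}


def pick_model(task: str, tool_steps: int = 0) -> str:
    """Return the model id to use for a request.

    Tool chains longer than two steps go to Large (assumed threshold).
    Known short tasks go to Small. Anything unrecognized defaults to
    Large as the conservative choice.
    """
    if tool_steps > 2:
        return LARGE
    if task in SMALL_TASKS:
        return SMALL
    return LARGE
```

The returned string can be passed directly as the `model` argument of `client.chat.completions.create`.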
Frequently asked questions
How much cheaper is Mistral Small than Mistral Large 2.5?
About 10x cheaper at list price. Mistral Small is roughly $0.20 per 1M input tokens and $0.60 per 1M output tokens; Mistral Large 2.5 is roughly $2 and $6. The ratio is the same on input and output, so the 10x gap holds regardless of your prompt-to-completion mix.
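To see what the gap means for a real budget, the arithmetic can be sketched as follows. The prices are the list prices quoted in this comparison; the workload (one million requests with 1,500 input and 300 output tokens each) is a made-up example, and `monthly_cost` is an illustrative helper name.

```python
# List prices quoted in this article, in USD per 1M tokens.
# Verify against the current vendor pricing page before relying on them.
PRICES = {
    "mistral-small-latest": {"input": 0.20, "output": 0.60},
    "mistral-large-latest": {"input": 2.00, "output": 6.00},
}


def monthly_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for `requests` calls with the given per-call token counts."""
    p = PRICES[model]
    per_call = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return requests * per_call


# Hypothetical workload: 1M requests, 1,500 input + 300 output tokens each.
small = monthly_cost("mistral-small-latest", 1_000_000, 1_500, 300)
large = monthly_cost("mistral-large-latest", 1_000_000, 1_500, 300)
print(f"Small: ${small:,.2f}  Large: ${large:,.2f}  ratio: {large / small:.1f}x")
```

For this workload the estimate comes to about $480 on Small and about $4,800 on Large.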
Which model should I use for RAG?
For most RAG workloads Mistral Small is the right starting point: it handles factual extraction well and the price is low enough to scale. Move to Mistral Large 2.5 only when the answer-quality gap on your specific evals justifies the 10x cost.
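One way to make "justifies the 10x cost" concrete is to compare accuracy on your own eval set alongside cost per correct answer. The sketch below assumes you already have an accuracy figure for each model; the function names and the 5-point minimum gain are illustrative choices, not a recommendation from Mistral or VerticalAPI.

```python
def cost_per_correct(accuracy: float, cost_per_query: float) -> float:
    """USD spent per correct answer, given eval accuracy in (0, 1]."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_query / accuracy


def should_upgrade(small_acc: float, large_acc: float, min_gain: float = 0.05) -> bool:
    """Upgrade to Large only if it beats Small by at least `min_gain` accuracy.

    `min_gain` is an assumed threshold; set it from what a wrong answer
    costs in your product.
    """
    return (large_acc - small_acc) >= min_gain
```

With hypothetical eval results of 0.88 for Small and 0.91 for Large, `should_upgrade(0.88, 0.91)` returns `False`, because a 3-point gain is below the assumed 5-point bar.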
Do both support function calling?
Yes, both expose tool-use APIs. Mistral Large 2.5 is materially more reliable at multi-step tool calls and JSON-schema output. Mistral Small can call tools but error rates rise on chains beyond two or three steps.
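A common pattern that follows from this is to try Small first and escalate to Large only when the tool arguments fail validation. The sketch below is an assumption-laden illustration: `call_model` is a hypothetical callable you supply that sends the request to the given model and returns the raw tool-argument string, and the validation only checks for a JSON object with the required keys rather than a full JSON Schema.

```python
from __future__ import annotations

import json
from typing import Callable, Optional

SMALL = "mistral-small-latest"
LARGE = "mistral-large-latest"


def parse_tool_args(raw: str, required: set) -> Optional[dict]:
    """Return parsed arguments if `raw` is a JSON object with every required key, else None."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict) or not required.issubset(args):
        return None
    return args


def call_with_escalation(call_model: Callable[[str], str], required: set) -> tuple:
    """Try Small first; retry once on Large if the tool arguments fail validation.

    Returns (model_id, parsed_args). Raises ValueError if both attempts fail.
    """
    for model in (SMALL, LARGE):
        args = parse_tool_args(call_model(model), required)
        if args is not None:
            return model, args
    raise ValueError("both models returned invalid tool arguments")
```

With the OpenAI SDK, the raw string would typically come from `resp.choices[0].message.tool_calls[0].function.arguments`. Escalating only on failure keeps most traffic on the cheaper model while bounding the error rate on tool calls.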
What is the latency difference?
Mistral Small typically shows ~300ms time-to-first-token. Mistral Large 2.5 lands near 600ms TTFT. Throughput per request is similar. For interactive chat, Small feels noticeably snappier.
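These figures are worth checking from your own region. A minimal sketch for timing the first streamed chunk is shown below; `measure_ttft` is an illustrative helper, and `open_stream` is any zero-argument callable that starts a streaming request and returns an iterable of chunks. Note that it measures time to the first chunk, which on OpenAI-compatible streams can be a role-only chunk that arrives slightly before the first content token.

```python
import time
from typing import Callable, Iterable


def measure_ttft(open_stream: Callable[[], Iterable]) -> float:
    """Seconds from sending the request until the first streamed chunk arrives.

    The timer starts before `open_stream()` is called so that connection
    and queueing time are included.
    """
    start = time.perf_counter()
    for _ in open_stream():
        return time.perf_counter() - start
    raise RuntimeError("stream ended before any chunk arrived")
```

With the OpenAI SDK this could be called as `measure_ttft(lambda: client.chat.completions.create(model="mistral-small-latest", messages=messages, stream=True))`, repeated several times per model and compared by median rather than by a single run.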
How do I route between them via VerticalAPI?
VerticalAPI exposes a single OpenAI-compatible endpoint at https://api.verticalapi.com/v1. Change the model parameter to mistral-small-latest or mistral-large-latest and supply your Mistral key as X-Provider-Key. No token markup — you pay Mistral directly under BYOK.
Limitations of this comparison
- Mistral price tiers are revised regularly; verify rates against the current vendor page.
- Mistral Small is fine for classification and extraction but underperforms on long-horizon agent tasks.
- Benchmark quality between Mistral Small and Large is workload-dependent; run your own evals before committing.
- Latency figures are averaged across regions; EU-hosted endpoints typically perform better for European traffic.
- Mistral does not currently offer prompt caching equivalent to Anthropic's, so repeated long prompts are billed in full each time.
What may change in 12-24 months
- Mistral is expected to ship a mid-tier model between Small and Large 2.5 within 12 months.
- Per-token prices on both tiers are likely to fall as competition intensifies.
- Prompt caching for repeat-context workloads may arrive on the Mistral platform.
- EU-hosted inference will remain a differentiator for European compliance use cases.
Related questions
- How does Mistral Small compare to GPT-4o mini for high-volume RAG?
- When is Mistral Large 2.5 cheaper than Claude Sonnet 4.5 for the same quality?
- Is Mistral Large 2.5 strong enough to replace GPT-4o in agent loops?
- What is the cheapest way to A/B test Mistral Small and Large on the same traffic?
- How does Mistral's EU hosting affect latency for European users?