Mistral Small vs Mistral Large 2.5: pricing, speed, and use cases (2026)

Mistral Small and Mistral Large 2.5 cover the two ends of the Mistral platform in 2026. Below: pricing, function-calling reliability, latency, and where each one wins inside the same vendor lineup.

Mistral Small vs Mistral Large 2.5 — at a glance

| Dimension | Mistral Small | Mistral Large 2.5 |
| --- | --- | --- |
| Provider | Mistral AI | Mistral AI |
| Context window | 128K | 128K |
| Input price (per 1M tokens) | $0.20 | $2 |
| Output price (per 1M tokens) | $0.60 | $6 |
| Latency (typical) | ~300ms TTFT | ~600ms TTFT |
| Free tier | Yes (low quota) | No |
| Best for | High-volume RAG, classification, summarization | Agent tool calling, careful generation, near-flagship quality |

Pick Mistral Small or Mistral Large 2.5?

When to choose Mistral Small

Choose Mistral Small for high-volume short tasks: classification, extractive RAG, summarization, simple Q&A. At $0.20 / $0.60 per 1M tokens it is roughly 10x cheaper than Mistral Large 2.5 and serves the same OpenAI-compatible API on la Plateforme. Latency is also lower (~300ms TTFT).

When to choose Mistral Large 2.5

Choose Mistral Large 2.5 when reliability on multi-step tool chains matters, or when longer generations need to approach flagship-tier quality. Mistral Large 2.5 is materially more reliable on JSON-schema output and function calling, at roughly 10x the per-token price of Small.

Run Mistral Small and Mistral Large 2.5 side-by-side

VerticalAPI lets you switch between Mistral Small and Mistral Large 2.5 per request through a single OpenAI-compatible endpoint. Same SDK, same API key, zero markup on tokens: you pay Mistral directly under BYOK.

from openai import OpenAI
client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

# Mistral Small
resp_a = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},
)

# Mistral Large 2.5: same SDK and same Mistral key, only the model changes
resp_b = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},
)

VerticalAPI verdict

Use Mistral Small for cost-sensitive high-volume traffic; use Mistral Large 2.5 for agent steps that need reliable tool calling or careful long-form generation. Through VerticalAPI you can switch between them with one model parameter and the same OpenAI-compatible endpoint.
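
The verdict above can be expressed as a small routing rule. This is a minimal sketch, not part of VerticalAPI: the task labels, the `pick_model` helper, and the choice to default unknown tasks to the larger model are all assumptions for illustration. The two-step threshold follows this article's note that Small's error rate rises on longer tool chains.

```python
SMALL = "mistral-small-latest"
LARGE = "mistral-large-latest"

# Hypothetical task labels; map your own workload categories here.
SMALL_TASKS = {"classification", "extraction", "summarization", "simple_qa"}

def pick_model(task: str, tool_steps: int = 0, long_form: bool = False) -> str:
    """Return the model name to put in the request's `model` parameter."""
    # Multi-step tool chains and careful long-form output go to Large.
    if tool_steps > 2 or long_form:
        return LARGE
    # Short, high-volume tasks go to Small.
    if task in SMALL_TASKS:
        return SMALL
    # Assumption: when the task is unrecognized, prefer quality over cost.
    return LARGE

print(pick_model("classification"))                # mistral-small-latest
print(pick_model("classification", tool_steps=3))  # mistral-large-latest
```

The returned string is what you would pass as `model` in the chat completion call; nothing else in the request needs to change.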

Frequently asked questions

How much cheaper is Mistral Small than Mistral Large 2.5?

About 10x cheaper at list price. Mistral Small is roughly $0.20 per 1M input tokens and $0.60 per 1M output tokens; Mistral Large 2.5 is roughly $2 and $6. The ratio is the same on input and output, so the 10x gap holds whatever your input/output mix.
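
To make the 10x figure concrete, the sketch below estimates monthly spend from the list prices quoted in this article. The workload (1M requests per month, 1,500 input and 200 output tokens each) is hypothetical, and the prices may be out of date, so treat the output as an illustration rather than a quote.

```python
# List prices from this article, in USD per 1M tokens (verify before relying on them).
PRICES = {
    "mistral-small-latest": {"input": 0.20, "output": 0.60},
    "mistral-large-latest": {"input": 2.00, "output": 6.00},
}

def monthly_cost(model: str, requests: int, in_tokens: int, out_tokens: int) -> float:
    """Estimated USD cost for `requests` calls with the given tokens per call."""
    p = PRICES[model]
    total_in = requests * in_tokens
    total_out = requests * out_tokens
    return (total_in * p["input"] + total_out * p["output"]) / 1_000_000

# Hypothetical workload: 1M requests/month, 1,500 input and 200 output tokens each.
small = monthly_cost("mistral-small-latest", 1_000_000, 1_500, 200)
large = monthly_cost("mistral-large-latest", 1_000_000, 1_500, 200)
print(f"Small: ${small:,.2f}  Large: ${large:,.2f}  ratio: {large / small:.1f}x")
```

Under these assumptions the same traffic costs about $420 on Small and about $4,200 on Large.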

Which model should I use for RAG?

For most RAG workloads Mistral Small is the right starting point: it handles factual extraction well and the price is low enough to scale. Move to Mistral Large 2.5 only when the answer-quality gap on your specific evals justifies the 10x cost.
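
One way to decide whether the quality gap "justifies the 10x cost" is to price each additional correct answer. The sketch below is a hedged illustration: the accuracies are made-up placeholders for your own eval results, and the per-request costs assume 1,500 input and 200 output tokens at the list prices in this article.

```python
def cost_per_correct(cost_per_request: float, accuracy: float) -> float:
    """USD spent per correct answer at a given eval accuracy (0 < accuracy <= 1)."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_request / accuracy

def extra_cost_per_extra_correct(cost_small: float, acc_small: float,
                                 cost_large: float, acc_large: float) -> float:
    """Extra USD paid per additional correct answer when moving Small -> Large."""
    gain = acc_large - acc_small
    if gain <= 0:
        # Large is no better on this eval, so the upgrade never pays off.
        return float("inf")
    return (cost_large - cost_small) / gain

# Per-request cost: 1,500 input + 200 output tokens at this article's list prices.
COST_SMALL = (1_500 * 0.20 + 200 * 0.60) / 1_000_000
COST_LARGE = (1_500 * 2.00 + 200 * 6.00) / 1_000_000

# Hypothetical eval accuracies; replace with your own measurements.
ACC_SMALL, ACC_LARGE = 0.88, 0.93

extra = extra_cost_per_extra_correct(COST_SMALL, ACC_SMALL, COST_LARGE, ACC_LARGE)
print(f"Small: ${cost_per_correct(COST_SMALL, ACC_SMALL):.5f} per correct answer")
print(f"Large: ${cost_per_correct(COST_LARGE, ACC_LARGE):.5f} per correct answer")
print(f"Upgrade: ${extra:.4f} per additional correct answer")
```

If an additional correct answer is worth more to your product than the upgrade figure, Large is the better choice for that workload; otherwise stay on Small.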

Do both support function calling?

Yes, both expose tool-use APIs. Mistral Large 2.5 is materially more reliable at multi-step tool calls and JSON-schema output. Mistral Small can call tools but error rates rise on chains beyond two or three steps.
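
A common way to combine the two models is to try Small first and escalate to Large only when the tool-call arguments fail validation. The sketch below assumes a hypothetical `call_model` callable that returns the raw JSON arguments string from one model call; `fake_call_model` is a contrived stand-in that fails on purpose, not a claim about how either model behaves.

```python
import json
from typing import Callable, Optional

MODELS_IN_ORDER = ["mistral-small-latest", "mistral-large-latest"]

def parse_tool_args(raw: str, required: set) -> Optional[dict]:
    """Return parsed arguments if `raw` is a JSON object with all required keys, else None."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(args, dict) or not required.issubset(args):
        return None
    return args

def call_with_escalation(call_model: Callable[[str], str], required: set):
    """Try each model in order; return (model, args) for the first valid tool call."""
    for model in MODELS_IN_ORDER:
        args = parse_tool_args(call_model(model), required)
        if args is not None:
            return model, args
    raise ValueError("no model produced valid tool arguments")

def fake_call_model(model: str) -> str:
    # Hypothetical stand-in for a real API call; Small returns truncated JSON here.
    if model == "mistral-small-latest":
        return '{"city": "Paris"'
    return '{"city": "Paris", "unit": "celsius"}'

model, args = call_with_escalation(fake_call_model, {"city", "unit"})
print(model, args)
```

In a real integration `call_model` would wrap the chat completion request and return the arguments string from the first tool call; the escalation logic stays the same.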

What is the latency difference?

Mistral Small typically shows ~300ms time-to-first-token (TTFT); Mistral Large 2.5 lands near 600ms. Per-request throughput is similar, so the gap is felt mostly at the start of a response. For interactive chat, Small feels noticeably snappier.
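
The TTFT figures here are typical values, so it is worth measuring on your own traffic. The sketch below times any iterable of text chunks; `fake_stream` is a hypothetical stand-in for a streaming response. With a real client you would need to start the timer before the request is sent, for example by wrapping the request in a generator so that it is issued lazily on the first iteration.

```python
import time
from typing import Iterable, Iterator

def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a stream of text chunks and report TTFT and total time in seconds."""
    start = time.perf_counter()
    ttft = None
    n_chunks = 0
    for chunk in chunks:
        # TTFT is the delay until the first non-empty chunk arrives.
        if ttft is None and chunk:
            ttft = time.perf_counter() - start
        n_chunks += 1
    total = time.perf_counter() - start
    return {"ttft_s": ttft, "total_s": total, "chunks": n_chunks}

def fake_stream(first_delay: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in: waits `first_delay` seconds, then yields n chunks."""
    time.sleep(first_delay)
    for i in range(n):
        yield f"tok{i} "

stats = measure_stream(fake_stream(0.05, 5))
print(stats)
```

Run the same measurement against both models from the region where your users are, and compare medians over many requests rather than single samples.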

How do I route between them via VerticalAPI?

VerticalAPI exposes a single OpenAI-compatible endpoint at https://api.verticalapi.com/v1. Change the model parameter to mistral-small-latest or mistral-large-latest and supply your Mistral key as X-Provider-Key. No token markup — you pay Mistral directly under BYOK.

Limitations of this comparison

  • Mistral price tiers are revised regularly; verify rates against the current vendor page.
  • Mistral Small is fine for classification and extraction but underperforms on long-horizon agent tasks.
  • Benchmark quality between Mistral Small and Large is workload-dependent; run your own evals before committing.
  • Latency figures are averaged across regions; EU-hosted endpoints typically perform better for European traffic.
  • Mistral does not currently offer prompt caching equivalent to Anthropic's, so repeated long prompts are billed in full each time.

What may change in 12-24 months

  1. Mistral is expected to ship a mid-tier model between Small and Large 2.5 within 12 months.
  2. Per-token prices on both tiers are likely to fall as competition intensifies.
  3. Prompt caching for repeat-context workloads may arrive on the Mistral platform.
  4. EU-hosted inference will remain a differentiator for European compliance use cases.

Related questions

  • How does Mistral Small compare to GPT-4o mini for high-volume RAG?
  • When is Mistral Large 2.5 cheaper than Claude Sonnet 4.5 for the same quality?
  • Is Mistral Large 2.5 strong enough to replace GPT-4o in agent loops?
  • What is the cheapest way to A/B test Mistral Small and Large on the same traffic?
  • How does Mistral's EU hosting affect latency for European users?