Mistral Small vs Mistral Large 2.5: 2026 comparison

Side-by-side

Mistral Small vs Mistral Large 2.5 — at a glance

Dimension	Mistral Small	Mistral Large 2.5
Provider	Mistral AI	Mistral AI
Context window	128K	128K
Input price (per 1M tok)	$0.20	$2
Output price (per 1M tok)	$0.60	$6
Latency (typical)	~300ms TTFT	~600ms TTFT
Free tier	Yes (low quota)	No
Best for	High-volume RAG, classification, summarization	Agent tool calling, careful generation, near-flagship quality

When to choose which

Pick Mistral Small or Mistral Large 2.5?

When to choose Mistral Small

Choose Mistral Small for high-volume short tasks: classification, extractive RAG, summarization, simple Q&A. At $0.20 input / $0.60 output per 1M tokens it is roughly 10x cheaper than Mistral Large 2.5, and it is served through the same OpenAI-compatible API on la Plateforme. Typical latency is also lower (~300ms TTFT versus ~600ms).

When to choose Mistral Large 2.5

Choose Mistral Large 2.5 when reliability on multi-step tool chains matters, or when output quality on longer generations needs to approach the flagship tier. It is materially more reliable than Small on JSON-schema output and function calling, at roughly 10x the per-token price.
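
The guidance above can be folded into a small routing helper. This is a minimal sketch, not a VerticalAPI or Mistral feature: the task labels and the `pick_model` function are hypothetical names chosen for illustration, while the two model IDs are the ones used in the request example on this page.

```python
# Hypothetical task labels; adjust them to match your own workload taxonomy.
SMALL_TASKS = {"classification", "extractive_rag", "summarization", "simple_qa"}
LARGE_TASKS = {"agent_tool_calling", "long_form_generation", "json_schema_output"}


def pick_model(task: str) -> str:
    """Map a task label to a model ID, following the guidance above."""
    if task in LARGE_TASKS:
        return "mistral-large-latest"
    if task in SMALL_TASKS:
        return "mistral-small-latest"
    # Fail loudly on unknown labels so misrouted traffic is visible.
    raise ValueError(f"unknown task label: {task!r}")


print(pick_model("classification"))
print(pick_model("agent_tool_calling"))
```

Raising on unknown labels is a design choice; defaulting to the cheaper model is equally reasonable if you would rather degrade quality than fail a request.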

Why not both?

Run Mistral Small and Mistral Large 2.5 side-by-side

VerticalAPI lets you switch between Mistral Small and Mistral Large 2.5 per request through a single OpenAI-compatible endpoint. Same SDK, same API key, zero markup on tokens: under BYOK you pay Mistral directly.

from openai import OpenAI

# The VerticalAPI key goes in api_key; the Mistral key is sent per request.
client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

# Mistral Small
resp_a = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},  # your Mistral API key
)

# Mistral Large 2.5: same SDK and same Mistral key, only the model changes
resp_b = client.chat.completions.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "..."},  # your Mistral API key
)


VerticalAPI verdict

Use Mistral Small for cost-sensitive high-volume traffic; use Mistral Large 2.5 for agent steps that need reliable tool calling or careful long-form generation. Through VerticalAPI you can switch between them with one model parameter and the same OpenAI-compatible endpoint.


FAQ

Frequently asked questions

How much cheaper is Mistral Small than Mistral Large 2.5?

About 10x cheaper at list price. Mistral Small is roughly $0.20 per 1M input tokens and $0.60 per 1M output tokens; Mistral Large 2.5 is roughly $2 and $6. The ratio is the same on input and output, so the 10x gap holds whatever your mix of input and output tokens.
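
To see what that means for a concrete bill, the arithmetic can be sketched as follows. The prices are the list prices quoted on this page; the workload (1M requests at 1,500 input and 200 output tokens each) is a made-up example, and `request_cost` is a hypothetical helper name.

```python
# List prices in USD per 1M tokens, as quoted in this comparison.
PRICES = {
    "mistral-small-latest": {"input": 0.20, "output": 0.60},
    "mistral-large-latest": {"input": 2.00, "output": 6.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at list price."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 1M requests, each 1,500 input and 200 output tokens.
requests = 1_000_000
small = requests * request_cost("mistral-small-latest", 1_500, 200)
large = requests * request_cost("mistral-large-latest", 1_500, 200)
print(f"Small: ${small:,.2f}  Large: ${large:,.2f}  ratio: {large / small:.1f}x")
```

For this workload the totals come to about $420 on Small and $4,200 on Large. Swap in your own token counts and re-check the prices against the current vendor page before budgeting.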

Which model should I use for RAG?

For most RAG workloads Mistral Small is the right starting point: it handles factual extraction well and the price is low enough to scale. Move to Mistral Large 2.5 only when the answer-quality gap on your specific evals justifies the 10x cost.

Do both support function calling?

Yes, both expose tool-use APIs. Mistral Large 2.5 is materially more reliable at multi-step tool calls and JSON-schema output. Mistral Small can call tools but error rates rise on chains beyond two or three steps.
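
One common way to work with this reliability gap is to try Small first and escalate to Large only when the structured output fails validation. The sketch below is an illustration under stated assumptions, not a built-in VerticalAPI feature: `complete_with_fallback` and `call_model` are hypothetical names, and `fake_call` stands in for a real API call so the example runs offline.

```python
import json
from typing import Callable


def complete_with_fallback(
    prompt: str,
    required_keys: set[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, dict]:
    """Try the small model first; escalate if the output is not a JSON
    object containing every required key."""
    for model in ("mistral-small-latest", "mistral-large-latest"):
        raw = call_model(model, prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Not valid JSON: try the next model.
        if isinstance(data, dict) and required_keys.issubset(data):
            return model, data
    raise ValueError("no model returned valid JSON with the required keys")


def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call: here the small model returns bad output.
    if model == "mistral-small-latest":
        return "not json"
    return '{"city": "Paris", "unit": "celsius"}'


model, data = complete_with_fallback("...", {"city", "unit"}, fake_call)
print(model, data)
```

In production, `call_model` would wrap the chat-completions request shown earlier on this page. Logging how often the fallback fires gives you a direct measure of whether Small is good enough for your schema.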

What is the latency difference?

Mistral Small typically shows ~300ms time-to-first-token. Mistral Large 2.5 lands near 600ms TTFT. Throughput per request is similar. For interactive chat, Small feels noticeably snappier.
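
These figures are worth checking from your own region. A minimal sketch of a TTFT measurement over any streamed response is shown below; `time_to_first_token` is a hypothetical helper, and `fake_stream` simulates a stream so the example runs without network access. With the OpenAI SDK you would instead pass `stream=True` and feed in the text of each chunk (for example `chunk.choices[0].delta.content or ""`).

```python
import time
from typing import Iterable, Iterator


def time_to_first_token(chunks: Iterable[str]) -> tuple[float, str]:
    """Return (seconds until the first non-empty chunk, full text)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None and chunk:
            ttft = time.perf_counter() - start
        parts.append(chunk)
    if ttft is None:
        raise ValueError("stream produced no content")
    return ttft, "".join(parts)


def fake_stream(delay_s: float) -> Iterator[str]:
    # Stand-in for a streamed completion: waits, then yields two chunks.
    time.sleep(delay_s)
    yield "Hello"
    yield ", world"


ttft, text = time_to_first_token(fake_stream(0.05))
print(f"TTFT: {ttft * 1000:.0f} ms, text: {text!r}")
```

For a fair comparison, build the stream lazily so the request is sent inside the timed region, and repeat the measurement enough times to report a median rather than a single sample.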

How do I route between them via VerticalAPI?

VerticalAPI exposes a single OpenAI-compatible endpoint at https://api.verticalapi.com/v1. Change the model parameter to mistral-small-latest or mistral-large-latest and supply your Mistral key as X-Provider-Key. No token markup — you pay Mistral directly under BYOK.

Caveats

Limitations of this comparison

Mistral price tiers are revised regularly; verify rates against the current vendor page.
Mistral Small is fine for classification and extraction but underperforms on long-horizon agent tasks.
Benchmark quality between Mistral Small and Large is workload-dependent; run your own evals before committing.
Latency figures average across regions; EU-hosted endpoints typically perform better for European traffic.
Mistral does not currently offer prompt caching equivalent to Anthropic's, so repeated long prompts are billed in full each time.

Outlook

What may change in 12-24 months

Mistral is expected to ship a mid-tier model between Small and Large 2.5 within 12 months.
Per-token prices on both tiers are likely to fall as competition intensifies.
Prompt caching for repeat-context workloads may arrive on the Mistral platform.
EU-hosted inference will remain a differentiator for European compliance use cases.

