Mistral AI via VerticalAPI
Use Mistral Large, Codestral and Pixtral through an OpenAI-compatible endpoint. BYOK with your Mistral key, zero token markup, EU-hosted models for compliance workloads.
Mistral AI models routed by VerticalAPI
Pass the model ID below as model in any OpenAI-compatible request. New Mistral AI models are typically supported within 24h of release.
| Model ID | Name | Context | Pricing (provider) |
|---|---|---|---|
| mistral-large-latest | Mistral Large 2 | 128K | $2 / $6 per 1M tok |
| codestral-latest | Codestral | 256K | $0.30 / $0.90 per 1M tok (code-tuned) |
| pixtral-large-latest | Pixtral Large | 128K | $2 / $6 per 1M tok (multimodal) |
| ministral-8b-latest | Ministral 8B | 128K | $0.10 / $0.10 per 1M tok (edge) |
Pricing reflects Mistral AI's rates — you pay Mistral AI directly. VerticalAPI adds zero markup on tokens.
5-line Mistral AI call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",                      # VerticalAPI gateway key
    default_headers={"X-Provider-Key": "..."},  # BYOK header carrying your provider key
)

response = client.chat.completions.create(
    model="mistral-large-latest",  # Mistral AI
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
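Since the endpoint is plain HTTP, the same call can be sketched without any SDK. The version below uses only the Python standard library. It assumes the gateway serves the standard OpenAI path `/chat/completions` under the base URL and that the gateway key travels as a bearer token, which is how the OpenAI SDK sends `api_key`; the function names are illustrative, not part of any VerticalAPI client.

```python
import json
import urllib.request

BASE_URL = "https://api.verticalapi.com/v1"  # base URL from the SDK example


def build_chat_request(gateway_key: str, provider_key: str, prompt: str,
                       model: str = "mistral-large-latest") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        # Assumption: the standard OpenAI path under the base URL.
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the gateway key is sent as a bearer token.
            "Authorization": f"Bearer {gateway_key}",
            # BYOK header from the SDK example; carries your Mistral key.
            "X-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_chat_request(gateway_key, mistral_key, "Hello"))` should return the same text as the SDK version, provided the assumptions above hold for your account.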
Four reasons developers route Mistral AI through us
Zero token markup
You pay Mistral AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
Mistral AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Mistral AI to other providers on identical prompts.
Observability built in
Every Mistral AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
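As a rough illustration of per-request switching, the sketch below picks the provider key for each call and passes it through the OpenAI SDK's per-request `extra_headers` option. The model-to-provider mapping and the `request_options` helper are hypothetical; how VerticalAPI actually resolves providers may differ.

```python
# Hypothetical mapping from model ID to the provider whose key it needs.
MODEL_PROVIDER = {
    "mistral-large-latest": "mistral",
    "codestral-latest": "mistral",
    "gpt-4o": "openai",
}


def request_options(model: str, provider_keys: dict) -> dict:
    """Return keyword arguments that select a model and its BYOK header."""
    provider = MODEL_PROVIDER[model]
    return {
        "model": model,
        # extra_headers is the OpenAI SDK's per-request header override.
        "extra_headers": {"X-Provider-Key": provider_keys[provider]},
    }
```

With a client configured as in the earlier example, a call would look like `client.chat.completions.create(messages=msgs, **request_options("codestral-latest", keys))`, where `keys` maps provider names to your own API keys.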
Mistral AI measured: latency, throughput, error rate
Mistral hosts inference in Paris and Frankfurt, so EU clients typically see lower latency than on calls to US-based providers. In the placeholder figures below, Mistral Large beats GPT-4o on time to first token (TTFT) from European regions, which matters if your users are in Europe.
| Metric | Value | Notes |
|---|---|---|
| p50 TTFT (Mistral Large 2, EU client) | ~410 ms | Faster than gpt-4o from EU; slower from US-East |
| p95 TTFT (Mistral Large 2) | ~780 ms | Tighter tail than US-only providers when called from EU |
| Tokens per second (Mistral Large) | ~120 tok/s | Solid streaming throughput on European GPUs |
| p50 TTFT (Codestral) | ~280 ms | Optimized for code; faster than the general flagship |
| Error rate (typical) | ~0.4% | Mistral's API is mature; rate limits are generous on Build tier |
Numbers above are 2026 placeholders pending the next VerticalAPI benchmark harness run. See /benchmark for the full 26-provider comparison.
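Until measured numbers are published, you can collect your own. The sketch below times the first streamed chunk and summarizes samples with a nearest-rank percentile. It is one simple way to approximate p50/p95 TTFT, not the VerticalAPI benchmark harness; note that the first chunk of an OpenAI-style stream may carry only the role, so this measures time to first chunk rather than first visible token.

```python
import math
import time
from typing import Callable, Iterable, List


def time_to_first_chunk(open_stream: Callable[[], Iterable]) -> float:
    """Seconds from issuing the request until the first streamed chunk arrives."""
    start = time.perf_counter()
    for _chunk in open_stream():
        return time.perf_counter() - start
    raise RuntimeError("stream ended without yielding a chunk")


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With the SDK client from the earlier example, `open_stream` could be `lambda: client.chat.completions.create(model="mistral-large-latest", messages=msgs, stream=True)`. Collect a few dozen samples from the region your users are in, then read `percentile(samples, 50)` and `percentile(samples, 95)`.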
OpenAI SDK methods that work with Mistral AI
Mistral's API is OpenAI-compatible by design — they ship an explicit OpenAI-shaped endpoint. Coverage through VerticalAPI is essentially 1:1 with a few small caveats below.
- client.chat.completions.create(): works, including stream=True, tools, tool_choice and response_format={"type": "json_object"}.
- Codestral fill-in-the-middle: POST to /v1/fim/completions with prompt + suffix. VerticalAPI exposes this as a separate endpoint because the OpenAI SDK doesn't model it.
- Function calling: supported on Mistral Large, Small, and Codestral. Behavior is more conservative than OpenAI's; the model is less eager to call tools.
- Vision (Pixtral): image_url message parts work in OpenAI vision format; Pixtral Large handles multi-image prompts.
- client.embeddings.create(): routes to mistral-embed (1024 dim).
- Safe mode (safe_prompt): a Mistral-specific parameter. Pass it via extra_body in the OpenAI SDK; VerticalAPI forwards it untouched.
- Fine-tuned models: pass your fine-tune ID as the model field; works transparently.
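Three of the items above can be sketched as request builders: a fill-in-the-middle call, JSON mode combined with `safe_prompt`, and a multi-image Pixtral message. The helper names are illustrative, and the FIM request assumes the endpoint lives on the same host and accepts the same two auth headers as chat completions, which the page does not state explicitly.

```python
import json
import urllib.request

BASE_URL = "https://api.verticalapi.com/v1"


def fim_request(gateway_key: str, provider_key: str, prompt: str, suffix: str,
                model: str = "codestral-latest") -> urllib.request.Request:
    """Build a fill-in-the-middle request: the model completes between prompt and suffix."""
    body = {"model": model, "prompt": prompt, "suffix": suffix}
    return urllib.request.Request(
        url=f"{BASE_URL}/fim/completions",  # /v1/fim/completions from the list above
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: same auth headers as the chat endpoint.
            "Authorization": f"Bearer {gateway_key}",
            "X-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def json_mode_kwargs(prompt: str, safe_prompt: bool = True) -> dict:
    """Keyword arguments for client.chat.completions.create() with JSON mode on."""
    return {
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
        # Mistral-specific flag; extra_body merges it into the request JSON.
        "extra_body": {"safe_prompt": safe_prompt},
    }


def vision_message(text: str, image_urls: list) -> dict:
    """One user message in OpenAI vision format with any number of images."""
    parts = [{"type": "text", "text": text}]
    parts += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": parts}
```

For example, `client.chat.completions.create(**json_mode_kwargs("Return a JSON object listing three EU capitals"))` exercises JSON mode, and `vision_message("Compare these charts", [url_a, url_b])` can be passed in `messages` with `model="pixtral-large-latest"`.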
What Mistral AI actually costs at 100k MAU
Concrete monthly cost for a chatbot with 100k MAU, 10 turns/user, ~500 input + 150 output tokens per turn. Mistral's pricing is competitive and predictable.
| Model | Monthly cost | When to use |
|---|---|---|
| ministral-8b-latest | ~$84/mo | Edge-tier; cheapest serious model after Gemini Flash-8B. Good for lightweight routing/classification. |
| codestral-latest | ~$370/mo | Code-specialist. For coding agents that don't need agentic reasoning, about 5x cheaper than GPT-4o. |
| mistral-large-latest | ~$2,475/mo | EU flagship: 35% cheaper than gpt-4o for ~6 quality points less. Strong choice when EU residency matters. |
| pixtral-large-latest | ~$2,475/mo | Same price as Large, plus multimodal vision. |
Cost based on provider list price; VerticalAPI adds zero token markup.
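To adapt the estimate to your own traffic, the arithmetic can be sketched as below, using the list prices from the pricing table. Note that a straight multiplication of the stated inputs (100k MAU, 10 turns, 500 input + 150 output tokens) comes out at roughly $65, $285 and $1,900 for Ministral 8B, Codestral and Mistral Large, about 30% under the table's figures. The table presumably includes overhead the page does not itemize, such as system prompts or resent conversation history, so treat this as a lower bound.

```python
# USD per 1M tokens as (input, output), taken from the pricing table.
PRICES = {
    "ministral-8b-latest": (0.10, 0.10),
    "codestral-latest": (0.30, 0.90),
    "mistral-large-latest": (2.00, 6.00),
    "pixtral-large-latest": (2.00, 6.00),
}


def monthly_cost(model: str, mau: int = 100_000, turns_per_user: int = 10,
                 input_tokens: int = 500, output_tokens: int = 150) -> float:
    """Monthly token spend in USD at list price, with no overhead added."""
    in_price, out_price = PRICES[model]
    total_turns = mau * turns_per_user
    cost = (total_turns * input_tokens * in_price
            + total_turns * output_tokens * out_price) / 1_000_000
    return round(cost, 2)
```

Changing `input_tokens` is the quickest way to model history growth: if each turn resends earlier messages, the effective input size per turn rises well above 500 tokens.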
Should you pick Mistral AI for your workload?
Mistral's case is narrower than the giants but real. Pick it when:
You have an EU data residency requirement. Mistral hosts inference exclusively in EU datacenters (Paris and Frankfurt) with EU-citizen support contacts and an off-the-shelf DPA template that satisfies most GDPR reviews. For French and German enterprise customers, this is often a procurement-blocker that no US provider can clear without complicated regional deployments. Quality at 81 average is competitive with GPT-4o for everything except the hardest reasoning queries.
You're building a coding agent and want a non-Anthropic alternative. Codestral is purpose-built for code (88 coding score, on par with Sonnet 4.5) at $0.30 input / $0.90 output per 1M — roughly 10x cheaper than Sonnet. The trade-off is that Codestral is specialized for fill-in-the-middle and function-level tasks; for full agentic coding (multi-file refactors, repository-wide reasoning), Sonnet still wins. Many teams use Codestral for inline code completion and Sonnet for chat.
You want open-weights insurance. Mistral publishes weights for some of its models (Mistral 7B, Mixtral 8x22B, Codestral Mamba). If your deployment plan includes "if my provider disappears, I can self-host the same model", Mistral and Meta Llama are among the few production-grade options. The hosted Mistral API gives you a fast managed service today with a clear escape hatch tomorrow.
Specific issues teams hit with Mistral AI
Sharp edges that have cost real production teams real time. Fixes below are battle-tested via the VerticalAPI dashboard logs.
Where Mistral AI shines
Frequently asked questions
What is Mistral and what models do they offer?
Mistral AI is a Paris-based frontier lab. Their 2026 closed models include Mistral Large 2.5 (flagship), Mistral Medium 3, Mistral Small 3 and the Codestral family for code. Open-weight releases include Mixtral 8x22B, Mistral 7B and Mathstral. Mistral models support function calling, JSON mode, multilingual generation (especially strong on French, German, Spanish, Italian) and a 128K context window on Large.
How much does Mistral cost in 2026?
Mistral Large 2.5 is roughly $2 per 1M input tokens and $6 per 1M output via La Plateforme. Mistral Small 3 is around $0.20/$0.60. Codestral is approximately $0.30/$0.90. Open-weight models hosted on Together, Fireworks or DeepInfra cost significantly less per token. Via VerticalAPI BYOK you pay Mistral directly at list price with zero markup.
How do I use Mistral via VerticalAPI BYOK?
Create a key on console.mistral.ai (La Plateforme), paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. VerticalAPI translates OpenAI-style chat completions into calls to Mistral's chat endpoint and preserves function calling, JSON mode and streaming. Billing stays on your Mistral invoice. Open-weight Mistral models can also be routed via Together AI, Fireworks or DeepInfra keys.
What is Mistral best for compared to alternatives?
Mistral wins on EU data sovereignty, multilingual quality (especially French and other European languages), permissive open-weight licensing, and predictable, transparent pricing. Compared to GPT-4o it is cheaper and EU-hosted but less capable on agentic tool use. Compared to Llama 3.3 70B it offers similar quality with a French regulatory anchor. For frontier reasoning, Claude or GPT-5 outperform it.
Where is Mistral hosted / data privacy?
Mistral runs primarily on European infrastructure (France, Sweden) with optional AWS and Azure regions. La Plateforme is GDPR-compliant by default, with zero data retention available on enterprise tiers. Open weights can be self-hosted anywhere. Via VerticalAPI BYOK your traffic is proxied through VerticalAPI's EU edge and your Mistral data terms remain intact.
Limitations and trade-offs
- Mistral Large 2.5 trails GPT-5 and Claude Opus 4.5 on frontier reasoning and coding benchmarks.
- 128K context window is smaller than Claude (200K) and Gemini (2M).
- Limited multimodal — vision is recent, no native audio or video understanding.
- Smaller developer ecosystem and fewer third-party tools than OpenAI or Anthropic.
- Some open-weight licenses (e.g. Mistral Large) are research-only — check before commercial use.
Where Mistral is heading
- Mistral Large 3 expected in 2026 with broader multimodal and longer context.
- Expanded Le Chat product line targeting enterprise productivity.
- Deeper EU sovereign cloud partnerships (OVH, Scaleway) and on-prem deployments.
- More open-weight releases under the permissive Apache 2.0 license to grow developer adoption.
Ship on Mistral AI in 60 seconds
Free tier — bring your own Mistral AI key, zero markup, OpenAI-compatible endpoint.