Mistral AI via VerticalAPI

Use Mistral Large, Codestral and Pixtral through an OpenAI-compatible endpoint. Bring your own key (BYOK): you call Mistral with your own key, pay zero token markup, and get EU-hosted models for compliance workloads.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <your-mistral-key>

Mistral AI models routed by VerticalAPI

Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New Mistral AI models are typically supported within 24h of release.

Model ID | Name | Context | Pricing (provider, input / output)
mistral-large-latest | Mistral Large 2 | 128K | $2 / $6 per 1M tok
codestral-latest | Codestral | 256K | $0.30 / $0.90 per 1M tok (code-tuned)
pixtral-large-latest | Pixtral Large | 128K | $2 / $6 per 1M tok (multimodal)
ministral-8b-latest | Ministral 8B | 128K | $0.10 / $0.10 per 1M tok (edge)

Pricing reflects Mistral AI's rates — you pay Mistral AI directly. VerticalAPI adds zero markup on tokens.

Minimal Mistral AI call via VerticalAPI

Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.

mistral_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    default_headers={"X-Provider-Key": "..."},  # your Mistral key (BYOK)
)

response = client.chat.completions.create(
    model="mistral-large-latest",  # Mistral AI
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
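
For clients without an OpenAI SDK, the same call can be sketched as a plain HTTP request using only the Python standard library. This assumes the gateway accepts the VerticalAPI key as a Bearer token, which is how the OpenAI SDK transmits api_key. The helper names build_chat_request and send are illustrative, not part of any VerticalAPI library.

```python
import json
import urllib.request

BASE_URL = "https://api.verticalapi.com/v1"


def build_chat_request(vapi_key: str, mistral_key: str, prompt: str,
                       model: str = "mistral-large-latest") -> dict:
    """Assemble the URL, headers and JSON body for one chat completion call."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "headers": {
            # Assumption: same Bearer scheme the OpenAI SDK uses for api_key.
            "Authorization": f"Bearer {vapi_key}",
            # BYOK header from the endpoint line above: your own Mistral key.
            "X-Provider-Key": mistral_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8"),
    }


def send(request: dict) -> dict:
    """POST the prepared request and decode the JSON reply (network I/O)."""
    req = urllib.request.Request(
        request["url"],
        data=request["body"],
        headers=request["headers"],
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling send(build_chat_request("vapi_...", "...", "Hello")) should return the same JSON structure the SDK wraps, with the reply text under choices[0].message.content.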

Four reasons developers route Mistral AI through us

Zero token markup

You pay Mistral AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Mistral AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Mistral AI to other providers on identical prompts.

Observability built in

Every Mistral AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Mistral AI measured: latency, throughput, error rate

Mistral hosts inference in Paris and Frankfurt, so EU clients typically see lower time to first token (TTFT) than on calls to US-based providers. In the provisional 2026 figures below, Mistral Large beats GPT-4o on TTFT from European regions, which matters if your users are in Europe.

Metric | Value | Notes
p50 TTFT (Mistral Large 2, EU client) | ~410 ms | Faster than gpt-4o from EU; slower from US-East
p95 TTFT (Mistral Large 2) | ~780 ms | Tighter tail than US-only providers when called from EU
Tokens per second (Mistral Large) | ~120 tok/s | Solid streaming throughput on European GPUs
p50 TTFT (Codestral) | ~280 ms | Optimized for code; faster than the general flagship
Error rate (typical) | ~0.4% | Mistral's API is mature; rate limits are generous on Build tier

Numbers above are 2026 placeholders pending the next VerticalAPI benchmark harness run. See /benchmark for the full 26-provider comparison.
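
To check figures like these against your own traffic, a minimal measurement sketch follows. It treats TTFT as the time between issuing a streaming request and receiving the first chunk, and summarizes samples with a nearest-rank percentile. Both function names are illustrative, and the VerticalAPI benchmark harness may measure differently.

```python
import math
import time
from typing import Callable, Iterable, Sequence


def time_to_first_token(start_stream: Callable[[], Iterable],
                        clock: Callable[[], float] = time.perf_counter) -> float:
    """Seconds from starting the request until the first streamed chunk."""
    start = clock()
    # start_stream() issues the request, so the timer covers the round trip.
    for _ in start_stream():
        return clock() - start
    raise ValueError("stream ended without producing a chunk")


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With the OpenAI SDK, start_stream could be a lambda wrapping client.chat.completions.create(..., stream=True). The first chunk may carry only a role delta, so this is an approximation of time to the first visible token. Collect a few hundred samples per region before comparing p50 and p95.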

OpenAI SDK methods that work with Mistral AI

Mistral's API is OpenAI-compatible by design — they ship an explicit OpenAI-shaped endpoint. Coverage through VerticalAPI is essentially 1:1 with a few small caveats below.

  • client.chat.completions.create(): works, including stream=True, tools, tool_choice and response_format={"type": "json_object"}.
  • Codestral fill-in-the-middle: POST to /v1/fim/completions with prompt and suffix fields; VerticalAPI exposes this as a separate endpoint because the OpenAI SDK doesn't model it.
  • Function calling: supported on Mistral Large, Small, and Codestral. Behavior is more conservative than OpenAI's; the model is less eager to call tools.
  • Vision (Pixtral): image_url message parts work in OpenAI vision format; Pixtral Large handles multi-image prompts.
  • client.embeddings.create(): routes to mistral-embed (1024 dim).
  • Safe mode (safe_prompt): Mistral-specific parameter; pass it via extra_body in the OpenAI SDK, and VerticalAPI forwards it untouched.
  • Fine-tuned models: pass your fine-tune ID as the model field; works transparently.
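
The request shapes in the list above can be sketched as small payload builders. The helper names are hypothetical, and the full FIM URL is an assumption formed by joining the documented host with the /v1/fim/completions path. Only the OpenAI vision message format and the SDK's extra_body argument are standard.

```python
import json

# Assumed full URL: documented host plus the FIM path named above.
FIM_URL = "https://api.verticalapi.com/v1/fim/completions"


def vision_message(text: str, image_urls: list[str]) -> dict:
    """One user message mixing text with image_url parts (OpenAI vision format)."""
    parts = [{"type": "text", "text": text}]
    parts += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": parts}


def fim_body(prompt: str, suffix: str, model: str = "codestral-latest") -> bytes:
    """JSON body for fill-in-the-middle: code before the gap, then code after it."""
    payload = {"model": model, "prompt": prompt, "suffix": suffix}
    return json.dumps(payload).encode("utf-8")


def safe_mode_kwargs(enabled: bool) -> dict:
    """Keyword arguments that carry Mistral's safe_prompt through the OpenAI SDK."""
    return {"extra_body": {"safe_prompt": enabled}}
```

For example, client.chat.completions.create(model="pixtral-large-latest", messages=[vision_message("Compare these charts", urls)], **safe_mode_kwargs(False)) sends a multi-image prompt with safe mode off. The bytes from fim_body would be POSTed to FIM_URL with the same headers as a chat request, which is itself an assumption to confirm against the VerticalAPI docs.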

What Mistral AI actually costs at 100k MAU

Concrete monthly cost for a chatbot with 100k MAU, 10 turns/user, ~500 input + 150 output tokens per turn. That is 1M turns, or about 500M input and 150M output tokens per month, priced at the list rates in the model table. Mistral's pricing is competitive and predictable.

Model | Monthly cost | When to use
ministral-8b-latest | ~$65/mo | Edge-tier; cheapest serious model after Gemini Flash-8B. Good for lightweight routing/classification.
codestral-latest | ~$285/mo | Code specialist. For coding tools that don't need agentic reasoning, roughly 5x cheaper than GPT-4o.
mistral-large-latest | ~$1,900/mo | EU flagship: 35% cheaper than gpt-4o for ~6 quality points less. Strong choice when EU residency matters.
pixtral-large-latest | ~$1,900/mo | Same price as Large, plus multimodal vision.

Cost based on provider list price; VerticalAPI adds zero token markup.
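
The list-price arithmetic behind a scenario like this can be sketched as follows, using the per-1M-token rates from the model table and the traffic assumptions stated above. It counts only the stated input and output tokens, so treat the result as a floor: real invoices usually run higher once system prompts, conversation history and retries are included.

```python
# (input $, output $) per 1M tokens, copied from the model table.
PRICES = {
    "ministral-8b-latest": (0.10, 0.10),
    "codestral-latest": (0.30, 0.90),
    "mistral-large-latest": (2.00, 6.00),
    "pixtral-large-latest": (2.00, 6.00),
}


def monthly_cost(model: str, mau: int = 100_000, turns_per_user: int = 10,
                 input_tokens: int = 500, output_tokens: int = 150) -> float:
    """Monthly list-price cost in USD for a simple chatbot traffic model."""
    input_price, output_price = PRICES[model]
    turns = mau * turns_per_user
    input_millions = turns * input_tokens / 1_000_000
    output_millions = turns * output_tokens / 1_000_000
    return round(input_millions * input_price + output_millions * output_price, 2)


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: ${monthly_cost(name):,.2f}/mo")
```

With the defaults this yields 65.0 for ministral-8b-latest, 285.0 for codestral-latest and 1900.0 for both mistral-large-latest and pixtral-large-latest. Change turns_per_user or the token counts to model your own traffic.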

Should you pick Mistral AI for your workload?

Mistral's case is narrower than the giants but real. Pick it when:

You have an EU data residency requirement. Mistral hosts inference exclusively in EU datacenters (Paris and Frankfurt) with EU-citizen support contacts and an off-the-shelf DPA template that satisfies most GDPR reviews. For French and German enterprise customers, residency is often a procurement blocker that no US provider can clear without complicated regional deployments. With an average quality score of 81, Mistral is competitive with GPT-4o on everything except the hardest reasoning queries.

You're building a coding agent and want a non-Anthropic alternative. Codestral is purpose-built for code (88 coding score, on par with Sonnet 4.5) at $0.30 input / $0.90 output per 1M — roughly 10x cheaper than Sonnet. The trade-off is that Codestral is specialized for fill-in-the-middle and function-level tasks; for full agentic coding (multi-file refactors, repository-wide reasoning), Sonnet still wins. Many teams use Codestral for inline code completion and Sonnet for chat.

You want open-weights insurance. Mistral publishes weights for some of its models (Mistral 7B, Mixtral 8x22B, Codestral Mamba). If your deployment plan includes "if my provider disappears, I can self-host the same model", Mistral and Meta Llama are two of the most established production-grade options. The hosted Mistral API gives you a fast managed service today with a clear escape hatch tomorrow.

Specific issues teams hit with Mistral AI

Sharp edges that have cost real production teams real time. Fixes below are battle-tested via the VerticalAPI dashboard logs.

Tool calling under-triggers

Mistral's tool selection is more conservative than OpenAI's: the model often answers from prior knowledge rather than calling a tool, even when a tool is more appropriate. Force it with tool_choice="any" (Mistral's counterpart to OpenAI's "required") when you need tool use to be reliable, or add explicit "call tool X" instructions in the system prompt.

JSON mode requires explicit instruction

Setting response_format={"type": "json_object"} alone isn't enough: you also need to instruct the model in the system or user message to output JSON. Otherwise the model may produce prose with JSON fragments mixed in. OpenAI's JSON mode is more strictly enforced than Mistral's.

Codestral's FIM endpoint isn't /v1/chat/completions

Fill-in-the-middle uses /v1/fim/completions with prompt and suffix fields, not the standard chat schema. The OpenAI SDK doesn't model this, so call the endpoint directly via httpx, or use VerticalAPI's exposed wrapper.

Pixtral image limits

Pixtral Large accepts up to 8 images per request; beyond that you'll get a 400 error. Resize large images before upload, since Pixtral processes images at 1024x1024 max anyway.

Safe mode rejects technical prompts

Mistral's safe_prompt=True can reject security, pentesting or medical content even when it is legitimate. For technical applications, leave safe_prompt=False (the default) and rely on your own content moderation upstream.
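
Several of these fixes can be applied to a request payload before it is sent. The sketch below assumes payloads are plain dicts later passed as client.chat.completions.create(**payload). The helper names and the wording of JSON_HINT are hypothetical. The image limit is the figure quoted above, not one queried from the API.

```python
import copy

MAX_IMAGES = 8  # per-request Pixtral Large limit quoted above
JSON_HINT = "Respond only with a valid JSON object."  # hypothetical wording


def force_tool_use(payload: dict) -> dict:
    """Return a copy that must call a tool (tool_choice="any")."""
    if not payload.get("tools"):
        raise ValueError("forcing tool use requires at least one tool definition")
    out = copy.deepcopy(payload)
    out["tool_choice"] = "any"
    return out


def enable_json_mode(payload: dict) -> dict:
    """Return a copy with JSON mode on and an explicit JSON instruction present."""
    out = copy.deepcopy(payload)
    out["response_format"] = {"type": "json_object"}
    messages = out["messages"]
    texts = [m["content"] for m in messages if isinstance(m.get("content"), str)]
    if any("json" in text.lower() for text in texts):
        return out  # some message already asks for JSON
    first = messages[0] if messages else None
    if first and first.get("role") == "system" and isinstance(first.get("content"), str):
        first["content"] += " " + JSON_HINT  # extend the existing system prompt
    else:
        messages.insert(0, {"role": "system", "content": JSON_HINT})
    return out


def count_images(payload: dict) -> int:
    """Count image_url parts across all messages."""
    total = 0
    for message in payload["messages"]:
        content = message.get("content")
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total


def check_image_limit(payload: dict) -> None:
    """Fail locally instead of waiting for the provider's 400 response."""
    found = count_images(payload)
    if found > MAX_IMAGES:
        raise ValueError(f"{found} images in request; limit is {MAX_IMAGES}")
```

Each helper returns a new dict or raises, so they compose: enable_json_mode(force_tool_use(payload)) leaves the caller's original payload untouched.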

Where Mistral AI shines

EU data residency  ·  code completion (Codestral)  ·  open-weights deployment  ·  fine-tuning

Frequently asked questions

What is Mistral and what models do they offer?

Mistral AI is a Paris-based frontier lab. Their 2026 closed models include Mistral Large 2 (flagship), Mistral Medium 3, Mistral Small 3 and the Codestral family for code. Open-weight releases include Mixtral 8x22B, Mistral 7B and Mathstral. Mistral models support function calling, JSON mode, multilingual generation (especially strong in French, German, Spanish and Italian) and a 128K context window on Large.

How much does Mistral cost in 2026?

Mistral Large 2 is roughly $2 per 1M input tokens and $6 per 1M output tokens via La Plateforme. Mistral Small 3 is around $0.20/$0.60. Codestral is approximately $0.30/$0.90. Open-weight models hosted on Together, Fireworks or DeepInfra cost significantly less per token. Via VerticalAPI BYOK you pay Mistral directly at list price with zero markup.

How do I use Mistral via VerticalAPI BYOK?

Create a key on console.mistral.ai (La Plateforme), paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. VerticalAPI translates OpenAI-style chat completions into calls to Mistral's chat endpoint and preserves function calling, JSON mode and streaming. Billing stays on your Mistral invoice. Open-weight Mistral models can also be routed via Together AI, Fireworks or DeepInfra keys.

What is Mistral best for compared to alternatives?

Mistral wins on EU data sovereignty, multilingual quality (especially French and other European languages), permissive open-weight licensing, and predictable, transparent pricing. Compared to GPT-4o it is cheaper and EU-hosted but less capable on agentic tool use. Compared to Llama 3.3 70B it offers similar quality with a French regulatory anchor. For frontier reasoning, Claude or GPT-5 outperform it.

Where is Mistral hosted, and how is data privacy handled?

Mistral runs primarily on European infrastructure (France, Sweden) with optional AWS and Azure regions. La Plateforme is GDPR-compliant by default, with zero data retention available on enterprise tiers. Open weights can be self-hosted anywhere. Via VerticalAPI BYOK your traffic is proxied through VerticalAPI's EU edge and your Mistral data terms remain intact.

Limitations and trade-offs

  • Mistral Large 2 trails GPT-5 and Claude Opus 4.5 on frontier reasoning and coding benchmarks.
  • Mistral Large's 128K context window is smaller than Claude's (200K) and Gemini's (2M).
  • Limited multimodal support: vision is recent, and there is no native audio or video understanding.
  • Smaller developer ecosystem and fewer third-party tools than OpenAI or Anthropic.
  • Some open-weight licenses (e.g. Mistral Large) are research-only — check before commercial use.

Where Mistral is heading

  1. Mistral Large 3 expected in 2026 with broader multimodal and longer context.
  2. Expanded Le Chat product line targeting enterprise productivity.
  3. Deeper EU sovereign cloud partnerships (OVH, Scaleway) and on-prem deployments.
  4. More open-weight releases under the permissive Apache 2.0 license to grow developer adoption.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • Is Mistral Large 2 a viable GPT-4o replacement for EU customers?
  • Codestral vs DeepSeek Coder — which is best for code completion?
  • How does Mistral compare to Llama 3.3 70B on multilingual tasks?
  • Can I self-host Mixtral 8x22B and route through VerticalAPI?
  • Mistral La Plateforme vs Mistral on AWS Bedrock — which to pick?