Cost per million tokens: every major LLM (2026)

Premium, mid, and budget LLM pricing for 2026. Input and output per 1M tokens across OpenAI, Anthropic, Google, Mistral, and the open-weight inference providers.

All major LLMs — list price per 1M tokens

ModelTierInput / 1MOutput / 1MContext
Claude Opus 4.5Premium$15$75200K
GPT-4 TurboPremium$10$30128K
Claude Sonnet 4.5Mid$3$15200K
GPT-4oMid$2.50$10128K
Gemini 2.5 ProMid$1.25$52M
Mistral Large 2Mid$2$6128K
Claude Haiku 4.5Budget$1$5200K
Llama 3.3 70B (Together)Budget$0.88$0.88128K
Mixtral 8x22BBudget$1.20$1.2064K
GPT-4o miniBudget$0.15$0.60128K
Gemini 2.5 FlashBudget$0.075$0.301M
Mistral SmallBudget$0.20$0.6032K

Prices reflect public list price mid-2026. Prompt caching, batch API, and committed-use discounts can cut these by 50-90%.

VerticalAPI verdict

Match the tier to the task. Premium for hard reasoning or long agent tasks where one good answer beats ten cheap ones. Mid for production user-facing apps and most coding agents. Budget for high-volume extraction, classification, and RAG retrieval. Stack prompt caching (Anthropic, OpenAI) and Batch API (50% off) for the biggest wins. Through VerticalAPI BYOK you can A/B test the same prompt across every tier in one line at list price.

Get started — BYOK every tier →

Frequently asked questions

What is the cheapest LLM per 1M tokens in 2026?

Gemini 2.5 Flash is the cheapest closed-weight LLM in 2026 at $0.075 per 1M input tokens and $0.30 per 1M output tokens. On the open-weight side, Llama 3.2 3B and Mistral 7B hosted on DeepInfra or Together can drop below $0.10 per 1M output. For raw extraction or classification at scale, these budget tiers cost roughly 100-200x less than premium flagships like Claude Opus or GPT-4 Turbo per token processed. The trade-off is reasoning quality on multi-step tasks.

How are LLM prices structured in 2026?

2026 LLM pricing falls into three tiers. Premium tier ($10+ per 1M output): Claude Opus 4.5, GPT-4 Turbo, advanced reasoning models. Mid tier ($3-10 per 1M output): GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, Mistral Large 2 — the workhorses for production agents. Budget tier ($0.30-3 per 1M output): GPT-4o mini, Claude Haiku 4.5, Gemini Flash, Mistral Small, hosted Llama 3.3 70B and Mixtral 8x22B. Input tokens are typically 4-5x cheaper than output tokens across all tiers.

How much do prompt caching and batch APIs reduce cost?

Anthropic prompt caching reduces cost on cached prompt portions by up to 90% on Claude. OpenAI prompt caching applies automatically and gives 50% off cached tokens. Google Gemini context caching charges separately for cache storage and reads. OpenAI Batch API and Anthropic Message Batches both give 50% off list price for requests that tolerate up to 24 hours of latency. Stacked, caching plus batch can cut effective per-token cost by 70-90% on the right workload (RAG with shared system prompts, batched classification).

Are output tokens really 4x more expensive than input?

Yes. Across all major providers in 2026, output tokens are priced 4-5x higher than input. GPT-4o is $2.50/$10 (4x), Claude Sonnet 4.5 is $3/$15 (5x), Gemini 2.5 Pro is $1.25/$5 (4x), Claude Opus 4.5 is $15/$75 (5x). This reflects the higher compute cost of autoregressive generation versus parallel input processing. For cost optimization, push as much work as possible into prompt engineering (input) and constrain output length aggressively with max_tokens.

Can I compare LLM costs across providers with one API?

Yes. VerticalAPI exposes every major LLM through a single OpenAI-compatible endpoint at https://api.verticalapi.com/v1. You can A/B test the same prompt across GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, Mistral Large 2, Llama 3.3 70B, and Mixtral 8x22B by changing only the model parameter. Because VerticalAPI is BYOK, you pay each provider directly at list price with zero markup, so the cost comparison reflects what you would pay going direct.

Limitations of this comparison

  • List prices are revised 2-4 times per year; the relative ranking between providers shifts more often than the absolute numbers.
  • Open-weight prices on inference providers (Together, Fireworks, DeepInfra, Groq) vary by provider — same Llama 3.3 70B can be 30% cheaper on one than another.
  • Reasoning models (o-series, Claude reasoning, Gemini Deep Think) charge for hidden reasoning tokens that aren't visible in output — actual cost can be 5-20x the apparent per-token rate.
  • Vision, audio, and function-call tokens are sometimes counted differently — read each provider's pricing page for edge cases.
  • Volume commitments, enterprise contracts, and Microsoft/AWS reseller deals can cut list prices by 20-50%.

What may change in 12-24 months

  1. Mid-tier pricing is expected to drop another 30-50% as competition intensifies and inference hardware (Blackwell, MI300X) cheapens.
  2. Budget-tier output may fall below $0.10/1M for the cheapest models — already happening on open-weight inference.
  3. Premium reasoning models will keep getting more expensive per visible token but may produce dramatically less output for the same task.
  4. Per-token pricing may give way to per-task pricing for agentic workloads, especially in Anthropic's roadmap.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • What is the cheapest LLM for RAG at scale in 2026?
  • How much does Anthropic prompt caching actually save?
  • Is OpenAI Batch API worth it for production workloads?
  • When does self-hosting Llama 3.3 beat paying per token?
  • How do reasoning model costs compare to standard LLMs?