Cost per Million Tokens: All LLMs Compared (2026)

2026 pricing matrix

All major LLMs — list price per 1M tokens

Model	Tier	Input / 1M	Output / 1M	Context
Claude Opus 4.5	Premium	$15	$75	200K
GPT-4 Turbo	Premium	$10	$30	128K
Claude Sonnet 4.5	Mid	$3	$15	200K
GPT-4o	Mid	$2.50	$10	128K
Gemini 2.5 Pro	Mid	$1.25	$5	2M
Mistral Large 2	Mid	$2	$6	128K
Claude Haiku 4.5	Budget	$1	$5	200K
Llama 3.3 70B (Together)	Budget	$0.88	$0.88	128K
Mixtral 8x22B	Budget	$1.20	$1.20	64K
GPT-4o mini	Budget	$0.15	$0.60	128K
Gemini 2.5 Flash	Budget	$0.075	$0.30	1M
Mistral Small	Budget	$0.20	$0.60	32K

Prices reflect public list price mid-2026. Prompt caching, batch API, and committed-use discounts can cut these by 50-90%.

VerticalAPI verdict

Match the tier to the task. Premium for hard reasoning or long agent tasks where one good answer beats ten cheap ones. Mid for production user-facing apps and most coding agents. Budget for high-volume extraction, classification, and RAG retrieval. Stack prompt caching (Anthropic, OpenAI) and Batch API (50% off) for the biggest wins. Through VerticalAPI BYOK you can A/B test the same prompt across every tier in one line at list price.

Get started — BYOK every tier →

FAQ

Frequently asked questions

What is the cheapest LLM per 1M tokens in 2026?

Gemini 2.5 Flash is the cheapest closed-weight LLM in 2026 at $0.075 per 1M input tokens and $0.30 per 1M output tokens. On the open-weight side, Llama 3.2 3B and Mistral 7B hosted on DeepInfra or Together can drop below $0.10 per 1M output. For raw extraction or classification at scale, these budget tiers cost roughly 100-200x less than premium flagships like Claude Opus or GPT-4 Turbo per token processed. The trade-off is reasoning quality on multi-step tasks.

How are LLM prices structured in 2026?

2026 LLM pricing falls into three tiers. Premium tier ($10+ per 1M output): Claude Opus 4.5, GPT-4 Turbo, advanced reasoning models. Mid tier ($3-10 per 1M output): GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, Mistral Large 2 — the workhorses for production agents. Budget tier ($0.30-3 per 1M output): GPT-4o mini, Claude Haiku 4.5, Gemini Flash, Mistral Small, hosted Llama 3.3 70B and Mixtral 8x22B. Input tokens are typically 4-5x cheaper than output tokens across all tiers.

How much do prompt caching and batch APIs reduce cost?

Anthropic prompt caching reduces cost on cached prompt portions by up to 90% on Claude. OpenAI prompt caching applies automatically and gives 50% off cached tokens. Google Gemini context caching charges separately for cache storage and reads. OpenAI Batch API and Anthropic Message Batches both give 50% off list price for requests that tolerate up to 24 hours of latency. Stacked, caching plus batch can cut effective per-token cost by 70-90% on the right workload (RAG with shared system prompts, batched classification).

Are output tokens really 4x more expensive than input?

Yes. Across all major providers in 2026, output tokens are priced 4-5x higher than input. GPT-4o is $2.50/$10 (4x), Claude Sonnet 4.5 is $3/$15 (5x), Gemini 2.5 Pro is $1.25/$5 (4x), Claude Opus 4.5 is $15/$75 (5x). This reflects the higher compute cost of autoregressive generation versus parallel input processing. For cost optimization, push as much work as possible into prompt engineering (input) and constrain output length aggressively with max_tokens.

Can I compare LLM costs across providers with one API?

Yes. VerticalAPI exposes every major LLM through a single OpenAI-compatible endpoint at https://api.verticalapi.com/v1. You can A/B test the same prompt across GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, Mistral Large 2, Llama 3.3 70B, and Mixtral 8x22B by changing only the model parameter. Because VerticalAPI is BYOK, you pay each provider directly at list price with zero markup, so the cost comparison reflects what you would pay going direct.

Caveats

Limitations of this comparison

List prices are revised 2-4 times per year; the relative ranking between providers shifts more often than the absolute numbers.
Open-weight prices on inference providers (Together, Fireworks, DeepInfra, Groq) vary by provider — same Llama 3.3 70B can be 30% cheaper on one than another.
Reasoning models (o-series, Claude reasoning, Gemini Deep Think) charge for hidden reasoning tokens that aren't visible in output — actual cost can be 5-20x the apparent per-token rate.
Vision, audio, and function-call tokens are sometimes counted differently — read each provider's pricing page for edge cases.
Volume commitments, enterprise contracts, and Microsoft/AWS reseller deals can cut list prices by 20-50%.

Outlook

What may change in 12-24 months

Mid-tier pricing is expected to drop another 30-50% as competition intensifies and inference hardware (Blackwell, MI300X) cheapens.
Budget-tier output may fall below $0.10/1M for the cheapest models — already happening on open-weight inference.
Premium reasoning models will keep getting more expensive per visible token but may produce dramatically less output for the same task.
Per-token pricing may give way to per-task pricing for agentic workloads, especially in Anthropic's roadmap.

Keep reading

More LLM comparisons

Cheapest LLM for high volume

Which model wins on $/1M tokens at scale

Read comparison →

Budget-tier 3-way

Haiku vs GPT mini vs Gemini Flash

Read comparison →

Prompt caching showdown

Anthropic vs OpenAI caching mechanisms

Read comparison →

Context window comparison

128K vs 200K vs 1M vs 2M tokens

Read comparison →

OpenAI vs Anthropic

GPT-4o vs Claude Sonnet 4.5 head-to-head

Read comparison →

Cost per million tokens: every major LLM (2026)

All major LLMs — list price per 1M tokens

VerticalAPI verdict

Frequently asked questions

Limitations of this comparison

What may change in 12-24 months

Related questions

More LLM comparisons