Anthropic prompt caching vs OpenAI caching: 2026 comparison

Side-by-side

Anthropic prompt caching vs OpenAI caching — at a glance

Dimension	Anthropic Prompt Caching	OpenAI Prompt Caching
Cached read discount	Up to 90% (0.1x list price)	50% (0.5x list price)
Cache write cost	1.25x list (5-min) or 2x (1-hour)	1x list price (no surcharge)
TTL options	5 minutes or 1 hour	5 minutes (default)
Activation	Explicit cache_control markers	Automatic (prefix-based)
Minimum cache size	1024 tokens (Sonnet/Opus), 2048 (Haiku)	1024 tokens
Best for	Long system prompts, agent loops, RAG with stable context	Any workload — zero code change to activate

When to choose which

Pick Anthropic Prompt Caching or OpenAI Prompt Caching?

When to choose Anthropic Prompt Caching

Choose Anthropic Prompt Caching when your workload reuses a large stable prefix (system prompt, tool definitions, RAG context) across many requests, and the 90% read discount changes the economics. The 1-hour TTL is especially valuable for slow agent loops or background batch jobs where the 5-minute window is too short. You need to mark cache_control breakpoints in the prompt, but the savings on agent workloads regularly hit 70-85% of input cost.

Up to 90% discount on cached input reads (0.1x list price)
Two TTL tiers: 5-minute (cheaper write) or 1-hour
Explicit cache_control markers give precise control
Best ROI on long-system-prompt agent loops and RAG
Often makes Claude Sonnet 4.5 cheaper than GPT-4o despite higher list price

When to choose OpenAI Prompt Caching

Choose OpenAI Prompt Caching when you want caching benefits with zero code change. OpenAI's caching is automatic — any prompt prefix over 1024 tokens reused within 5 minutes gets a 50% discount on the cached portion. There is no write surcharge, no cache_control markers, and no client-side bookkeeping. The discount is smaller than Anthropic's, but the integration cost is also zero.

50% discount on cached input reads (0.5x list price)
Automatic — no code changes or cache_control markers
No write surcharge (cache writes cost the same as uncached)
5-minute TTL with hit reporting in response metadata
Best for teams that want caching without refactoring prompts

Why not both?

Route Anthropic Prompt Caching and OpenAI Prompt Caching through one endpoint

VerticalAPI exposes both providers through a single OpenAI-compatible endpoint. Same SDK, BYOK, zero markup on tokens — you pay each provider directly with your own keys.

from openai import OpenAI
client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

# Anthropic Prompt Caching via VerticalAPI BYOK
resp_a = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "sk-ant-..."},
)

# OpenAI Prompt Caching same SDK, different model + key
resp_b = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "sk-..."},
)

Try VerticalAPI free →

VerticalAPI verdict

Anthropic offers a steeper discount (up to 90%) and a 1-hour TTL but requires explicit cache_control markers and pays a write surcharge. OpenAI offers automatic caching with no surcharge but caps the read discount at 50%. For long-running agent loops with stable system prompts, Anthropic wins clearly on total cost. For quick wins with no code changes, OpenAI's caching is more accessible. Both work transparently when you BYOK through VerticalAPI on the same OpenAI-compatible endpoint.

Get started — BYOK both providers →

FAQ

Frequently asked questions

Which provider gives a bigger caching discount?

Anthropic by a wide margin. Anthropic Prompt Caching reads cached input tokens at 0.1x list price (up to 90% off), while OpenAI Prompt Caching reads cached tokens at 0.5x list price (50% off). For workloads reusing thousands of tokens of stable system prompt or RAG context, Anthropic's caching can move effective costs 40-60% below OpenAI's even though Claude Sonnet 4.5's list price is higher than GPT-4o's.

Does Anthropic charge extra to write to the cache?

Yes. Anthropic charges 1.25x list price for 5-minute cache writes and 2x for 1-hour writes. OpenAI does not charge a write surcharge — cache writes cost the same as uncached input. This means Anthropic's net savings only kick in when you read the cached prompt at least roughly 2-3 times before it expires; below that threshold, OpenAI's free caching may be cheaper despite the smaller discount.

Is OpenAI's caching automatic?

Yes. OpenAI Prompt Caching activates automatically for any prompt prefix of 1024 or more tokens reused within 5 minutes — no API parameters or markers required. The response includes a 'cached_tokens' field showing the hit count. Anthropic requires you to explicitly mark cache_control breakpoints in the messages array; without those markers, no caching happens.

What is the longest cache TTL?

Anthropic offers a 1-hour TTL option (at 2x write cost) — useful for slow agent loops or batch jobs. OpenAI's TTL is fixed at roughly 5 minutes. For workloads where requests are spaced more than 5 minutes apart (e.g., long-running multi-turn agents, scheduled summarization), Anthropic's 1-hour cache can be the difference between caching paying off and not.

Do both work through VerticalAPI?

Yes. VerticalAPI passes prompt-caching headers and metadata through transparently for both Anthropic and OpenAI providers on its BYOK endpoint at https://api.verticalapi.com/v1. You configure cache_control markers (for Anthropic) or rely on automatic caching (for OpenAI) exactly as you would on each native API. Zero markup on tokens, including on cached reads.

Caveats

Limitations of this comparison

Anthropic's 90% discount only applies once you've recouped the 25%-100% write surcharge — break-even is roughly 2-3 reads per write.
OpenAI caching is opaque — no client-side control over what gets cached or for how long.
Cache hit rates depend heavily on prompt structure; small prefix variations (timestamps, user IDs) can invalidate the cache.
Both APIs have minimum cache sizes (~1024 tokens) — short prompts get no benefit.
Discount rates and TTLs may change; figures here reflect mid-2026 published pricing.

Outlook

What may change in 12-24 months

OpenAI is expected to expand TTL options (15-min, 1-hour) and possibly increase the read discount to compete with Anthropic.
Anthropic may add an automatic-caching mode for prompts that meet stability thresholds.
Both providers are likely to add cache-key namespaces for multi-tenant agent platforms.
Long-context caching (200K+ tokens) will become economically critical as agent workloads grow.

Keep reading