Anthropic prompt caching vs OpenAI prompt caching: the two caching APIs that move agent unit economics (2026)

Both OpenAI and Anthropic ship prompt caching APIs that discount repeated input tokens. The mechanisms, TTLs, and discount rates differ enough to change which provider is cheapest on real agent workloads. Here is the head-to-head.

Anthropic prompt caching vs OpenAI caching — at a glance

DimensionAnthropic Prompt CachingOpenAI Prompt Caching
Cached read discountUp to 90% (0.1x list price)50% (0.5x list price)
Cache write cost1.25x list (5-min) or 2x (1-hour)1x list price (no surcharge)
TTL options5 minutes or 1 hour5 minutes (default)
ActivationExplicit cache_control markersAutomatic (prefix-based)
Minimum cache size1024 tokens (Sonnet/Opus), 2048 (Haiku)1024 tokens
Best forLong system prompts, agent loops, RAG with stable contextAny workload — zero code change to activate

Pick Anthropic Prompt Caching or OpenAI Prompt Caching?

When to choose Anthropic Prompt Caching

Choose Anthropic Prompt Caching when your workload reuses a large stable prefix (system prompt, tool definitions, RAG context) across many requests, and the 90% read discount changes the economics. The 1-hour TTL is especially valuable for slow agent loops or background batch jobs where the 5-minute window is too short. You need to mark cache_control breakpoints in the prompt, but the savings on agent workloads regularly hit 70-85% of input cost.

  • Up to 90% discount on cached input reads (0.1x list price)
  • Two TTL tiers: 5-minute (cheaper write) or 1-hour
  • Explicit cache_control markers give precise control
  • Best ROI on long-system-prompt agent loops and RAG
  • Often makes Claude Sonnet 4.5 cheaper than GPT-4o despite higher list price

When to choose OpenAI Prompt Caching

Choose OpenAI Prompt Caching when you want caching benefits with zero code change. OpenAI's caching is automatic — any prompt prefix over 1024 tokens reused within 5 minutes gets a 50% discount on the cached portion. There is no write surcharge, no cache_control markers, and no client-side bookkeeping. The discount is smaller than Anthropic's, but the integration cost is also zero.

  • 50% discount on cached input reads (0.5x list price)
  • Automatic — no code changes or cache_control markers
  • No write surcharge (cache writes cost the same as uncached)
  • 5-minute TTL with hit reporting in response metadata
  • Best for teams that want caching without refactoring prompts

Route Anthropic Prompt Caching and OpenAI Prompt Caching through one endpoint

VerticalAPI exposes both providers through a single OpenAI-compatible endpoint. Same SDK, BYOK, zero markup on tokens — you pay each provider directly with your own keys.

from openai import OpenAI
client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

# Anthropic Prompt Caching via VerticalAPI BYOK
resp_a = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "sk-ant-..."},
)

# OpenAI Prompt Caching same SDK, different model + key
resp_b = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "sk-..."},
)

Try VerticalAPI free →

VerticalAPI verdict

Anthropic offers a steeper discount (up to 90%) and a 1-hour TTL but requires explicit cache_control markers and pays a write surcharge. OpenAI offers automatic caching with no surcharge but caps the read discount at 50%. For long-running agent loops with stable system prompts, Anthropic wins clearly on total cost. For quick wins with no code changes, OpenAI's caching is more accessible. Both work transparently when you BYOK through VerticalAPI on the same OpenAI-compatible endpoint.

Get started — BYOK both providers →

Frequently asked questions

Which provider gives a bigger caching discount?

Anthropic by a wide margin. Anthropic Prompt Caching reads cached input tokens at 0.1x list price (up to 90% off), while OpenAI Prompt Caching reads cached tokens at 0.5x list price (50% off). For workloads reusing thousands of tokens of stable system prompt or RAG context, Anthropic's caching can move effective costs 40-60% below OpenAI's even though Claude Sonnet 4.5's list price is higher than GPT-4o's.

Does Anthropic charge extra to write to the cache?

Yes. Anthropic charges 1.25x list price for 5-minute cache writes and 2x for 1-hour writes. OpenAI does not charge a write surcharge — cache writes cost the same as uncached input. This means Anthropic's net savings only kick in when you read the cached prompt at least roughly 2-3 times before it expires; below that threshold, OpenAI's free caching may be cheaper despite the smaller discount.

Is OpenAI's caching automatic?

Yes. OpenAI Prompt Caching activates automatically for any prompt prefix of 1024 or more tokens reused within 5 minutes — no API parameters or markers required. The response includes a 'cached_tokens' field showing the hit count. Anthropic requires you to explicitly mark cache_control breakpoints in the messages array; without those markers, no caching happens.

What is the longest cache TTL?

Anthropic offers a 1-hour TTL option (at 2x write cost) — useful for slow agent loops or batch jobs. OpenAI's TTL is fixed at roughly 5 minutes. For workloads where requests are spaced more than 5 minutes apart (e.g., long-running multi-turn agents, scheduled summarization), Anthropic's 1-hour cache can be the difference between caching paying off and not.

Do both work through VerticalAPI?

Yes. VerticalAPI passes prompt-caching headers and metadata through transparently for both Anthropic and OpenAI providers on its BYOK endpoint at https://api.verticalapi.com/v1. You configure cache_control markers (for Anthropic) or rely on automatic caching (for OpenAI) exactly as you would on each native API. Zero markup on tokens, including on cached reads.

Limitations of this comparison

  • Anthropic's 90% discount only applies once you've recouped the 25%-100% write surcharge — break-even is roughly 2-3 reads per write.
  • OpenAI caching is opaque — no client-side control over what gets cached or for how long.
  • Cache hit rates depend heavily on prompt structure; small prefix variations (timestamps, user IDs) can invalidate the cache.
  • Both APIs have minimum cache sizes (~1024 tokens) — short prompts get no benefit.
  • Discount rates and TTLs may change; figures here reflect mid-2026 published pricing.

What may change in 12-24 months

  1. OpenAI is expected to expand TTL options (15-min, 1-hour) and possibly increase the read discount to compete with Anthropic.
  2. Anthropic may add an automatic-caching mode for prompts that meet stability thresholds.
  3. Both providers are likely to add cache-key namespaces for multi-tenant agent platforms.
  4. Long-context caching (200K+ tokens) will become economically critical as agent workloads grow.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • How does prompt caching change the total cost of an agent loop?
  • Can I cache the same prompt across Anthropic and OpenAI simultaneously?
  • What's the cache hit rate I should expect on a typical RAG workload?
  • Does Gemini support prompt caching equivalent to Anthropic or OpenAI?
  • How do I measure cache hit rate through VerticalAPI?