Anthropic via VerticalAPI

Route to Claude Sonnet, Opus and Haiku through an OpenAI-compatible endpoint. BYOK with your Anthropic key, zero markup on tokens, full prompt caching support.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: sk-ant-...

Anthropic models routed by VerticalAPI

Pass the model ID below as model in any OpenAI-compatible request. New Anthropic models are typically supported within 24h of release.

| Model ID | Name | Context | Pricing (provider, input / output) |
| --- | --- | --- | --- |
| claude-sonnet-4-5 | Claude Sonnet 4.5 | 200K | $3 / $15 per 1M tok |
| claude-opus-4-6 | Claude Opus 4.6 | 200K | $15 / $75 per 1M tok (flagship) |
| claude-haiku-4-5 | Claude Haiku 4.5 | 200K | $1 / $5 per 1M tok (fastest) |
| claude-sonnet-4-5-1m | Claude Sonnet 4.5 (1M ctx) | 1M | Long-context tier, premium pricing |

Pricing reflects Anthropic's rates — you pay Anthropic directly. VerticalAPI adds zero markup on tokens.

5-line Anthropic call via VerticalAPI

Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.

anthropic_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "sk-ant-..."}
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5",  # Anthropic
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Four reasons developers route Anthropic through us

Zero token markup

You pay Anthropic directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Anthropic alongside OpenAI, Gemini and 12 more: same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Anthropic to other providers on identical prompts.

Observability built in

Every Anthropic call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Anthropic measured: latency, throughput, error rate

Anthropic's Claude Sonnet 4.5 is the slowest of the major flagship tiers on time-to-first-token but compensates with the best coding and tool-use quality scores in the 2026 benchmark. Numbers below are from 1000 calls/model via the VerticalAPI gateway.

| Metric | Value | Notes |
| --- | --- | --- |
| p50 TTFT (Sonnet 4.5) | ~1.2 s | Anthropic streams more conservatively; expect higher TTFT than OpenAI |
| p95 TTFT (Sonnet 4.5) | ~2.8 s | Tail latency is high; bake this into UX timeouts |
| Tokens per second (Sonnet) | ~80 tok/s | Steady-state streaming throughput |
| p50 TTFT (Haiku 4.5) | ~620 ms | Haiku is closer to GPT-4o speeds, with quality near gpt-4o-mini |
| Prompt caching hit rate | ~85% typical | Cache-aware workloads see 50-80% cost reduction in practice |

Numbers above are 2026 placeholders pending the next VerticalAPI benchmark harness run. See /benchmark for the full 26-provider comparison.
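
If you want to check p50/p95 figures like these against your own traffic, a minimal sketch follows. It assumes you have already collected time-to-first-token samples in milliseconds; the helper name latency_percentiles, the sample values, and the 2x timeout headroom are all illustrative assumptions, not VerticalAPI's benchmark method.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95) for a list of time-to-first-token samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99,
    # so index 49 is p50 and index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]

# Hypothetical samples for illustration only, not measurements.
samples = [900, 1000, 1100, 1200, 1300, 1400, 1500, 2000, 2600, 3000]
p50, p95 = latency_percentiles(samples)
print(f"p50={p50:.0f} ms  p95={p95:.0f} ms")  # p50=1350 ms  p95=2820 ms

# One possible way to derive a UX timeout: p95 plus headroom (2x is an assumption).
timeout_s = p95 * 2 / 1000
```

Sizing timeouts from p95 rather than p50 reflects the note in the table that tail latency is high.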

OpenAI SDK methods that work with Anthropic

Anthropic's native API differs from OpenAI's in its wire format (separate system field, content blocks, no logprobs). VerticalAPI normalizes these differences so the OpenAI SDK works, with a few edge cases to be aware of.

  • client.chat.completions.create() — works fully; stream=True, tools, tool_choice all supported.
  • logprobs / top_logprobs — not supported by Anthropic; the field is silently dropped (you receive a normal response without logprobs).
  • n>1 (multiple completions) — Anthropic does not natively support n>1; VerticalAPI emulates by issuing parallel calls (cost multiplies accordingly).
  • response_format={"type": "json_object"} — Anthropic doesn't have a strict JSON mode equivalent; VerticalAPI prompts for JSON and validates, but enforcement is softer than OpenAI's structured outputs.
  • Prompt caching — pass cache_control: {"type": "ephemeral"} on a message to mark a cache breakpoint. Cache hits are billed at 10% of normal input cost.
  • Vision — image_url message parts work; data: URIs and HTTPS URLs both accepted.
  • client.embeddings.create() — Anthropic doesn't ship embeddings; VerticalAPI returns a clear error directing you to use a provider that does.
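
Because JSON enforcement is described above as softer than OpenAI's structured outputs, callers may want a client-side guard before trusting a reply. The sketch below is one possible approach; parse_json_reply is a hypothetical helper, and the assumption that a reply might arrive wrapped in a Markdown code fence is ours, not documented gateway behavior.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_json_reply(text):
    """Return the reply as a dict, or None if it is not a JSON object."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE) and cleaned.endswith(FENCE):
        # Drop the opening fence line (it may carry a language tag) and the closing fence.
        first_newline = cleaned.find("\n")
        if first_newline == -1:
            return None
        cleaned = cleaned[first_newline + 1 : -len(FENCE)].strip()
    try:
        value = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    # json_object mode implies a top-level object, so reject arrays and scalars.
    return value if isinstance(value, dict) else None

print(parse_json_reply('{"status": "ok"}'))   # {'status': 'ok'}
print(parse_json_reply("Sure, here you go"))  # None
```

Returning None instead of raising lets the caller decide whether to retry, re-prompt, or fall back.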

What Anthropic actually costs at 100k MAU

Concrete monthly cost for a chatbot with 100k MAU, 10 turns/user, ~500 input + 150 output tokens per turn (650M input, 195M output tokens/month). Anthropic is the most expensive flagship — choose carefully.

| Model | Monthly cost | When to use |
| --- | --- | --- |
| claude-haiku-4-5 | ~$1,625/mo | Best value Claude tier; quality close to gpt-4o-mini, latency similar |
| claude-sonnet-4-5 | ~$4,875/mo | Most agentic-coding workloads should default here; ~37% more expensive than gpt-4o for ~5 quality points more on coding |
| claude-sonnet-4-5 + 70% cache hit | ~$3,650/mo | Cache-aware design (system prompts, RAG context) cuts input spend by about 63%, or roughly a quarter of the total bill; output tokens are not discounted |
| claude-opus-4-6 | ~$24,375/mo | Reserve for hardest reasoning. 5x cost vs Sonnet for 3 quality points; only worth it for top-of-funnel queries |
| claude-sonnet-4-5-1m (long context) | varies | 1M-context tier is priced at a premium; see Anthropic pricing page |

Cost based on provider list price; VerticalAPI adds zero token markup.
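
The arithmetic behind monthly figures like these can be sketched in a few lines. The example takes the stated monthly totals (650M input, 195M output tokens) as given and reproduces the Sonnet and Opus list-price figures. The cache scenario assumes cache reads are billed at 10% of the input price and ignores any cache-write surcharge; monthly_cost and its parameters are illustrative names, not a VerticalAPI API.

```python
def monthly_cost(input_tokens, output_tokens, input_price, output_price,
                 cache_hit_rate=0.0, cache_read_discount=0.10):
    """Monthly cost in USD; prices are per 1M tokens."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    # Cached input is billed at a fraction of the normal input price.
    input_cost = (uncached * input_price
                  + cached * input_price * cache_read_discount) / 1_000_000
    # Output tokens are never discounted by caching.
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost

INPUT_TOKENS = 650_000_000
OUTPUT_TOKENS = 195_000_000

sonnet = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 3, 15)
opus = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 15, 75)
sonnet_cached = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 3, 15, cache_hit_rate=0.7)

print(f"{sonnet:,.2f}")         # 4,875.00
print(f"{opus:,.2f}")           # 24,375.00
print(f"{sonnet_cached:,.2f}")  # 3,646.50
```

With this 500-in/150-out token mix, output accounts for $2,925 of the Sonnet bill, so caching alone cannot push the total below that floor.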

Should you pick Anthropic for your workload?

Anthropic is the right pick when quality matters more than latency or cost. Specifically:

You're building a coding agent. Claude Sonnet 4.5 wins the 2026 benchmark on coding (92/100) and is the model of choice for tools like Cursor, Aider, and most internal coding assistants. Tool-use behavior is more predictable than GPT-4o on long, multi-step tool chains, especially when you need the model to decide not to call a tool. The only serious competitor for code is Mistral Codestral, which is cheaper but specialized for fill-in-the-middle (loses on agentic coding).

You handle long context regularly. Sonnet 4.5's 200K window — and the 1M-token tier for accounts with access — is a strong match for codebase analysis, multi-document RAG, and long-form transcript work. Combined with prompt caching, you can keep a 100K-token system context and pay 10% on cache hits, which makes long-context economically viable in ways OpenAI's gpt-4o doesn't.

You can absorb the latency. Anthropic's TTFT is 30-50% slower than OpenAI's at the flagship tier. For chat UX where users expect near-instant streaming, this is noticeable. For agentic backend workloads (where the user is waiting on a final result anyway), it doesn't matter. If sub-second TTFT is a hard requirement, use Haiku 4.5 (~620ms) or skip Anthropic entirely for Groq/Cerebras Llama.

Specific issues teams hit with Anthropic

Sharp edges that have cost real production teams real time. Fixes below are battle-tested via the VerticalAPI dashboard logs.

tool_use blocks are different from OpenAI tool_calls
Anthropic returns tools as content blocks, not a top-level tool_calls array. VerticalAPI normalizes this — but if you're inspecting raw response payloads, you may see Anthropic's native shape leak through. Use the parsed message.tool_calls field, not the raw content.
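
As a rough illustration of reading the parsed field rather than raw content blocks, the sketch below works on a plain dict shaped like an OpenAI-style assistant message. The sample payload and the helper name extract_tool_calls are hypothetical; with the OpenAI Python SDK you would typically read response.choices[0].message.tool_calls, or call model_dump() on the message to get a dict like this one.

```python
import json

def extract_tool_calls(message):
    """Return [(tool_name, arguments_dict)] from an OpenAI-style assistant message dict."""
    calls = []
    # tool_calls is absent or None when the model answered without calling a tool.
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # OpenAI-style responses carry arguments as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Hypothetical normalized message, for illustration only.
sample_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
    }],
}
print(extract_tool_calls(sample_message))  # [('get_weather', {'city': 'Paris'})]
```
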
stop_reason isn't always "stop"
Anthropic's stop_reason values are end_turn, max_tokens, stop_sequence and tool_use. VerticalAPI maps them to OpenAI's finish_reason, but only tool_use has a direct counterpart ("tool_calls"). Code that checks finish_reason == "stop" should also accept "end_turn", or rely on the normalized field.
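
A defensive way to handle both vocabularies is to normalize the value yourself before branching on it. The mapping below is an assumption about how Anthropic's stop reasons line up with OpenAI's finish reasons, not VerticalAPI's documented table, and the function names are hypothetical.

```python
# Assumed correspondence between Anthropic stop_reason and OpenAI finish_reason values.
ANTHROPIC_TO_OPENAI = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}

def normalize_finish_reason(reason):
    """Map an Anthropic-style value to its OpenAI-style equivalent; pass others through."""
    return ANTHROPIC_TO_OPENAI.get(reason, reason)

def is_complete(reason):
    """True when the model ended its turn on its own (not truncated, not a tool call)."""
    return normalize_finish_reason(reason) == "stop"

print(is_complete("end_turn"))    # True
print(is_complete("stop"))        # True
print(is_complete("max_tokens"))  # False
```
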
Cache breakpoints have a 4-block limit
You can only have 4 cache_control breakpoints per request. Trying to cache 5+ system messages silently drops the extras. Place breakpoints strategically — typically at large stable blocks (system prompt, RAG context, few-shot examples).
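
One way to stay inside the limit is to choose breakpoints explicitly before sending the request. The sketch below marks at most four system messages, largest first. Treating system messages as the stable blocks, ranking them by string length, and the helper name mark_cache_breakpoints are simplifying assumptions for illustration.

```python
MAX_BREAKPOINTS = 4  # per-request limit described above

def mark_cache_breakpoints(messages, max_breakpoints=MAX_BREAKPOINTS):
    """Return a copy of messages with cache_control set on the largest system blocks."""
    candidates = [i for i, m in enumerate(messages) if m.get("role") == "system"]
    largest_first = sorted(candidates, key=lambda i: len(messages[i]["content"]), reverse=True)
    chosen = set(largest_first[:max_breakpoints])
    marked = []
    for i, message in enumerate(messages):
        copy = dict(message)  # shallow copy so the caller's list is left untouched
        if i in chosen:
            copy["cache_control"] = {"type": "ephemeral"}
        marked.append(copy)
    return marked

# Hypothetical conversation with five system blocks, one more than the limit.
messages = [
    {"role": "system", "content": "policy " * 400},
    {"role": "system", "content": "rag context " * 300},
    {"role": "system", "content": "few-shot examples " * 200},
    {"role": "system", "content": "style guide " * 100},
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hello"},
]
marked = mark_cache_breakpoints(messages)
print(sum("cache_control" in m for m in marked))  # 4
```

Picking breakpoints yourself makes it explicit which block goes uncached, rather than leaving the choice to whichever extras get dropped.
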
Long thinking mode (extended_thinking) costs more than you'd expect
extended_thinking enables visible chain-of-thought, but the thought tokens count as output tokens. Budgeting for 2-5x the normal output cost is realistic on hard problems.
Rate limits are per-organization, not per-key
Adding more API keys won't lift your rate limit. Apply for tier increases in the Anthropic console; VerticalAPI's dashboard shows your current usage against the tier limit so you can plan ahead.

Where Anthropic shines

long-context analysis  ·  agentic coding  ·  tool use  ·  prompt caching

Frequently asked questions

What is Anthropic and what models do they offer?

Anthropic is the AI safety lab behind the Claude model family. The 2026 lineup is Claude Sonnet 4.5 (the workhorse model leading SWE-Bench Verified), Claude Opus 4.5 (the frontier reasoning model), and Claude Haiku 4.5 (a fast small model). All Claude 4.x models share a 200K-token context window, native tool use, vision input, prompt caching, and the Messages API with streaming.

How much does Anthropic cost in 2026?

Claude Sonnet 4.5 is $3 per 1M input tokens and $15 per 1M output. Opus 4.5 is $15/$75. Haiku 4.5 is $1/$5. Prompt caching cuts repeated input by up to 90% (cached reads ~$0.30/1M on Sonnet), making long-context RAG and coding agents dramatically cheaper. Batch API discounts are 50%. Via VerticalAPI BYOK you pay Anthropic directly at list price with zero token markup.

How do I use Anthropic via VerticalAPI BYOK?

Get an Anthropic API key (sk-ant-…), paste it into the VerticalAPI dashboard, then point either the Anthropic SDK or the OpenAI SDK at https://api.verticalapi.com/v1. VerticalAPI translates OpenAI-style chat completions into Anthropic Messages, preserves tool use, streaming and prompt caching, and surfaces unified usage logs. Billing stays on your Anthropic invoice.

What is Anthropic best for compared to alternatives?

Claude leads on long-context comprehension, coding tasks (top SWE-Bench Verified score in 2026), agentic tool use, and instruction following on nuanced briefs. Compared to GPT-4o it has a larger context (200K vs 128K) and better prompt caching economics; compared to Gemini 2.5 Pro it has a smaller context (200K vs 2M) but stronger code generation. For low-cost open-weight inference, Llama 3.3 70B on Groq is cheaper but less capable.

Where is Anthropic hosted / data privacy?

Anthropic runs on AWS and Google Cloud, with multi-region deployment in the US and EU. API inputs and outputs are not used to train models. Enterprise tiers offer zero data retention and HIPAA. Via VerticalAPI BYOK, traffic is proxied through VerticalAPI's edge while your Anthropic key and contractual data terms remain directly with Anthropic.

Limitations and trade-offs

  • 200K context window is smaller than Gemini 2.5 Pro's 2M for whole-codebase or whole-book inputs.
  • No native image, audio or video generation — Claude is text + vision-input only.
  • Output tokens at $15 (Sonnet) and $75 (Opus) per 1M are expensive vs open-weight inference on Groq or Cerebras.
  • Rate limits on new accounts can require multiple tier upgrades for production agent workloads.
  • No fine-tuning or open weights: you cannot self-host Claude, and even on AWS Bedrock you rent the model.

Where Anthropic is heading

  1. Claude 5 generation expected late 2026 with extended thinking and longer effective context.
  2. Wider rollout of computer-use and browser-use APIs for desktop agents.
  3. Cheaper Haiku-class distillations targeting on-device and edge deployment.
  4. Broader sovereignty options via AWS Bedrock regional endpoints in EU and Asia.
