LLM Rate Limits Compared: OpenAI, Anthropic, Google (2026)

Q: Can a BYOK gateway help me avoid 429 errors?

Yes. VerticalAPI's OpenAI-compatible endpoint at https://api.verticalapi.com/v1 lets you configure fallback chains — for example, route to GPT-4o first, fall back to Claude Sonnet 4.5 on 429 or 5xx. Because it's BYOK, each fallback uses your own keys with each provider, so you stack their rate-limit budgets rather than competing for a shared pool. This is the cheapest way to add resilience without renegotiating enterprise contracts.

2026 rate-limit matrix

OpenAI, Anthropic, Google — typical limits

Tier	OpenAI (GPT-4o)	Anthropic (Claude Sonnet 4.5)	Google (Gemini 2.5 Pro)
Free / Starter	30K TPM, 3 RPM	50K TPM, 50 RPM	Free AI Studio: 32K daily
Tier 1 ($5-50)	30K TPM, 500 RPM	80K TPM, 1K RPM	Vertex Pay-as-you-go
Tier 3 ($100+)	800K TPM, 5K RPM	400K TPM, 4K RPM	Up to 2M TPM
Tier 5 ($1K+)	30M TPM, 30K RPM	2M+ TPM (custom)	10M+ TPM (custom)
Enterprise ($5K+/month)	Custom (negotiated)	Custom	Vertex committed capacity
Batch API quota	Separate (10x sync)	Separate batch queue	Vertex Batch separate

TPM = tokens per minute, RPM = requests per minute. Limits change frequently; check provider dashboards.

VerticalAPI verdict

Rate-limit upgrades scale with spend, not negotiation. The fastest way to add headroom isn't asking sales — it's stacking provider quotas via BYOK fallback. Route GPT-4o first, fall back to Claude on 429, and to Gemini after that. Each provider sees its own per-account quota, so you effectively triple your real-time budget without contract changes. Plus, batch APIs run on separate queues and give 50% off — push background work there.

Get started — multi-provider fallback →

FAQ

Frequently asked questions

What are typical LLM rate limits in 2026?

Rate limits in 2026 are tiered by provider and account spend. OpenAI starts free-tier accounts at roughly 30,000 tokens-per-minute (TPM) and scales through Tiers 1-5 up to 30M+ TPM for enterprise. Anthropic starts new accounts around 50K TPM and 50 RPM, scaling with monthly spend. Google Gemini gives free Studio access with low daily quotas (around 32K daily on free), scaling to high TPM on paid Vertex AI. Enterprise contracts above $5K/month typically unlock 1M+ TPM across all providers.

How does OpenAI's TPM tier system work?

OpenAI rate limits are organized into Usage Tiers (Free, Tier 1-5). Free starts at 30K TPM and 3 RPM on GPT-4o. Tier 1 ($5+ spent, 7+ days) reaches 30K TPM and 500 RPM. Tier 3 ($100+ spent, 7+ days) reaches 800K TPM and 5,000 RPM. Tier 5 ($1,000+ spent, 30+ days) gives 30M TPM and 30,000 RPM on GPT-4o. Limits are per model and per organization. Token-per-minute counts both input and output. Hitting limits returns HTTP 429; OpenAI publishes the current limits in the dashboard.

Why do production teams hit rate limits even on paid tiers?

Three common causes: bursty traffic (e.g. nightly batch jobs that exceed minute-level TPM), long-context requests (a single 1M-token Gemini call uses an enormous TPM slice), and concurrent agent runs (parallel tool calls multiply RPM). Solutions include staggered scheduling, batch APIs (which use a separate quota), exponential backoff with jitter, and multi-provider fallback. BYOK gateways like VerticalAPI let you fall over from one provider to another on 429 within the same request shape, with no SDK changes.

Do batch APIs have separate rate limits?

Yes. OpenAI Batch API uses a separate batch queue with much higher daily limits (typically 10x the synchronous TPM) at the cost of up to 24-hour latency. Anthropic Message Batches similarly run on a dedicated quota. Google Vertex Batch is also separate from interactive limits. For background workloads (RAG indexing, classification, evaluation runs), routing to batch APIs both saves 50% on tokens and avoids competing with real-time traffic for synchronous quota.

Can a BYOK gateway help me avoid 429 errors?

Yes. VerticalAPI's OpenAI-compatible endpoint at https://api.verticalapi.com/v1 lets you configure fallback chains — for example, route to GPT-4o first, fall back to Claude Sonnet 4.5 on 429 or 5xx, then Gemini. Because it's BYOK, each fallback uses your own keys with each provider, so you stack their rate-limit budgets rather than competing for a shared pool. This is the cheapest way to add resilience without renegotiating enterprise contracts or paying per-token markup.

Caveats

Limitations of this comparison

Rate-limit tiers change without notice; OpenAI alone updated them four times in 2025.
TPM counts both input and output, so a 1M-context Gemini request consumes a full minute of 1M TPM by itself.
Some endpoints (vision, audio, fine-tuned models) have separate quotas not shown in the main tier.
Azure OpenAI and AWS Bedrock manage their own resource-based quotas that don't map directly to OpenAI/Anthropic tiers.
Free Gemini AI Studio quotas are not for production use — terms explicitly restrict commercial deployment.

Outlook

What may change in 12-24 months

Rate limits will keep rising as inference capacity grows; the bottleneck is shifting from TPM to per-request latency for agentic workloads.
Granular per-feature limits (vision TPM, reasoning TPM separate from text) are expected to spread.
Reserved-capacity contracts (Vertex Provisioned Throughput, Azure PTU) will likely become more accessible to mid-market.
BYOK fallback gateways will become standard infrastructure for any production agentic stack.

Keep reading

More LLM comparisons

BYOK vs managed providers

Why BYOK helps with rate limits

Read comparison →

OpenRouter vs VerticalAPI

Aggregator vs BYOK gateway

Read comparison →

Bedrock vs Azure OpenAI

Enterprise capacity models compared

2026 pricing matrix

GPT-4o vs Claude Sonnet 4.5

Read comparison →

LLM rate limits compared (2026)

OpenAI, Anthropic, Google — typical limits

VerticalAPI verdict

Frequently asked questions

Limitations of this comparison

What may change in 12-24 months

Related questions

More LLM comparisons