All LLM provider verticals
25 providers, one OpenAI-compatible endpoint, BYOK. Zero markup on tokens. Pick a provider to see supported models, pricing notes and a 5-line quickstart.
GPT-4o, GPT-4 Turbo, o1 reasoning models
Claude Sonnet 4.5, Opus 4.6, Haiku 4.5
Gemini 2.5 Pro, 2.5 Flash, Flash-8B
Mistral Large, Codestral, Pixtral
Llama 3.3, Llama 3.2 Vision, Llama 4
Grok-3, Grok-2 Vision
Sub-100ms inference for Llama, Mixtral, Whisper
200+ open-weights models — Llama, Qwen, DeepSeek
Fast inference for Llama, DeepSeek, function calling
Sonar Pro — web-grounded answers with citations
Command R+, Embed v3, Rerank
Jamba 1.5 Large — hybrid Mamba-Transformer
Claude, Llama, Titan, Mistral — through your AWS account
GPT-4o on Azure — your Azure subscription, your data residency
Gemini + Claude + Llama on GCP — your project, your residency
300+ models across providers — universal router
Cheap open-weights inference — Llama, Qwen, Mixtral
Run any open-weights model — Llama, FLUX, Whisper, ComfyUI
Wafer-scale inference — fastest tokens/sec on the market
GPU-cloud inference — Hermes 3, Llama 3.3
Optimized inference — Llama, Stable Diffusion, custom
Production-grade open-weights inference
NVIDIA-optimized microservices for open-weights LLMs
DBRX, Llama, Mixtral served on Databricks
Hybrid Mamba-Transformer with 256K context — open weights