All LLM provider verticals

25 providers, one OpenAI-compatible endpoint, BYOK. Zero markup on tokens. Pick a provider to see supported models, pricing notes and a 5-line quickstart.

OpenAI openai

GPT-4o, GPT-4 Turbo, o1 reasoning models

View OpenAI integration →
Anthropic anthropic

Claude Sonnet 4.5, Opus 4.6, Haiku 4.5

View Anthropic integration →
Google Gemini google

Gemini 2.5 Pro, 2.5 Flash, Flash-8B

View Google Gemini integration →
Mistral AI mistral

Mistral Large, Codestral, Pixtral

View Mistral AI integration →
Meta Llama meta

Llama 3.3, Llama 3.2 Vision, Llama 4

View Meta Llama integration →
xAI Grok xai

Grok-3, Grok-2 Vision

View xAI Grok integration →
Groq groq

Sub-100ms inference for Llama, Mixtral, Whisper

View Groq integration →
Together AI together-ai

200+ open-weights models — Llama, Qwen, DeepSeek

View Together AI integration →
Fireworks AI fireworks

Fast inference for Llama, DeepSeek, function calling

View Fireworks AI integration →
Perplexity Sonar perplexity

Sonar Pro — web-grounded answers with citations

View Perplexity Sonar integration →
Cohere cohere

Command R+, Embed v3, Rerank

View Cohere integration →
AI21 Labs ai21

Jamba 1.5 Large — hybrid Mamba-Transformer

View AI21 Labs integration →
AWS Bedrock aws-bedrock

Claude, Llama, Titan, Mistral — through your AWS account

View AWS Bedrock integration →
Azure OpenAI azure-openai

GPT-4o on Azure — your Azure subscription, your data residency

View Azure OpenAI integration →
Google Vertex AI vertex-ai

Gemini + Claude + Llama on GCP — your project, your residency

View Google Vertex AI integration →
OpenRouter openrouter

300+ models across providers — universal router

View OpenRouter integration →
DeepInfra deepinfra

Cheap open-weights inference — Llama, Qwen, Mixtral

View DeepInfra integration →
Replicate replicate

Run any open-weights model — Llama, FLUX, Whisper, ComfyUI

View Replicate integration →
Cerebras cerebras

Wafer-scale inference — fastest tokens/sec on the market

View Cerebras integration →
Lambda Labs lambdalabs

GPU-cloud inference — Hermes 3, Llama 3.3

View Lambda Labs integration →
OctoAI octoai

Optimized inference — Llama, Stable Diffusion, custom

View OctoAI integration →
Lepton AI lepton

Production-grade open-weights inference

View Lepton AI integration →
NVIDIA NIM nvidia-nim

NVIDIA-optimized microservices for open-weights LLMs

View NVIDIA NIM integration →
Databricks Mosaic databricks-mosaic

DBRX, Llama, Mixtral served on Databricks

View Databricks Mosaic integration →
AI21 Jamba jamba

Hybrid Mamba-Transformer with 256K context — open weights

View AI21 Jamba integration →