Replicate via VerticalAPI
Access Replicate's broad model catalog (Llama, FLUX image generation, Whisper, custom models) through VerticalAPI's OpenAI-compatible endpoint where applicable. Bring your own key (BYOK) with your Replicate token; zero markup.
Replicate models routed by VerticalAPI
Pass the model ID below as the `model` field in any OpenAI-compatible request. New Replicate models are typically supported within 24 hours of release.
| Model ID | Name | Context | Pricing (provider) |
|---|---|---|---|
| meta/meta-llama-3.3-70b-instruct | Llama 3.3 70B (Replicate) | 128K | $0.65 / $2.75 per 1M tok |
| black-forest-labs/flux-1.1-pro | FLUX 1.1 Pro (image) | image | $0.04 per image |
| openai/whisper | Whisper (audio) | audio | $0.0029 per minute |
Pricing reflects Replicate's rates — you pay Replicate directly. VerticalAPI adds zero markup on tokens.
5-line Replicate call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",                        # your VerticalAPI key
    default_headers={"X-Provider-Key": "r8_..."}  # your Replicate token (BYOK)
)

response = client.chat.completions.create(
    model="meta/meta-llama-3.3-70b-instruct",  # Replicate
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
```
Four reasons developers route Replicate through us
Zero token markup
You pay Replicate directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
Replicate alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
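As a sketch of what "switchable per-request" means, the request shape stays identical between providers; only the model ID and the BYOK header change. (The helper below is illustrative, not part of any SDK; key values are placeholders.)

```python
import json

VERTICALAPI_URL = "https://api.verticalapi.com/v1/chat/completions"

def build_chat_request(model: str, provider_key: str, prompt: str) -> dict:
    """Builds the OpenAI-compatible request; only `model` and the
    X-Provider-Key header differ between providers."""
    return {
        "url": VERTICALAPI_URL,
        "headers": {
            "Authorization": "Bearer vapi_...",  # your VerticalAPI key
            "X-Provider-Key": provider_key,      # BYOK: the provider's own key
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Same prompt, two providers -- only two fields differ:
replicate_req = build_chat_request("meta/meta-llama-3.3-70b-instruct", "r8_...", "Hello")
openai_req = build_chat_request("gpt-4o-mini", "sk-...", "Hello")
```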
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Replicate to other providers on identical prompts.
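The p50/p95 figures the dashboard reports are ordinary percentiles over per-request latencies. A minimal sketch of the computation (nearest-rank method; sample values are made up):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample >= p% of the distribution."""
    ordered = sorted(samples)
    # 1-based rank = ceil(n * p / 100), computed without math.ceil.
    rank = max(1, -(-len(ordered) * p // 100))
    return ordered[int(rank) - 1]

latencies_ms = [112, 98, 430, 120, 101, 95, 1210, 105, 99, 118]
p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # tail latency
```

p95 is dominated by the slowest requests, which is why it is the more useful number when comparing providers on identical prompts.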
Observability built in
Every Replicate call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
Common questions about Replicate on VerticalAPI
Are non-LLM models supported?
Replicate's chat-completion-shaped LLMs go through /v1/chat/completions. Image and audio endpoints are exposed at /v1/images and /v1/audio respectively, matching the OpenAI shapes where possible.
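Assuming `/v1/images` accepts the OpenAI-style images payload as described above, a non-LLM request differs from a chat request only in path and body. A sketch (the helper and field values are illustrative):

```python
import json

def build_image_request(provider_key: str, prompt: str) -> dict:
    """Image generation goes to /v1/images rather than /v1/chat/completions;
    the payload uses a prompt string instead of a messages array."""
    return {
        "url": "https://api.verticalapi.com/v1/images",
        "headers": {
            "Authorization": "Bearer vapi_...",  # your VerticalAPI key
            "X-Provider-Key": provider_key,      # your Replicate token
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": "black-forest-labs/flux-1.1-pro",
            "prompt": prompt,
        }),
    }

req = build_image_request("r8_...", "a lighthouse at dusk")
```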
Can I run my own Replicate model?
Yes — pass the cog-published model ID (owner/name:version) as the model field. VerticalAPI proxies the request and surfaces logs in the dashboard.
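A quick client-side sanity check on the `owner/name:version` ID before sending it can save a round trip. This validator is a hypothetical helper, not part of any SDK; it assumes Replicate's usual 64-hex-character version hashes:

```python
import re

# owner/name with an optional 64-hex-char version pin (cog-published IDs).
MODEL_ID = re.compile(r"^[\w.-]+/[\w.-]+(:[0-9a-f]{64})?$")

def is_replicate_model_id(model: str) -> bool:
    """True for owner/name or owner/name:version (64-hex version hash)."""
    return MODEL_ID.match(model) is not None

is_replicate_model_id("meta/meta-llama-3.3-70b-instruct")  # catalog model
is_replicate_model_id("acme/my-model:" + "a" * 64)         # pinned custom model
```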
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on Replicate in 60 seconds
Free tier — bring your own Replicate key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →