Replicate via VerticalAPI

Updated May 04, 2026·By VerticalAPI Team

Access Replicate's broad model catalog (Llama, FLUX image generation, Whisper and custom models) through VerticalAPI's OpenAI-compatible endpoint where applicable. Bring your own Replicate token (BYOK); zero markup.

Start free with your Replicate key → Read the docs

Endpoint: https://api.verticalapi.com/v1/chat/completions · BYOK header: X-Provider-Key: r8_...

Supported models

Replicate models routed by VerticalAPI

Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New Replicate models are typically supported within 24 hours of release.

Model ID	Name	Context / modality	Replicate pricing
`meta/meta-llama-3.3-70b-instruct`	Llama 3.3 70B (Replicate)	128K tokens	$0.65 input / $2.75 output per 1M tokens
`black-forest-labs/flux-1.1-pro`	FLUX 1.1 Pro (image)	image	$0.04 per image
`openai/whisper`	Whisper (audio)	audio	$0.0029 per minute

Pricing reflects Replicate's rates — you pay Replicate directly. VerticalAPI adds zero markup on tokens.
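As a worked example of the rates above, the sketch below estimates what a workload would cost on Replicate. It assumes the Llama rates are $0.65 per 1M input tokens and $2.75 per 1M output tokens, and that the listed prices are current; check Replicate's own pricing page before budgeting. Since VerticalAPI adds no token markup, this is the whole provider-side bill.

```python
# Rates copied from the table above (USD). Assumption: the first Llama rate
# is for input tokens and the second for output tokens; live prices may differ.
LLAMA_33_70B_INPUT_PER_M = 0.65
LLAMA_33_70B_OUTPUT_PER_M = 2.75
FLUX_11_PRO_PER_IMAGE = 0.04


def estimate_llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the Replicate charge in USD for a Llama 3.3 70B workload."""
    cost = (input_tokens / 1_000_000) * LLAMA_33_70B_INPUT_PER_M
    cost += (output_tokens / 1_000_000) * LLAMA_33_70B_OUTPUT_PER_M
    return round(cost, 6)  # round away floating-point noise


def estimate_image_cost(images: int, per_image: float = FLUX_11_PRO_PER_IMAGE) -> float:
    """FLUX 1.1 Pro is billed per generated image, not per token."""
    return round(images * per_image, 6)


if __name__ == "__main__":
    # 1M prompt tokens + 200K completion tokens: 0.65 + 0.2 * 2.75 = 1.20
    print(estimate_llm_cost(1_000_000, 200_000))  # 1.2
    # 500 FLUX images at $0.04 each
    print(estimate_image_cost(500))  # 20.0
```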

Quickstart

Minimal Replicate call via VerticalAPI

A drop-in replacement for the OpenAI API endpoint: it works with the OpenAI Python and Node clients, Go, curl, or anything else that speaks HTTP.

replicate_quickstart.py

```python
from openai import OpenAI

# Point the standard OpenAI client at VerticalAPI.
client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    default_headers={"X-Provider-Key": "r8_..."},  # BYOK: your Replicate token
)

response = client.chat.completions.create(
    model="meta/meta-llama-3.3-70b-instruct",  # Replicate model ID from the table above
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
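Because the endpoint is plain HTTP, the same call can be made without the OpenAI SDK. This sketch uses only Python's standard library and assumes the VerticalAPI key is sent as a standard `Authorization: Bearer` header (which is how the OpenAI SDK transmits `api_key`) alongside the `X-Provider-Key` BYOK header. `vapi_...` and `r8_...` are placeholders; the request is only sent if you call `send`.

```python
import json
import urllib.request

BASE_URL = "https://api.verticalapi.com/v1"


def build_chat_request(vapi_key: str, replicate_token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": "meta/meta-llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    }
    # urllib stores header names capitalize()-d (e.g. "X-provider-key");
    # HTTP header names are case-insensitive, so the server sees the same header.
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_key}",  # assumed gateway auth scheme
            "X-Provider-Key": replicate_token,  # BYOK: your Replicate token
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (needs network access)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("vapi_...", "r8_...", "Hello")
    print(req.full_url)  # https://api.verticalapi.com/v1/chat/completions
```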

Why use Replicate via VerticalAPI

Four reasons developers route Replicate through us

Zero token markup

You pay Replicate directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Use Replicate alongside OpenAI, Anthropic, Gemini and 12 more providers through the same OpenAI-compatible endpoint and SDK, switching provider per request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Replicate to other providers on identical prompts.
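The dashboards compute p50/p95 for you, but a quick client-side check is easy when comparing Replicate with another provider on the same prompt. This sketch uses a nearest-rank percentile over repeated timings; `call` is a hypothetical parameter standing for whatever function sends one request, not part of VerticalAPI.

```python
import time
from typing import Callable, Dict, List


def percentile(samples_ms: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples_ms or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples_ms)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) with integer math
    return ordered[rank - 1]


def benchmark(call: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Time `runs` sequential calls and report p50/p95 latency in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append((time.perf_counter() - start) * 1000)
    return {"p50_ms": percentile(timings, 50), "p95_ms": percentile(timings, 95)}


if __name__ == "__main__":
    # Synthetic timings of 1..20 ms: p50 is 10 ms, p95 is 19 ms.
    fake = [float(ms) for ms in range(1, 21)]
    print(percentile(fake, 50), percentile(fake, 95))  # 10.0 19.0
```

With the quickstart client, you might pass `lambda: client.chat.completions.create(model=..., messages=...)` as `call`, once per model, and compare the two result dicts. Sequential timing keeps the sketch simple; it does not measure behavior under concurrent load.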

Observability built in

Every Replicate call gets a trace ID, a replayable payload and an audit log entry. Export traces to Datadog or Sentry via OpenTelemetry.

Best for

Where Replicate shines

Custom model hosting · Image generation (FLUX) · Audio transcription · Research models

FAQ

Common questions about Replicate on VerticalAPI

Are non-LLM models supported?

Replicate LLMs with a chat-completion interface go through /v1/chat/completions. Image and audio models are exposed at /v1/images and /v1/audio respectively, matching the OpenAI request shapes where possible.
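For images, a request in OpenAI's image-generation shape might look like the sketch below. It assumes VerticalAPI mirrors OpenAI's sub-path `/v1/images/generations` (the path the OpenAI SDK's `client.images.generate` calls), accepts the same `model`/`prompt`/`n` fields, and returns OpenAI's `{"data": [{"url": ...}]}` response shape; FLUX-specific options may not map one-to-one. Audio transcription in the OpenAI shape uses a multipart upload to `/v1/audio/transcriptions` instead of JSON, which is not shown here.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "https://api.verticalapi.com/v1"


def build_image_request(vapi_key: str, replicate_token: str, prompt: str, n: int = 1) -> urllib.request.Request:
    """Build (not send) an OpenAI-shaped FLUX 1.1 Pro image request.

    The '/images/generations' sub-path is an assumption based on OpenAI's API.
    """
    body = {"model": "black-forest-labs/flux-1.1-pro", "prompt": prompt, "n": n}
    return urllib.request.Request(
        url=f"{BASE_URL}/images/generations",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_key}",  # assumed gateway auth scheme
            "X-Provider-Key": replicate_token,  # BYOK: your Replicate token
            "Content-Type": "application/json",
        },
    )


def image_refs(payload: Dict[str, Any]) -> List[str]:
    """Collect image URLs (or base64 blobs) from an OpenAI-shaped images response."""
    return [item.get("url") or item.get("b64_json", "") for item in payload.get("data", [])]


if __name__ == "__main__":
    req = build_image_request("vapi_...", "r8_...", "a lighthouse at dusk, watercolor")
    print(req.full_url)
```

At the listed $0.04 per image, `n=4` would cost about $0.16 per request on Replicate's side.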

Can I run my own Replicate model?

Yes. Pass the Cog-published model ID (owner/name:version) as the model field. VerticalAPI proxies the request and surfaces its logs in the dashboard.
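The `owner/name:version` format can be checked before a request goes out. The helper below is a hypothetical client-side convenience, not a VerticalAPI or Replicate function, and its regular expression is a loose assumption about allowed characters rather than Replicate's official naming rules. `my-org/my-finetune:abc123` is a placeholder ID.

```python
import re
from typing import NamedTuple, Optional

# Loose pattern (assumption): owner/name with an optional :version suffix.
_MODEL_ID = re.compile(r"(?P<owner>[\w.-]+)/(?P<name>[\w.-]+)(?::(?P<version>\w+))?")


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]  # None when no version is pinned


def parse_model_id(model_id: str) -> ModelRef:
    """Split a Replicate model ID such as 'owner/name:version' into its parts."""
    match = _MODEL_ID.fullmatch(model_id)
    if match is None:
        raise ValueError(f"not an owner/name[:version] model ID: {model_id!r}")
    return ModelRef(match["owner"], match["name"], match["version"])


if __name__ == "__main__":
    ref = parse_model_id("my-org/my-finetune:abc123")  # hypothetical model ID
    print(ref.owner, ref.name, ref.version)  # my-org my-finetune abc123
```

The validated string is then passed unchanged as `model=` in the quickstart call.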

Switch providers

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.
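Switching providers is a per-request change of two values. A minimal sketch, assuming each provider's key lives in an environment variable: `REPLICATE_API_TOKEN` is the name Replicate's own client reads, while the `gpt-4o-mini` entry and its `OPENAI_API_KEY` mapping are hypothetical placeholders for whichever other provider you enable.

```python
import os
from typing import Dict, Mapping

# Model ID -> environment variable holding that provider's BYOK key.
# Replicate IDs come from the catalog above; "gpt-4o-mini" is a hypothetical
# example of another provider's model ID on the same endpoint.
PROVIDER_KEY_ENV: Dict[str, str] = {
    "meta/meta-llama-3.3-70b-instruct": "REPLICATE_API_TOKEN",
    "black-forest-labs/flux-1.1-pro": "REPLICATE_API_TOKEN",
    "gpt-4o-mini": "OPENAI_API_KEY",
}


def byok_headers(model: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the per-request BYOK header for `model`, read from the environment."""
    try:
        var = PROVIDER_KEY_ENV[model]
    except KeyError:
        raise ValueError(f"no provider key configured for model {model!r}") from None
    key = env.get(var)
    if not key:
        raise RuntimeError(f"set {var} to use {model}")
    return {"X-Provider-Key": key}
```

With the OpenAI Python SDK, the result could be passed per call, e.g. `client.chat.completions.create(model=m, messages=..., extra_headers=byok_headers(m))`, so one client serves several providers.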

Ship on Replicate in 60 seconds

Free tier — bring your own Replicate key, zero markup, OpenAI-compatible endpoint.

Get your VerticalAPI key →