DeepInfra via VerticalAPI

Updated May 04, 2026 · By VerticalAPI Team

DeepInfra's low-cost open-weights catalog (Llama 3.3, Qwen 2.5, Mixtral, DeepSeek), served through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK), zero markup, $0.10/M tokens for small models.


Endpoint: https://api.verticalapi.com/v1/chat/completions · BYOK header: X-Provider-Key: <deepinfra-key>
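For clients without an OpenAI SDK, the same call can be made as a plain HTTPS POST. The sketch below uses only Python's standard library and builds the request without sending it. It assumes VerticalAPI reads your VerticalAPI key from a standard `Authorization: Bearer` header, which is what the OpenAI SDK sends for `api_key`; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Endpoint and BYOK header name as documented on this page.
ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(
    vapi_key: str, deepinfra_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request routed to DeepInfra."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        # Assumption: VerticalAPI reads its own key from a standard Bearer
        # header, which is what the OpenAI SDK sends for api_key.
        "Authorization": f"Bearer {vapi_key}",
        # BYOK: your DeepInfra key travels in X-Provider-Key.
        "X-Provider-Key": deepinfra_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen()` sends it; the JSON reply should have the same OpenAI shape that the Python quickstart below reads via `choices[0].message.content`.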

Supported models

DeepInfra models routed by VerticalAPI

Pass a model ID from the table below as the `model` parameter in any OpenAI-compatible request. New DeepInfra models are typically supported within 24 hours of release.

Model ID	Name	Context (tokens)	DeepInfra price per 1M tokens (input / output)
`meta-llama/Llama-3.3-70B-Instruct`	Llama 3.3 70B	128K	$0.23 / $0.40
`Qwen/Qwen2.5-72B-Instruct`	Qwen2.5 72B	32K	$0.35 / $0.40
`mistralai/Mixtral-8x7B-Instruct-v0.1`	Mixtral 8x7B	32K	$0.24 / $0.24
`deepseek-ai/DeepSeek-V3`	DeepSeek V3	64K	$0.49 / $0.89

Prices reflect DeepInfra's own rates. You pay DeepInfra directly, and VerticalAPI adds zero markup on tokens.
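To budget a job, multiply token counts by the per-1M-token prices in the table. A minimal sketch, assuming the first price in each row applies to input tokens and the second to output tokens; `estimate_cost` and `PRICES_PER_M` are illustrative names, and the prices are a snapshot that DeepInfra may change.

```python
# (input, output) USD per 1M tokens, copied from the table on this page.
# Assumption: first figure = input tokens, second = output tokens.
PRICES_PER_M = {
    "meta-llama/Llama-3.3-70B-Instruct": (0.23, 0.40),
    "Qwen/Qwen2.5-72B-Instruct": (0.35, 0.40),
    "mistralai/Mixtral-8x7B-Instruct-v0.1": (0.24, 0.24),
    "deepseek-ai/DeepSeek-V3": (0.49, 0.89),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD billed by DeepInfra (VerticalAPI adds no token markup)."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, 10,000 requests averaging 500 input and 200 output tokens on Llama 3.3 70B come to 5M input and 2M output tokens, or about $1.15 + $0.80 = $1.95.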

Quickstart

A minimal DeepInfra call via VerticalAPI

A drop-in replacement for the OpenAI API: point the OpenAI SDK at VerticalAPI's base URL and add the BYOK header. It works with the OpenAI Python and Node clients, Go, curl, and anything else that speaks HTTP.

deepinfra_quickstart.py

from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",  # VerticalAPI's OpenAI-compatible endpoint
    api_key="vapi_...",                         # your VerticalAPI key
    default_headers={"X-Provider-Key": "..."},  # BYOK: your DeepInfra key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # DeepInfra
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Why use DeepInfra via VerticalAPI

Four reasons developers route DeepInfra through us

Zero token markup

You pay DeepInfra directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

DeepInfra sits alongside OpenAI, Anthropic, Gemini and 12 more providers behind the same OpenAI-compatible endpoint and SDK, switchable per request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare DeepInfra to other providers on identical prompts.
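The dashboards report these figures for you, but when benchmarking DeepInfra against another provider on identical prompts it can help to compute the same percentiles locally. A sketch, assuming you time calls client-side with `time.perf_counter` (so the numbers include network overhead and will not match gateway-side figures exactly); `latency_percentiles` and `time_call` are hypothetical helpers.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def latency_percentiles(latencies_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (p50, p95), interpolating linearly between samples."""
    # n=100 yields 99 cut points: index 49 is the 50th percentile, 94 the 95th.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


def time_call(fn: Callable[[], object]) -> float:
    """Wall-clock duration of a single call, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000
```

Collect samples per model with something like `[time_call(lambda: client.chat.completions.create(model=m, messages=msgs)) for _ in range(50)]`, then compare the `latency_percentiles` results. Use at least a few dozen samples; p95 from a handful of calls is mostly noise.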

Observability built in

Every DeepInfra call gets a trace ID, a replayable payload and an audit log entry. Send them to Datadog or Sentry via OpenTelemetry.

Best for

Where DeepInfra shines

Budget batch inference · High-volume classification · Embedding pipelines · Fine-tuned models

FAQ

Common questions about DeepInfra on VerticalAPI

How does DeepInfra pricing compare to Together?

DeepInfra is typically 30-40% cheaper than Together on Llama 3.3 70B at similar volume, with comparable latency for non-batch workloads.

Are embeddings supported?

Yes. POST /v1/embeddings routes to DeepInfra's embedding models (BGE, E5) and returns responses in the OpenAI format.
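A minimal sketch of an embeddings round trip using only the standard library: build the OpenAI-format request, then pull vectors out of an OpenAI-format response (`data[i].embedding`, ordered by `index`). The endpoint path comes from this FAQ; the `Authorization: Bearer` header is an assumption based on what the OpenAI SDK sends, and the helper names are hypothetical.

```python
import json
import math
import urllib.request
from typing import Any, Dict, List, Sequence

EMBEDDINGS_URL = "https://api.verticalapi.com/v1/embeddings"


def build_embeddings_request(
    vapi_key: str, deepinfra_key: str, model: str, texts: List[str]
) -> urllib.request.Request:
    """OpenAI-format embeddings request; built here, sent with urlopen()."""
    body = {"model": model, "input": texts}
    headers = {
        "Authorization": f"Bearer {vapi_key}",  # assumed auth scheme
        "X-Provider-Key": deepinfra_key,        # BYOK DeepInfra key
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def vectors_from_response(payload: Dict[str, Any]) -> List[List[float]]:
    """Return embeddings in input order from an OpenAI-format response."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, a common way to compare two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

This page does not list embedding model IDs; pass DeepInfra's ID for the BGE or E5 model you want (for example `BAAI/bge-large-en-v1.5`, assumed here to be available on DeepInfra).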

Switch providers

All supported LLM providers

Same endpoint, same SDK: just change the model and the BYOK header.
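Switching providers per request, rather than per client, can be done with the OpenAI Python SDK's `extra_headers` argument, which overrides headers for a single call. A sketch, assuming each provider's key lives in an environment variable; `PROVIDER_KEY_ENV`, its variable names, and `request_options` are hypothetical.

```python
import os
from typing import Any, Dict

# Hypothetical mapping from provider name to the env var holding that
# provider's own API key; adjust to your setup.
PROVIDER_KEY_ENV = {
    "deepinfra": "DEEPINFRA_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def request_options(provider: str, model: str) -> Dict[str, Any]:
    """Per-request kwargs: the model ID plus the matching BYOK header."""
    key = os.environ[PROVIDER_KEY_ENV[provider]]
    # extra_headers overrides the client's default X-Provider-Key for one call.
    return {"model": model, "extra_headers": {"X-Provider-Key": key}}
```

Usage: `client.chat.completions.create(messages=[{"role": "user", "content": "Hello"}], **request_options("deepinfra", "meta-llama/Llama-3.3-70B-Instruct"))`. Model IDs for other providers come from their own pages.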

Ship on DeepInfra in 60 seconds

Free tier: bring your own DeepInfra key, zero markup, OpenAI-compatible endpoint.
