DeepInfra via VerticalAPI
Access DeepInfra's low-cost open-weights catalog (Llama 3.3, Qwen 2.5, Mixtral, DeepSeek) through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK), zero token markup, and small-model pricing from $0.10/M tokens.
DeepInfra models routed by VerticalAPI
Pass the model ID below as the model field in any OpenAI-compatible request. New DeepInfra models are typically supported within 24h of release.
| Model ID | Name | Context | Pricing (provider, input / output) |
|---|---|---|---|
| meta-llama/Llama-3.3-70B-Instruct | Llama 3.3 70B | 128K | $0.23 / $0.40 per 1M tok |
| Qwen/Qwen2.5-72B-Instruct | Qwen2.5 72B | 32K | $0.35 / $0.40 per 1M tok |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Mixtral 8x7B | 32K | $0.24 / $0.24 per 1M tok |
| deepseek-ai/DeepSeek-V3 | DeepSeek V3 | 64K | $0.49 / $0.89 per 1M tok |
Pricing reflects DeepInfra's rates — you pay DeepInfra directly. VerticalAPI adds zero markup on tokens.
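Because VerticalAPI adds no token markup, your per-request cost is just the table rates applied to your token counts. A minimal sketch, using the input/output rates from the table above (assumed current; check DeepInfra's pricing page):

```python
# (input, output) USD per 1M tokens, copied from the pricing table above.
RATES = {
    "meta-llama/Llama-3.3-70B-Instruct": (0.23, 0.40),
    "Qwen/Qwen2.5-72B-Instruct": (0.35, 0.40),
    "mistralai/Mixtral-8x7B-Instruct-v0.1": (0.24, 0.24),
    "deepseek-ai/DeepSeek-V3": (0.49, 0.89),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the DeepInfra provider cost in USD for one request."""
    in_rate, out_rate = RATES[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

# 10K prompt tokens + 1K completion tokens on Llama 3.3 70B:
print(f"${request_cost('meta-llama/Llama-3.3-70B-Instruct', 10_000, 1_000):.6f}")
# → $0.002700
```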
5-line DeepInfra call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "..."},  # your DeepInfra key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # DeepInfra
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
Four reasons developers route DeepInfra through us
Zero token markup
You pay DeepInfra directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
DeepInfra alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
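In practice, per-request switching means only the model ID changes between providers; everything else in the request body stays identical. A minimal sketch of the payloads involved ("gpt-4o" is an illustrative OpenAI model ID, not part of the table above):

```python
import json

def chat_payload(model: str, prompt: str) -> str:
    """Serialize an OpenAI-compatible /v1/chat/completions body for any provider model."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Route to DeepInfra, then OpenAI, without changing anything else:
deepinfra_body = chat_payload("meta-llama/Llama-3.3-70B-Instruct", "Hello")
openai_body = chat_payload("gpt-4o", "Hello")
```

With the OpenAI SDK configured as in the quick-start above, this reduces to passing a different `model` string to the same `client.chat.completions.create` call.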
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare DeepInfra to other providers on identical prompts.
Observability built in
Every DeepInfra call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
Where DeepInfra shines
Common questions about DeepInfra on VerticalAPI
How does DeepInfra pricing compare to Together?
DeepInfra is typically 30-40% cheaper on Llama 3.3 70B at comparable volume. Latency is comparable for non-batch workloads.
Are embeddings supported?
Yes — POST /v1/embeddings routes to DeepInfra's embedding models (BGE, E5) and returns OpenAI-format responses.
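The embeddings call uses the same gateway URL and BYOK header as the chat quick-start. A sketch that builds (but does not send) the request with the standard library — the model ID "BAAI/bge-large-en-v1.5" is an assumed example of a DeepInfra BGE model, so verify the exact ID in your dashboard:

```python
import json
import urllib.request

def embeddings_request(api_key: str, provider_key: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST /v1/embeddings request in OpenAI wire format."""
    body = json.dumps({
        "model": "BAAI/bge-large-en-v1.5",  # assumed DeepInfra embedding model ID
        "input": texts,
    }).encode()
    return urllib.request.Request(
        "https://api.verticalapi.com/v1/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Provider-Key": provider_key,  # your DeepInfra key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = embeddings_request("vapi_...", "...", ["hello world"])
# Send with urllib.request.urlopen(req) once real keys are in place.
```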
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on DeepInfra in 60 seconds
Free tier — bring your own DeepInfra key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →