Lepton AI via VerticalAPI

Access Lepton AI's production inference stack (Llama 3.3, Mixtral, Whisper) through VerticalAPI's OpenAI-compatible endpoint. Bring your own Lepton key (BYOK); VerticalAPI adds zero token markup.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <lepton-key>
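For clients that speak raw HTTP instead of the OpenAI SDK, every request carries two credentials: your VerticalAPI key and your Lepton key. A minimal sketch of the headers (the Bearer scheme for the VerticalAPI key is assumed, following OpenAI convention; the helper name is illustrative):

```python
def auth_headers(vapi_key: str, lepton_key: str) -> dict:
    """Build the two-credential header set for a VerticalAPI request.

    Authorization carries your VerticalAPI key (Bearer scheme assumed);
    X-Provider-Key carries your own Lepton AI key (BYOK).
    """
    return {
        "Authorization": f"Bearer {vapi_key}",
        "X-Provider-Key": lepton_key,
        "Content-Type": "application/json",
    }

print(auth_headers("vapi_...", "lep_..."))
```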

Lepton AI models routed by VerticalAPI

Pass a model ID from the table below as the model field in any OpenAI-compatible request. New Lepton AI models are typically supported within 24 hours of release.

| Model ID | Name | Context | Pricing (provider) |
|---|---|---|---|
| llama3-3-70b | Llama 3.3 70B (Lepton) | 128K | $0.80 per 1M tok |
| mixtral-8x7b | Mixtral 8x7B (Lepton) | 32K | $0.50 per 1M tok |
| whisper-large-v3 | Whisper Large v3 (Lepton) | audio | $0.10 per hour audio |

Pricing reflects Lepton AI's rates — you pay Lepton AI directly. VerticalAPI adds zero markup on tokens.
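Since the text-model rates above are flat per-1M-token prices, estimating spend before routing traffic is simple arithmetic. A sketch (the estimate_cost helper is illustrative, not part of either API):

```python
# Per-1M-token rates from the pricing table above (text models only;
# Whisper is billed per audio hour and is not covered here).
RATES_PER_1M = {
    "llama3-3-70b": 0.80,
    "mixtral-8x7b": 0.50,
}

def estimate_cost(model: str, total_tokens: int) -> float:
    """Return the dollar cost of total_tokens on the given Lepton model."""
    return RATES_PER_1M[model] * total_tokens / 1_000_000

# e.g. a 1,500-token chat turn on Llama 3.3 70B:
print(f"${estimate_cost('llama3-3-70b', 1500):.6f}")
```

Token counts for each request are available in the response's usage object and in VerticalAPI's per-request dashboards, so the same arithmetic works for reconciling actual spend.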

5-line Lepton AI call via VerticalAPI

Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.

lepton_quickstart.py (Python)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",                          # your VerticalAPI key
    default_headers={"X-Provider-Key": "..."}    # your Lepton AI key
)

response = client.chat.completions.create(
    model="llama3-3-70b",  # Lepton AI
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Four reasons developers route Lepton AI through us

Zero token markup

You pay Lepton AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Lepton AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
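Because every provider sits behind the same endpoint, switching is just a different model string on an otherwise identical request. A sketch of per-task routing (the non-Lepton model IDs below are placeholders for illustration, not confirmed routing IDs):

```python
# Route each task to a provider model through the one shared client.
# Only the Lepton IDs are from this page's pricing table; the others
# are hypothetical examples of cross-provider routing.
MODEL_BY_TASK = {
    "chat": "llama3-3-70b",       # Lepton AI
    "summarize": "mixtral-8x7b",  # Lepton AI
    "code": "gpt-4o",             # OpenAI (placeholder ID)
}

def pick_model(task: str) -> str:
    """Select a model per request; unknown tasks fall back to Llama 3.3."""
    return MODEL_BY_TASK.get(task, "llama3-3-70b")

# Same client, same SDK call shape as the quickstart:
# client.chat.completions.create(model=pick_model("chat"), messages=[...])
print(pick_model("summarize"))
```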

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Lepton AI to other providers on identical prompts.

Observability built in

Every Lepton AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where Lepton AI shines

Dedicated endpoints · auto-scaling inference · hybrid cloud · fine-tuned LoRAs

Common questions about Lepton AI on VerticalAPI

What's Lepton's edge over Together?

Lepton focuses on dedicated endpoints with predictable latency at higher QPS. Useful for customer-facing apps where p99 latency matters more than per-token cost.