Meta Llama via VerticalAPI

Llama 3.3 70B, Llama 3.2 Vision and Llama 4 via VerticalAPI's OpenAI-compatible endpoint. BYOK through your Together, Groq, Fireworks or Bedrock account — zero markup.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <host-specific>

Meta Llama models routed by VerticalAPI

Pass a model ID from the table below as the model field in any OpenAI-compatible request. New Meta Llama models are typically supported within 24 hours of release.

Model ID                 Name                     Context  Pricing (provider)
llama-3.3-70b-instruct   Llama 3.3 70B            128K     Host-dependent; typically $0.50–$0.90 per 1M tokens
llama-3.2-90b-vision     Llama 3.2 90B Vision     128K     Host-dependent
llama-3.1-405b-instruct  Llama 3.1 405B           128K     Host-dependent; flagship open weights
llama-4-scout            Llama 4 Scout (preview)  10M      Preview; host-dependent

Pricing reflects your host's rates; you pay Together, Groq, Fireworks, or Bedrock directly. VerticalAPI adds zero markup on tokens.

5-line Meta Llama call via VerticalAPI

A drop-in replacement for the OpenAI API: works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.

meta_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",  # VerticalAPI gateway, not api.openai.com
    api_key="vapi_...",                         # your VerticalAPI key
    default_headers={"X-Provider-Key": "varies by host..."}  # BYOK: your host's key (Together, Groq, ...)
)

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # any model ID from the table above
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
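
The vision model works the same way. A minimal sketch, assuming your chosen host accepts OpenAI-style image_url content parts; the file name and image URL are placeholders:

llama_vision.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "varies by host..."}
)

# OpenAI-style multimodal message: a text part plus an image_url part.
# Assumes the host behind your key supports vision input for this model.
response = client.chat.completions.create(
    model="llama-3.2-90b-vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }]
)
print(response.choices[0].message.content)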

Four reasons developers route Meta Llama through us

Zero token markup

You pay your Llama host directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Meta Llama alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
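
A sketch of per-request switching; the gpt-4o ID stands in for any non-Llama model, and each provider still needs its own BYOK key (dashboard or header):

switch_models.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
)

prompt = [{"role": "user", "content": "Summarize BYOK in one line."}]

# Same client, different provider per request; only the model ID changes.
# "gpt-4o" is illustrative; check the dashboard for the exact IDs you have.
llama = client.chat.completions.create(model="llama-3.3-70b-instruct", messages=prompt)
gpt = client.chat.completions.create(model="gpt-4o", messages=prompt)
print(llama.choices[0].message.content)
print(gpt.choices[0].message.content)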

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Meta Llama to other providers on identical prompts.
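
The dashboards do this per request automatically; a rough client-side equivalent, timing one identical prompt across two Llama model IDs, looks like this sketch:

latency_compare.py Python
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "varies by host..."}
)

prompt = [{"role": "user", "content": "Explain BYOK in one sentence."}]

# Crude single-shot timing; the built-in dashboards aggregate p50/p95
# latency and token counts across all requests for you.
for model in ["llama-3.3-70b-instruct", "llama-3.1-405b-instruct"]:
    start = time.perf_counter()
    r = client.chat.completions.create(model=model, messages=prompt)
    print(f"{model}: {time.perf_counter() - start:.2f}s, {r.usage.total_tokens} tokens")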

Observability built in

Every Meta Llama call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
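
A sketch of grabbing the trace ID client-side via the OpenAI SDK's raw-response interface. The X-Trace-Id header name is an assumption; inspect the headers your responses actually carry:

trace_id.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "varies by host..."}
)

# with_raw_response exposes the HTTP headers alongside the parsed body.
raw = client.chat.completions.with_raw_response.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Hello"}]
)
print("trace:", raw.headers.get("X-Trace-Id"))  # header name assumed; check raw.headers
completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)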

Where Meta Llama shines

open-weights flexibility · self-hosting · vision (3.2) · 10M context (Llama 4 Scout)

Common questions about Meta Llama on VerticalAPI

Which host serves Llama models?

VerticalAPI lets you BYOK to Together AI, Groq, Fireworks, AWS Bedrock or your own self-hosted endpoint. Pick a host in the dashboard, paste its key, and call the model name — we route under the hood.
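
For example, assuming the host key is passed per client via the X-Provider-Key header (the key strings below are placeholders; host selection can also be done in the dashboard):

switch_hosts.py Python
from openai import OpenAI

# Same endpoint and model ID; only the BYOK header changes per host.
# Key values are placeholders - paste whatever your host issues.
together = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "together-key..."},
)
groq = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "groq-key..."},
)

msg = [{"role": "user", "content": "Hello"}]
for client in (together, groq):
    r = client.chat.completions.create(model="llama-3.3-70b-instruct", messages=msg)
    print(r.choices[0].message.content)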

Why not call Together AI directly?

VerticalAPI gives you a single OpenAI-compatible endpoint, single key, and switchable hosts. Move from Together to Groq for speed without changing app code.

Is Llama 4 available?

Llama 4 Scout (10M context) is supported in preview where the host has rolled it out. Maverick variants are added as hosts release them.

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.