Meta Llama via VerticalAPI

Call Llama 3.3 70B, Llama 3.2 Vision and Llama 4 through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK) from your Together AI, Groq, Fireworks or AWS Bedrock account, with zero markup.


Endpoint: https://api.verticalapi.com/v1/chat/completions · BYOK header: X-Provider-Key: <host-specific>

Supported models

Meta Llama models routed by VerticalAPI

Pass a model ID from the table below as the `model` parameter in any OpenAI-compatible request. New Meta Llama models are typically supported within 24 hours of release.

Model ID	Name	Context (tokens)	Pricing (set by your host)
`llama-3.3-70b-instruct`	Llama 3.3 70B	128K	Host-dependent, typically $0.50-$0.90 per 1M tokens
`llama-3.2-90b-vision`	Llama 3.2 90B Vision	128K	Host-dependent
`llama-3.1-405b-instruct`	Llama 3.1 405B (flagship open-weights model)	128K	Host-dependent
`llama-4-scout`	Llama 4 Scout (preview)	10M	Host-dependent (preview)

Prices are set by the host you bring a key for (Together AI, Groq, Fireworks or AWS Bedrock), and you pay that host directly. VerticalAPI adds no markup on tokens.
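Because VerticalAPI adds no markup, token spend is simply tokens multiplied by your host's rate. The sketch below assumes a single flat per-1M-token rate; many hosts price input and output tokens separately, so treat it as a rough estimate. `estimate_cost_usd` is a hypothetical helper, not part of any SDK, and the gateway subscription is not included.

```python
# Rough token-cost estimate for a host that bills per 1M tokens.
# Assumption: one flat rate; real hosts often price input and output
# tokens differently, so check your host's price list.

def estimate_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Return the approximate cost in USD for `tokens` at a per-1M-token rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * usd_per_million


# The $0.50-$0.90 range quoted for Llama 3.3 70B, applied to 2M tokens.
low = estimate_cost_usd(2_000_000, 0.50)
high = estimate_cost_usd(2_000_000, 0.90)
print(f"${low:.2f} - ${high:.2f}")  # $1.00 - $1.80
```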

Quickstart

A minimal Meta Llama call via VerticalAPI

The endpoint is a drop-in replacement for the OpenAI API: point the OpenAI SDK at it by changing base_url. It works with the OpenAI Python and Node clients, Go, curl, or anything else that speaks HTTP.

```python
# meta_quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    # BYOK: the API key of the host serving Llama (Together AI, Groq, Fireworks or Bedrock)
    default_headers={"X-Provider-Key": "<host-specific>"},
)

response = client.chat.completions.create(
    model="llama-3.3-70b-instruct",  # any Meta Llama model ID from the table above
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
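The same call can be made without the SDK, as a plain HTTP POST. This sketch uses only the Python standard library and assumes that, like the OpenAI SDK, VerticalAPI accepts your VerticalAPI key as a Bearer token in the `Authorization` header. `build_chat_request` and `chat` are hypothetical helper names.

```python
# Build the quickstart request by hand with only the standard library.
# Assumption: the VerticalAPI key goes in "Authorization: Bearer ...",
# as the OpenAI SDK sends it.
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(vapi_key: str, provider_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Return a POST request equivalent to the SDK quickstart call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {vapi_key}",
            "Content-Type": "application/json",
            "X-Provider-Key": provider_key,  # BYOK: your host's key
        },
        method="POST",
    )


def chat(req: urllib.request.Request) -> str:
    """Send the request (network call) and return the first choice's text."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (makes a real network call):
# req = build_chat_request("vapi_...", "<host-specific>", "llama-3.3-70b-instruct", "Hello")
# print(chat(req))
```

Note that `urllib` normalizes header names (for example to `X-provider-key`); this is harmless because HTTP header names are case-insensitive.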

Why use Meta Llama via VerticalAPI

Four reasons developers route Meta Llama through us

Zero token markup

You pay your Llama host (Together AI, Groq, Fireworks or AWS Bedrock) directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Meta Llama alongside OpenAI, Anthropic, Gemini and 11 more providers: the same OpenAI-compatible endpoint and the same SDK, switchable per request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Meta Llama to other providers on identical prompts.
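To sanity-check p50/p95 figures against your own client-side timings, you can compute percentiles locally. This sketch uses the nearest-rank definition; the dashboard's exact percentile method is not specified here, so small differences are expected. `percentile` is a hypothetical helper and the latency samples are made up for illustration.

```python
# Nearest-rank percentiles over client-side latency samples (milliseconds).
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Smallest sample such that at least pct% of samples are <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative timings for the same prompt sent 10 times to one host.
latencies = [120.0, 95.0, 310.0, 101.0, 99.0, 130.0, 88.0, 105.0, 97.0, 940.0]
print(percentile(latencies, 50), percentile(latencies, 95))  # 101.0 940.0
```

With only 10 samples, p95 is simply the slowest call, so collect more samples before comparing hosts.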

Observability built in

Every Meta Llama call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Best for

Where Meta Llama shines

Open-weights flexibility · Self-hosting · Vision (Llama 3.2) · 10M-token context (Llama 4 Scout)
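One way to act on these strengths is a small routing rule that picks a model ID from the supported-models table. This is an illustrative sketch, not a VerticalAPI feature: `choose_llama_model` is hypothetical, "128K" is treated as 128,000 tokens, and exact limits may differ by host.

```python
# Illustrative routing rule over the model IDs in the supported-models table.
# Assumption: "128K" ~ 128,000 tokens; host limits may differ slightly.
LLAMA_3_CONTEXT_TOKENS = 128_000
LLAMA_4_SCOUT_CONTEXT_TOKENS = 10_000_000


def choose_llama_model(needs_vision: bool, context_tokens: int) -> str:
    """Return a model ID that fits the request's needs."""
    if context_tokens > LLAMA_4_SCOUT_CONTEXT_TOKENS:
        raise ValueError("prompt exceeds every listed context window")
    if context_tokens > LLAMA_3_CONTEXT_TOKENS:
        # Preview model: only available where your host has rolled it out.
        return "llama-4-scout"
    if needs_vision:
        return "llama-3.2-90b-vision"
    return "llama-3.3-70b-instruct"
```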

FAQ

Common questions about Meta Llama on VerticalAPI

Which host serves Llama models?

VerticalAPI lets you bring your own key for Together AI, Groq, Fireworks, AWS Bedrock or your own self-hosted endpoint. Pick a host in the dashboard, paste its key, and call the model by its ID; VerticalAPI handles the routing.

Why not call Together AI directly?

VerticalAPI gives you a single OpenAI-compatible endpoint, a single key and switchable hosts. You can move from Together AI to Groq for lower latency without changing application code.

Is Llama 4 available?

Llama 4 Scout (10M context) is supported in preview where the host has rolled it out. Maverick variants are added as hosts release them.
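Because Llama 4 Scout is only available where your host has rolled it out, a client can try it first and fall back to a generally available model. A minimal sketch: `create_with_fallback` is a hypothetical helper, and which exception signals "model unavailable" depends on the gateway's response, so the caller passes the exception types to retry on.

```python
# Try each model in order until one call succeeds.
# Assumption: the caller knows which exception types mean "model unavailable".
from __future__ import annotations

from typing import Callable, Sequence


def create_with_fallback(
    create: Callable[..., object],
    models: Sequence[str],
    retry_on: tuple[type[BaseException], ...],
    **kwargs: object,
) -> object:
    """Call create(model=..., **kwargs) for each model until one succeeds."""
    if not models:
        raise ValueError("need at least one model")
    last_error: BaseException | None = None
    for model in models:
        try:
            return create(model=model, **kwargs)
        except retry_on as exc:
            last_error = exc  # remember the failure and try the next model
    raise last_error  # every model failed; re-raise the last error
```

Usage with the quickstart client, assuming the gateway answers an unavailable model with HTTP 404 (which the OpenAI SDK raises as `openai.NotFoundError`):

```python
# from openai import NotFoundError
# response = create_with_fallback(
#     client.chat.completions.create,
#     ["llama-4-scout", "llama-3.3-70b-instruct"],
#     retry_on=(NotFoundError,),
#     messages=[{"role": "user", "content": "Hello"}],
# )
```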

Switch providers

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.

OpenAI · Anthropic · Google Gemini · Mistral AI · Meta Llama · xAI Grok · Groq · Together AI · Fireworks AI · Perplexity Sonar · Cohere · AI21 Labs · AWS Bedrock · Azure OpenAI · Google Vertex AI
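Since only the model and the BYOK header change between providers, per-request switching can be reduced to building two values. This sketch assumes the OpenAI Python SDK's `extra_headers` argument, which overrides the client's `default_headers` for a single call. `Route`, `chat_kwargs` and the non-Llama model ID and key are hypothetical placeholders; check the docs for each provider's model IDs.

```python
# Per-request routing: only `model` and the X-Provider-Key header change.
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str
    provider_key: str  # BYOK key for whichever provider serves `model`


def chat_kwargs(route: Route, prompt: str) -> dict:
    """Keyword arguments for client.chat.completions.create(**...)."""
    return {
        "model": route.model,
        "messages": [{"role": "user", "content": prompt}],
        "extra_headers": {"X-Provider-Key": route.provider_key},
    }


ROUTES = {
    "llama": Route("llama-3.3-70b-instruct", "<host-specific>"),
    # Hypothetical placeholder: substitute a real model ID and key.
    "other": Route("<other-provider-model-id>", "<other-provider-key>"),
}

# Usage with the quickstart client:
# response = client.chat.completions.create(**chat_kwargs(ROUTES["llama"], "Hello"))
```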

Ship on Meta Llama in 60 seconds

Free tier: bring your own key from your Llama host, with zero markup and an OpenAI-compatible endpoint.
