Groq via VerticalAPI

Call Groq's LPU-accelerated Llama 3.3, Mixtral and Whisper via VerticalAPI's OpenAI-compatible endpoint. BYOK with your Groq key, zero markup, ~500 tok/s typical.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: gsk_...
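
For reference, here is the same call as a raw HTTP request with no SDK. This is a minimal sketch: it assumes the VerticalAPI key is sent as a standard OpenAI-style Bearer token (as the SDK does), and the vapi_... / gsk_... placeholders stand in for your own keys.

import requests

# Raw HTTP call to the OpenAI-compatible chat completions endpoint.
# Authorization carries your VerticalAPI key; X-Provider-Key carries your own Groq key (BYOK).
resp = requests.post(
    "https://api.verticalapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer vapi_...",
        "X-Provider-Key": "gsk_...",
        "Content-Type": "application/json",
    },
    json={
        "model": "llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])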

Groq models routed by VerticalAPI

Pass the model ID below as the model parameter in any OpenAI-compatible request. New Groq models are typically available through the gateway within 24 hours of release.

Model ID | Name | Context | Pricing (provider, input / output)
llama-3.3-70b-versatile | Llama 3.3 70B (Groq) | 128K | $0.59 / $0.79 per 1M tokens
llama-3.1-8b-instant | Llama 3.1 8B Instant | 128K | $0.05 / $0.08 per 1M tokens
mixtral-8x7b-32768 | Mixtral 8x7B | 32K | $0.24 / $0.24 per 1M tokens
whisper-large-v3 | Whisper Large v3 | n/a (audio) | $0.111 per hour of audio

Pricing reflects Groq's rates — you pay Groq directly. VerticalAPI adds zero markup on tokens.
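
For a back-of-the-envelope cost check, the rates in the table translate to a simple per-request calculation. The snippet below is illustrative only: the rates are copied from the table above and should be verified against Groq's current price list.

# Rough cost estimate from the per-1M-token rates listed above (verify against Groq's pricing page).
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
    "mixtral-8x7b-32768": (0.24, 0.24),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    in_rate, out_rate = RATES[model]
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on Llama 3.3 70B
print(f"${estimate_cost('llama-3.3-70b-versatile', 2_000, 500):.6f}")  # ≈ $0.001575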

5-line Groq call via VerticalAPI

Drop-in compatible with the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.

groq_quickstart.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",      # VerticalAPI's OpenAI-compatible gateway
    api_key="vapi_...",                              # your VerticalAPI key
    default_headers={"X-Provider-Key": "gsk_..."}    # BYOK: your own Groq key
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Groq
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
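
Streaming works through the same client. This sketch assumes VerticalAPI passes the standard OpenAI streaming protocol through unchanged (Groq supports it natively); it reuses the client object from the quickstart above.

# Stream tokens as they arrive -- where Groq's throughput is most visible.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a haiku about latency"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()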

Four reasons developers route Groq through us

Zero token markup

You pay Groq directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Groq alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Groq to other providers on identical prompts.

Observability built in

Every Groq call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where Groq shines

Sub-100ms first-token latency · real-time agents · voice (Whisper) · interactive UX

Common questions about Groq on VerticalAPI

Why route Groq through VerticalAPI?

Speed isn't the bottleneck — orchestration is. Use Groq for time-critical hops and Claude for hard reasoning, all via the same OpenAI-compatible endpoint and one key.
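
A sketch of that split using the client from the quickstart: the model parameter selects the provider per request, and extra_headers swaps the BYOK key. The Claude model ID and the reuse of X-Provider-Key for an Anthropic key are assumptions for illustration, not confirmed VerticalAPI behavior.

user_msg = "Plan a phased migration of our retrieval pipeline to a new vector store."

# Time-critical hop on Groq: cheap, fast triage of the request.
route = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": f"Reply SIMPLE or HARD only: {user_msg}"}],
)

# Hard-reasoning hop on Claude via the same endpoint, swapping the BYOK key per request.
if "HARD" in route.choices[0].message.content:
    answer = client.chat.completions.create(
        model="claude-sonnet-4-20250514",                # illustrative Anthropic model ID
        messages=[{"role": "user", "content": user_msg}],
        extra_headers={"X-Provider-Key": "sk-ant-..."},  # assumed: same header carries the Anthropic key
    )
    print(answer.choices[0].message.content)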

What's the typical latency?

Groq's LPU regularly delivers ~500 tok/s sustained on Llama 3.3 70B. VerticalAPI adds ~5-10ms gateway overhead, dashboard-tracked per request.
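
You can sanity-check those figures from the client side by timing a streamed request. This is a rough measurement (it includes network latency and counts chunks as a proxy for tokens) and reuses the client from the quickstart.

import time

start = time.perf_counter()
first_token_at = None
chunks = 0

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "List 50 animals, one per line."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1  # rough proxy: ~one token per chunk

elapsed = time.perf_counter() - start
print(f"time to first token: {(first_token_at - start) * 1000:.0f} ms")
print(f"throughput: ~{chunks / elapsed:.0f} tok/s overall")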

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.