Groq via VerticalAPI

Call Groq's LPU-accelerated Llama 3.3, Mixtral and Whisper models through VerticalAPI's OpenAI-compatible endpoint. Bring your own Groq key (BYOK), pay zero markup, and expect roughly 500 tok/s in typical use.

Endpoint: https://api.verticalapi.com/v1/chat/completions · BYOK header: X-Provider-Key: gsk_...

Supported models

Groq models routed by VerticalAPI

Pass a model ID from the table below as the `model` parameter in any OpenAI-compatible request. New Groq models are typically supported within 24 hours of release.

Model ID	Name	Context window	Groq pricing (input / output)
`llama-3.3-70b-versatile`	Llama 3.3 70B (Groq)	128K	$0.59 / $0.79 per 1M tok
`llama-3.1-8b-instant`	Llama 3.1 8B Instant	128K	$0.05 / $0.08 per 1M tok
`mixtral-8x7b-32768`	Mixtral 8x7B	32K	$0.24 / $0.24 per 1M tok
`whisper-large-v3`	Whisper Large v3	audio	$0.111 per hour of audio

Pricing reflects Groq's rates — you pay Groq directly. VerticalAPI adds zero markup on tokens.
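
To make the per-token rates concrete, here is a minimal sketch of estimating what Groq would bill for one request. It assumes each price pair in the table reads as input rate / output rate; the function name `estimate_cost` and the token counts are illustrative only, not part of any VerticalAPI SDK.

```python
# Rates copied from the table above, in USD per 1M tokens.
# Assumption: each pair is (input rate, output rate).
RATES_PER_MILLION = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
    "mixtral-8x7b-32768": (0.24, 0.24),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated Groq charge in USD for one request."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = estimate_cost("llama-3.3-70b-versatile", 2_000, 500)
print(f"${cost:.6f}")  # prints $0.001575
```

Because VerticalAPI adds no token markup, this figure is the whole token bill; the gateway subscription is separate.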

Quickstart

A minimal Groq call via VerticalAPI

Drop-in replacement for the OpenAI API: point your existing client at VerticalAPI's base URL. Works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.

groq_quickstart.py (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI gateway key
    default_headers={"X-Provider-Key": "gsk_..."},  # your Groq key (BYOK)
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Groq
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
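
For clients without an OpenAI SDK, the same call can be sketched as a plain HTTP request using only the Python standard library. This assumes the VerticalAPI key is sent as a bearer token in the `Authorization` header, which is how the OpenAI SDK transmits `api_key`; the helper name `build_request` is illustrative.

```python
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_request(vapi_key: str, groq_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request the quickstart makes."""
    body = {
        "model": "llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the gateway key travels as a bearer token,
            # matching what the OpenAI SDK does with api_key.
            "Authorization": f"Bearer {vapi_key}",
            "X-Provider-Key": groq_key,  # BYOK header from this page
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("vapi_...", "gsk_...", "Hello")
print(request.get_method(), request.full_url)

# To send it with real keys:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```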

Why use Groq via VerticalAPI

Four reasons developers route Groq through us

Zero token markup

You pay Groq directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Groq alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Groq to other providers on identical prompts.
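
For readers unfamiliar with p50/p95, here is a small sketch of how such percentiles can be computed from raw request latencies. It is not VerticalAPI's dashboard implementation; the provider names and latency samples are made up for illustration.

```python
import statistics


def latency_percentiles(latencies_ms: list[float]) -> tuple[float, float]:
    """Return (p50, p95) for a list of request latencies in milliseconds."""
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Hypothetical latencies (ms) for the same prompt sent to two providers.
samples = {
    "provider_a": [82, 91, 88, 140, 95, 87, 90, 300, 85, 93],
    "provider_b": [410, 395, 450, 520, 405, 398, 430, 900, 415, 402],
}
for name, values in samples.items():
    p50, p95 = latency_percentiles(values)
    print(f"{name}: p50={p50:.0f} ms, p95={p95:.0f} ms")
```

The p50 shows the typical request, while the p95 exposes the slow tail that a single average would hide.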

Observability built in

Every Groq call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Best for

Where Groq shines

sub-100ms first-token latency · real-time agents · voice (Whisper) · interactive UX

FAQ

Common questions about Groq on VerticalAPI

Why route Groq through VerticalAPI?

Speed is rarely the bottleneck; orchestration is. Use Groq for time-critical hops and Claude for hard reasoning, all through the same OpenAI-compatible endpoint and a single VerticalAPI key.

What's the typical latency?

Groq's LPU regularly sustains ~500 tok/s on Llama 3.3 70B. VerticalAPI adds ~5-10 ms of gateway overhead, which the dashboard tracks per request.
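
As a back-of-the-envelope illustration of those two figures, the sketch below estimates how long a completion takes to generate. It treats ~500 tok/s and the upper end of the gateway overhead as fixed inputs and ignores network round trips, queueing and prompt processing; the function name is illustrative.

```python
def estimated_response_seconds(
    output_tokens: int,
    tokens_per_second: float = 500.0,   # throughput figure quoted above
    gateway_overhead_ms: float = 10.0,  # upper end of the quoted overhead range
) -> float:
    """Rough generation time: token streaming time plus gateway overhead."""
    return output_tokens / tokens_per_second + gateway_overhead_ms / 1000.0


for tokens in (100, 500, 2000):
    print(f"{tokens} tokens: ~{estimated_response_seconds(tokens):.2f} s")
```

At these rates the gateway overhead is a small fraction of total time for anything longer than a few dozen tokens.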

Switch providers

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.

OpenAI · Anthropic · Google Gemini · Mistral AI · Meta Llama · xAI Grok · Groq · Together AI · Fireworks AI · Perplexity Sonar · Cohere · AI21 Labs · AWS Bedrock · Azure OpenAI · Google Vertex AI
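
Switching providers per request might look like the sketch below, which builds the keyword arguments for `client.chat.completions.create()`. Only the Groq model ID comes from this page; the Anthropic model ID and key are placeholders, and `PROVIDERS` and `request_options` are hypothetical names. `extra_headers` is the OpenAI Python SDK's option for setting headers on a single request.

```python
# Hypothetical routing table. Only the Groq entry uses a model ID from this
# page; the other model ID and all key values are placeholders.
PROVIDERS = {
    "groq": {"model": "llama-3.3-70b-versatile", "key": "gsk_..."},
    "anthropic": {"model": "<anthropic-model-id>", "key": "<anthropic-key>"},
}


def request_options(provider: str, prompt: str) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    entry = PROVIDERS[provider]
    return {
        "model": entry["model"],
        "messages": [{"role": "user", "content": prompt}],
        # Per-request BYOK header, set through the SDK's extra_headers option.
        "extra_headers": {"X-Provider-Key": entry["key"]},
    }


fast = request_options("groq", "Classify this ticket: ...")
deep = request_options("anthropic", "Draft a migration plan for ...")
print(fast["model"], "|", deep["model"])

# With a client configured as in the quickstart:
# response = client.chat.completions.create(**fast)
```

Keeping the model and key together in one table means a provider switch is a single argument change at the call site.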

Ship on Groq in 60 seconds

Free tier — bring your own Groq key, zero markup, OpenAI-compatible endpoint.

Get your VerticalAPI key →