Groq via VerticalAPI
Call Groq's LPU-accelerated Llama 3.3, Mixtral and Whisper via VerticalAPI's OpenAI-compatible endpoint. BYOK with your Groq key, zero markup, ~500 tok/s typical.
Groq models routed by VerticalAPI
Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New Groq models are typically supported within 24h of release.
| Model ID | Name | Context | Provider pricing (input / output) |
|---|---|---|---|
| llama-3.3-70b-versatile | Llama 3.3 70B (Groq) | 128K | $0.59 / $0.79 per 1M tokens |
| llama-3.1-8b-instant | Llama 3.1 8B Instant | 128K | $0.05 / $0.08 per 1M tokens |
| mixtral-8x7b-32768 | Mixtral 8x7B | 32K | $0.24 / $0.24 per 1M tokens |
| whisper-large-v3 | Whisper Large v3 | Audio | $0.111 per hour of audio |
Pricing reflects Groq's rates — you pay Groq directly. VerticalAPI adds zero markup on tokens.
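For whisper-large-v3, the same credentials can be pointed at the OpenAI-compatible audio route. A minimal sketch, assuming VerticalAPI forwards audio.transcriptions requests to Groq and accepts the same X-Provider-Key header as the chat example below:

```python
from openai import OpenAI

# Sketch only: assumes VerticalAPI proxies the OpenAI-compatible
# /v1/audio/transcriptions route to Groq with the same BYOK header.
client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",                               # your VerticalAPI key
    default_headers={"X-Provider-Key": "gsk_..."},    # your Groq key
)

with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # Groq-hosted Whisper
        file=audio,
    )

print(transcript.text)
```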
5-line Groq call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",                               # your VerticalAPI key
    default_headers={"X-Provider-Key": "gsk_..."},    # your Groq key (BYOK)
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # Groq
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
```
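At ~500 tok/s, streaming is usually worth the extra lines: tokens start rendering almost immediately. A sketch reusing the client above with the standard OpenAI streaming flag:

```python
# Streaming variant: print tokens as Groq generates them.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Summarize LPU inference in two sentences."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```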
Four reasons developers route Groq through us
Zero token markup
You pay Groq directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
Groq alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per request (see the sketch after these cards).
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Groq to other providers on identical prompts.
Observability built in
Every Groq call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
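To make the per-request switch concrete: the same client can target Groq or any other routed provider by changing the model ID and the BYOK header on that call. A sketch under two assumptions not stated above: that the X-Provider-Key header carries whichever provider's key the chosen model needs, and that the Anthropic model ID shown is purely illustrative.

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

def ask(model: str, provider_key: str, prompt: str) -> str:
    """Route a single request to whichever provider hosts `model`."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"X-Provider-Key": provider_key},  # per-request BYOK (assumption)
    )
    return response.choices[0].message.content

# Fast hop on Groq, harder reasoning elsewhere: same endpoint, same SDK.
draft = ask("llama-3.3-70b-versatile", "gsk_...", "Draft a one-line changelog entry.")
review = ask("claude-3-5-sonnet-latest", "sk-ant-...", f"Critique this changelog entry: {draft}")
```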
Where Groq shines
Time-critical hops where ~500 tok/s sustained throughput matters: keep Groq on the latency-sensitive path and send harder reasoning to slower providers through the same endpoint.
Common questions about Groq on VerticalAPI
Why route Groq through VerticalAPI?
Speed isn't the bottleneck — orchestration is. Use Groq for time-critical hops and Claude for hard reasoning, all via the same OpenAI-compatible endpoint and one key.
What's the typical latency?
Groq's LPU regularly delivers ~500 tok/s sustained on Llama 3.3 70B. VerticalAPI adds ~5-10ms of gateway overhead, tracked per request in the dashboard.
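For a client-side sanity check against the dashboard numbers, a rough wall-clock measurement is enough. A sketch reusing the client from the earlier example:

```python
import time

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain LPU inference in 100 words."}],
)
elapsed = time.perf_counter() - start

# Rough end-to-end throughput; the dashboard reports per-request figures.
tokens = response.usage.completion_tokens
print(f"{elapsed:.2f}s end-to-end, ~{tokens / elapsed:.0f} tok/s")
```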
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on Groq in 60 seconds
Free tier — bring your own Groq key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →