Cerebras via VerticalAPI
Cerebras CS-3 wafer-scale inference (Llama 3.3, Llama 4) via VerticalAPI's OpenAI-compatible endpoint. Bring your own Cerebras key (BYOK): zero token markup, ~2,000 tok/s typical.
Cerebras models routed by VerticalAPI
Pass a model ID from the table below as `model` in any OpenAI-compatible request. New Cerebras models are typically supported within 24h of release.
| Model ID | Name | Context | Pricing in / out per 1M tok (provider rates) |
|---|---|---|---|
| `llama3.3-70b` | Llama 3.3 70B (Cerebras) | 128K | $0.85 / $1.20 |
| `llama-4-scout` | Llama 4 Scout (Cerebras) | 10M | Preview pricing (host-dependent) |
| `llama3.1-8b` | Llama 3.1 8B (Cerebras) | 128K | $0.10 / $0.10 |
Pricing reflects Cerebras's rates — you pay Cerebras directly. VerticalAPI adds zero markup on tokens.
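As a worked example from the table: a `llama3.3-70b` request with 10K input and 2K output tokens costs 10,000/1M × $0.85 + 2,000/1M × $1.20 ≈ $0.0109. A minimal sketch of that arithmetic (`estimate_cost` is an illustrative helper, not part of any SDK):

```python
# Provider rates from the table above: (input, output) USD per 1M tokens.
RATES = {"llama3.3-70b": (0.85, 1.20), "llama3.1-8b": (0.10, 0.10)}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rate_in, rate_out = RATES[model]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

print(estimate_cost("llama3.3-70b", 10_000, 2_000))  # 0.0109
```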
5-line Cerebras call via VerticalAPI
A drop-in replacement base URL for the OpenAI API. Works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    default_headers={"X-Provider-Key": "csk-..."},  # your Cerebras key (BYOK)
)

response = client.chat.completions.create(
    model="llama3.3-70b",  # Cerebras
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
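The same call streams with the SDK's standard `stream=True` flag. A sketch reusing the `client` above, assuming VerticalAPI passes Cerebras streaming responses through unchanged:

```python
stream = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "Write a haiku about wafers."}],
    stream=True,  # chunks arrive as they are generated
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```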
Four reasons developers route Cerebras through us
Zero token markup
You pay Cerebras directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
Cerebras alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Cerebras to other providers on identical prompts (see the sketch after this list).
Observability built in
Every Cerebras call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
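Because the provider is chosen per request, racing Cerebras against another host on an identical prompt takes a few lines. A sketch under two assumptions: each upstream is selected by its `X-Provider-Key` header as in the quickstart above, and `gpt-4o` is a hypothetical model ID on this gateway:

```python
import time
from openai import OpenAI

PROMPT = [{"role": "user", "content": "Summarize TCP slow start in two sentences."}]

# One entry per upstream provider key; endpoint and SDK stay the same.
providers = {
    "llama3.3-70b": "csk-...",  # Cerebras key
    "gpt-4o": "sk-...",         # OpenAI key (model ID assumed for illustration)
}

for model, provider_key in providers.items():
    client = OpenAI(
        base_url="https://api.verticalapi.com/v1",
        api_key="vapi_...",
        default_headers={"X-Provider-Key": provider_key},
    )
    start = time.perf_counter()
    resp = client.chat.completions.create(model=model, messages=PROMPT)
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.2f}s, {resp.usage.completion_tokens} tokens out")
```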
Common questions about Cerebras on VerticalAPI
How fast is Cerebras vs Groq?
Public benchmarks have typically shown Cerebras delivering 2-3x the tokens/sec of Groq on Llama 3.3 70B (~2,000 tok/s vs ~500 tok/s), with correspondingly shorter time-to-first-token. Figures vary by prompt and load, so measure against your own traffic.
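Rather than trusting headline numbers, you can measure throughput on your own prompts. A minimal sketch reusing the `client` from the quickstart; it assumes streamed chunks arrive roughly one token each:

```python
import time

start = time.perf_counter()
ttft = None  # time to first token
n_chunks = 0

stream = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[{"role": "user", "content": "Explain wafer-scale integration in 200 words."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if ttft is None:
            ttft = time.perf_counter() - start
        n_chunks += 1

gen_time = time.perf_counter() - start - ttft
print(f"TTFT: {ttft:.3f}s")
if gen_time > 0:
    # Chunk count approximates token count; most hosts send ~one token per chunk.
    print(f"~{n_chunks / gen_time:.0f} tok/s generation")
```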
Is Cerebras production-ready?
Cerebras's inference cloud has been generally available since 2024. VerticalAPI surfaces per-request latency so you can verify the SLA against your own traffic patterns.
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on Cerebras in 60 seconds
Free tier — bring your own Cerebras key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →