Fireworks AI via VerticalAPI

Access Fireworks AI's optimized Llama 3.3, DeepSeek V3 and function-calling models through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK), zero markup.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: fw_...

Fireworks AI models routed by VerticalAPI

Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New Fireworks AI models are typically supported within 24h of release.

Model ID                                           Name                Context  Pricing (provider)
accounts/fireworks/models/llama-v3p3-70b-instruct  Llama 3.3 70B (FW)  128K     $0.90 per 1M tokens
accounts/fireworks/models/deepseek-v3              DeepSeek V3 (FW)    64K      $1.20 per 1M tokens
accounts/fireworks/models/firefunction-v2          FireFunction v2     32K      $0.90 per 1M tokens (tool-tuned)

Pricing reflects Fireworks AI's rates — you pay Fireworks AI directly. VerticalAPI adds zero markup on tokens.
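
For budgeting, the rates in the table above can be turned into a per-request estimate. The sketch below is illustrative only: the estimate_cost helper is a hypothetical name, and it assumes the listed rate applies equally to input and output tokens, which the table does not spell out for every model.

```python
from decimal import Decimal

# Per-1M-token rates in USD, copied from the table above.
# Assumption: the same rate applies to input and output tokens.
PRICE_PER_MILLION = {
    "accounts/fireworks/models/llama-v3p3-70b-instruct": Decimal("0.90"),
    "accounts/fireworks/models/deepseek-v3": Decimal("1.20"),
    "accounts/fireworks/models/firefunction-v2": Decimal("0.90"),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the estimated provider charge in USD for one request."""
    rate = PRICE_PER_MILLION[model]
    total_tokens = prompt_tokens + completion_tokens
    return rate * total_tokens / Decimal(1_000_000)


# 1,500 prompt tokens + 500 completion tokens on DeepSeek V3.
cost = estimate_cost("accounts/fireworks/models/deepseek-v3", 1_500, 500)
print(f"${cost:.4f}")  # prints $0.0024
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift. Treat the result as an estimate; the Fireworks AI invoice is the source of truth.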

Minimal Fireworks AI call via VerticalAPI

A drop-in replacement for the OpenAI API: keep the OpenAI SDK and change only the base URL. Works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.

fireworks_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "fw_..."}
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",  # Fireworks AI
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
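
For clients without an SDK, the same call can be made as a plain HTTP POST. The sketch below uses only the Python standard library. It assumes the gateway accepts the VerticalAPI key as an Authorization: Bearer header, which is what the OpenAI SDK sends for api_key; build_chat_request and send are hypothetical helper names.

```python
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(gateway_key: str, provider_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the gateway key travels as a Bearer token,
            # mirroring what the OpenAI SDK does with api_key.
            "Authorization": f"Bearer {gateway_key}",
            "X-Provider-Key": provider_key,  # BYOK header from this page
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


request = build_chat_request(
    "vapi_...", "fw_...",
    "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "Hello",
)
# reply = send(request)  # performs the network call
# print(reply["choices"][0]["message"]["content"])
```

Because the model is just a field in the JSON body, switching models per request means passing a different model ID to build_chat_request.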

Four reasons developers route Fireworks AI through us

Zero token markup

You pay Fireworks AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Fireworks AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Fireworks AI to other providers on identical prompts.
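
To make the p50/p95 figures concrete, here is a rough client-side sketch of how such percentiles can be computed from your own timings. It is not how the VerticalAPI dashboards are implemented; timed and latency_summary are hypothetical helpers, and the sample data is synthetic.

```python
import statistics
import time


def timed(call):
    """Run call() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def latency_summary(samples):
    """Return the p50 and p95 of a list of latency samples (at least two)."""
    # n=100 yields 99 cut points; index 49 is the 50th percentile,
    # index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94]}


# Synthetic latencies of 1..100 ms stand in for real measurements.
samples = [float(ms) for ms in range(1, 101)]
summary = latency_summary(samples)
print(summary)
```

In practice you would collect samples by wrapping each request in timed(...) with the same prompt for every provider, then compare the summaries side by side.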

Observability built in

Every Fireworks AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where Fireworks AI shines

  • Function-calling tuned models
  • Hosted DeepSeek
  • FireOptimizer fine-tunes

Frequently asked questions

What is Fireworks AI and what models do they offer?

Fireworks AI is a US inference startup focused on production open-weight serving. The 2026 catalog includes Llama 3.3 70B, Llama 3.1 8B and 405B, Qwen 2.5 (up to 72B), DeepSeek V3 and R1, Mixtral 8x22B, Yi-Large, plus Fireworks-tuned variants like FireFunction (best-in-class tool use on open weights) and FireLLaVA (vision). Fireworks also offers fine-tuning (LoRA, full SFT), embeddings and image generation.

How much does Fireworks AI cost in 2026?

Llama 3.3 70B is roughly $0.90 per 1M input tokens and $0.90 per 1M output tokens. Llama 3.1 405B is around $3/$3. Llama 3.1 8B is approximately $0.10/$0.10. Qwen 2.5 72B is in the $0.90 range. DeepSeek V3 is listed at $1.20 per 1M tokens. Fine-tuning is metered per training token, plus serving. Via VerticalAPI BYOK you pay Fireworks directly at list price with zero markup.

How do I use Fireworks AI via VerticalAPI BYOK?

Create a key at fireworks.ai/api-keys, paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. Fireworks is OpenAI-compatible, so VerticalAPI passes requests through unchanged and adds unified logging, observability and automatic fallback to Together, Groq or DeepInfra. Billing stays on your Fireworks invoice.
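
The fallback described above happens inside the gateway, and this page does not document how it is configured. As a rough illustration of the idea only, a client-side equivalent might look like the sketch below. complete_with_fallback is a hypothetical helper, and the second model ID is a made-up placeholder, not a real VerticalAPI identifier.

```python
from typing import Callable, Sequence


def complete_with_fallback(send: Callable[[str], str], models: Sequence[str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) for the first success."""
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # narrow this to API/network errors in real code
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in for a real API call: the first model "fails", the second answers.
def fake_send(model: str) -> str:
    if model == "accounts/fireworks/models/llama-v3p3-70b-instruct":
        raise TimeoutError("simulated timeout")
    return f"reply from {model}"


used, reply = complete_with_fallback(
    fake_send,
    [
        "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "example-fallback-provider/llama-3.3-70b",  # hypothetical placeholder ID
    ],
)
print(used, "->", reply)
```

With the real client, send would wrap client.chat.completions.create(...) for the given model and return the message content.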

What is Fireworks AI best for compared to alternatives?

Fireworks wins on tool calling for open weights (FireFunction is exceptional), production stability, and fine-tuning ergonomics. Compared to Groq it is slower but more flexible (LoRA, custom models, dedicated endpoints). Compared to Together it is at price parity, with a narrower but well-optimized catalog. It is not a fit for frontier closed models or for cheapest-possible inference (DeepInfra often wins on raw price).
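
Since tool calling is the headline strength, here is a minimal sketch of a tool-calling request in the OpenAI tools format, aimed at the FireFunction v2 model ID from the table. The get_weather tool, its stub implementation, and the tool_request and run_tool_call helpers are all hypothetical; whether a given model honours every schema feature is not covered on this page.

```python
import json

# OpenAI-style tool definition (JSON Schema for the arguments).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> str:
    """Stub implementation; a real tool would call a weather service."""
    return f"Sunny in {city}"


HANDLERS = {"get_weather": get_weather}


def tool_request(prompt: str) -> dict:
    """Keyword arguments for client.chat.completions.create(**kwargs)."""
    return {
        "model": "accounts/fireworks/models/firefunction-v2",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call; arguments arrive as a JSON string."""
    return HANDLERS[name](**json.loads(arguments_json))


# With the OpenAI client from the quickstart:
#   response = client.chat.completions.create(**tool_request("Weather in Paris?"))
#   for call in response.choices[0].message.tool_calls or []:
#       print(run_tool_call(call.function.name, call.function.arguments))
print(run_tool_call("get_weather", '{"city": "Paris"}'))  # prints Sunny in Paris
```

Keeping the handlers in a dictionary keyed by tool name means the model can only trigger functions you have explicitly registered.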

Where is Fireworks AI hosted, and how is data handled?

Fireworks runs on US GPU datacenters. API data is not used to train models. Enterprise tier includes zero retention, SOC 2 and HIPAA. Dedicated and on-prem deployments are available. Via VerticalAPI BYOK your Fireworks contract terms remain intact.

Limitations and trade-offs

  • Pricing is slightly higher than DeepInfra on some Llama models.
  • Geographic coverage is US-focused — higher RTT for EU and Asia.
  • No frontier closed models — open weights only.
  • Throughput is GPU-based — slower than Groq LPU or Cerebras WSE on Llama 70B.
  • Catalog is narrower than Together for niche open-weight models.

Where Fireworks AI is heading

  1. Continued FireFunction and FireOptimizer improvements for tool-calling and structured output.
  2. Faster custom kernels and FireAttention 3 for higher per-GPU throughput.
  3. Expanded fine-tuning (DPO, RLHF) and dedicated deployment options.
  4. More multimodal models (vision, speech) added through 2026.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • Fireworks AI vs Together AI — which is better for production Llama?
  • Is FireFunction the best open-weight tool-calling model?
  • Fireworks fine-tuning vs OpenAI fine-tuning — cost and quality?
  • Best Fireworks model for cheap RAG?
  • How does Fireworks compare to DeepInfra on Llama 3.3 70B price?