AI21 Labs via VerticalAPI

Access AI21's Jamba 1.5 Large and Mini (hybrid Mamba/Transformer architecture, 256K-token context) through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK), zero token markup.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <ai21-key>
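For reference, here is what a call looks like at the raw HTTP level, with no SDK at all. A minimal sketch: we assume the gateway uses standard OpenAI-style bearer auth (the SDK's api_key maps to the Authorization header), and the placeholder keys are yours to fill in.

ai21_raw_http.py (Python)
import requests

resp = requests.post(
    "https://api.verticalapi.com/v1/chat/completions",
    headers={
        "Authorization": "Bearer vapi_...",   # your VerticalAPI gateway key
        "X-Provider-Key": "<ai21-key>",       # your own AI21 Labs key (BYOK)
    },
    json={
        "model": "jamba-1.5-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])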

AI21 Labs models routed by VerticalAPI

Pass the model ID below as the model parameter in any OpenAI-compatible request. New AI21 Labs models are typically supported within 24 hours of release.

Model ID          Name              Context   Pricing (input / output per 1M tokens, billed by provider)
jamba-1.5-large   Jamba 1.5 Large   256K      $2.00 / $8.00
jamba-1.5-mini    Jamba 1.5 Mini    256K      $0.20 / $0.40

Pricing reflects AI21 Labs' own rates; you pay AI21 Labs directly, and VerticalAPI adds zero markup on tokens.
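Every response carries a usage object, so you can estimate a request's provider cost yourself from the rates above. A small self-contained sketch; the helper and its rate table are ours for illustration, not part of VerticalAPI:

cost_estimate.py (Python)
RATES = {  # (input, output) in USD per 1M tokens, from the table above
    "jamba-1.5-large": (2.00, 8.00),
    "jamba-1.5-mini": (0.20, 0.40),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request at the provider's token rates."""
    inp, out = RATES[model]
    return prompt_tokens / 1e6 * inp + completion_tokens / 1e6 * out

# e.g. a 200K-token document plus a 500-token answer on jamba-1.5-mini:
print(f"${estimate_cost('jamba-1.5-mini', 200_000, 500):.4f}")  # $0.0402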

5-line AI21 Labs call via VerticalAPI

A drop-in swap for the OpenAI API base URL: keep your existing SDK and point it at VerticalAPI. Works with the OpenAI Python client, Node, Go, curl, anything that speaks HTTP.

ai21_quickstart.py (Python)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",  # VerticalAPI gateway, not api.openai.com
    api_key="vapi_...",                         # your VerticalAPI key
    default_headers={"X-Provider-Key": "..."}   # your own AI21 Labs key (BYOK)
)

response = client.chat.completions.create(
    model="jamba-1.5-mini",  # AI21 Labs
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
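Streaming should work the same way. This sketch reuses the client from the quickstart and assumes VerticalAPI passes through OpenAI-style streaming, as most compatible gateways do:

ai21_streaming.py (Python)
stream = client.chat.completions.create(
    model="jamba-1.5-mini",
    messages=[{"role": "user", "content": "Summarize Mamba in two sentences."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (e.g. the final one) can carry no delta text; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()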

Four reasons developers route AI21 Labs through us

Zero token markup

You pay AI21 Labs directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

AI21 Labs alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
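For illustration, a sketch of per-request switching. It assumes VerticalAPI routes on the model ID and accepts the BYOK header per request via the SDK's extra_headers; the second model ID is hypothetical, so check the provider list for real IDs.

switch_providers.py (Python)
# Same client, two providers: the model ID picks the provider.
jamba = client.chat.completions.create(
    model="jamba-1.5-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
other = client.chat.completions.create(
    model="claude-3-5-sonnet",  # hypothetical ID for another provider
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"X-Provider-Key": "<anthropic-key>"},  # per-request BYOK override
)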

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare AI21 Labs to other providers on identical prompts.
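You can reproduce a rough version of that comparison client-side. A sketch that times both Jamba models on an identical prompt, reusing the quickstart client; VerticalAPI's dashboards do the same thing server-side:

compare_models.py (Python)
import time

prompt = [{"role": "user", "content": "List three uses for a 256K-token context window."}]
for model in ["jamba-1.5-mini", "jamba-1.5-large"]:
    start = time.perf_counter()
    result = client.chat.completions.create(model=model, messages=prompt)
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.2f}s wall time, {result.usage.total_tokens} tokens")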

Observability built in

Every AI21 Labs call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
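If you want the trace ID in your own logs, the OpenAI SDK's raw-response wrapper exposes the HTTP headers alongside the parsed response. The header name below is an assumption; check VerticalAPI's actual response headers for the real one.

trace_headers.py (Python)
raw = client.chat.completions.with_raw_response.create(
    model="jamba-1.5-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(raw.headers.get("X-Vapi-Trace-Id"))  # hypothetical header name
response = raw.parse()                     # the usual ChatCompletion object
print(response.choices[0].message.content)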

Where AI21 Labs shines

256K context at low cost · structured JSON output · long-doc QA · hybrid architecture experiments
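On structured JSON output, the most portable approach is to ask for JSON in the prompt and parse the reply; a sketch reusing the quickstart client (prompt-only, so it works even if the gateway does not pass through response_format):

json_extract.py (Python)
import json

response = client.chat.completions.create(
    model="jamba-1.5-mini",
    messages=[{
        "role": "user",
        "content": 'Reply with JSON only, shaped like {"name": ..., "year": ...}: '
                   "Jamba 1.5 was released by AI21 Labs in 2024.",
    }],
)
# Sketch-level parsing; production code should tolerate code fences or
# stray prose around the JSON.
data = json.loads(response.choices[0].message.content)
print(data["name"], data["year"])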

Common questions about AI21 Labs on VerticalAPI

What makes Jamba different?

Jamba's hybrid Mamba/Transformer architecture is faster on long-context inputs than pure-Transformer rivals, especially in the 100K+ token range.
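That long-context strength is easy to exercise. A long-doc QA sketch with jamba-1.5-large, reusing the quickstart client; contract.txt is a placeholder for your own document:

long_doc_qa.py (Python)
long_doc = open("contract.txt", encoding="utf-8").read()  # placeholder document

response = client.chat.completions.create(
    model="jamba-1.5-large",  # 256K-token window: roughly 600+ pages of plain text
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": f"{long_doc}\n\nQuestion: What are the termination clauses?"},
    ],
)
print(response.choices[0].message.content)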

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.