AI21 Jamba via VerticalAPI

AI21 Jamba 1.6 family (open-weights, hybrid Mamba/Transformer, 256K context) via VerticalAPI's OpenAI-compatible endpoint. BYOK, zero markup, Studio or self-hosted.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <ai21-jamba-key>

AI21 Jamba models routed by VerticalAPI

Pass the model ID below as model in any OpenAI-compatible request. New AI21 Jamba models are typically supported within 24h of release.

Model ID        | Name            | Context | Pricing (provider, input / output)
jamba-1.6-large | Jamba 1.6 Large | 256K    | $2 / $8 per 1M tokens
jamba-1.6-mini  | Jamba 1.6 Mini  | 256K    | $0.20 / $0.40 per 1M tokens

Pricing reflects AI21's list rates; you pay AI21 directly with your own key. VerticalAPI adds zero markup on tokens.
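Because billing is plain per-token arithmetic, a per-request cost is easy to estimate yourself. The sketch below hard-codes the list rates from the table above (an assumption: rates change, so check AI21's pricing page) and covers only the provider token cost, not the VerticalAPI subscription. The helper name `estimate_cost` is hypothetical.

```python
# Per-1M-token list rates (input, output) in USD, copied from the table above.
# Assumption: these can change; confirm against AI21's current pricing.
RATES_PER_1M = {
    "jamba-1.6-large": (2.00, 8.00),
    "jamba-1.6-mini": (0.20, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the provider-side token cost in USD for one request.

    VerticalAPI adds no token markup, so this is what AI21 bills for the
    tokens; the gateway subscription is separate and not included.
    """
    input_rate, output_rate = RATES_PER_1M[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Example: a 100K-token document plus a 2K-token answer.
    for model in RATES_PER_1M:
        print(f"{model}: ${estimate_cost(model, 100_000, 2_000):.4f}")
```

For a 100K-token prompt with a 2K-token answer this works out to about $0.216 on Large and $0.0208 on Mini.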

Minimal AI21 Jamba call via VerticalAPI

Drop-in replacement for the OpenAI API endpoint: keep your OpenAI SDK and change only the base URL and headers. Works with the OpenAI Python and Node clients, Go, curl, or anything else that speaks HTTP.

jamba_quickstart.py (Python)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI gateway key
    default_headers={"X-Provider-Key": "..."},  # your AI21 key (BYOK)
)

response = client.chat.completions.create(
    model="jamba-1.6-mini",  # any model ID from the table above
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
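Without the SDK, the same call is a single HTTP POST. The sketch below builds (but does not send) that request using only the standard library. It assumes the gateway key goes in an `Authorization: Bearer` header, which is what the OpenAI SDK sends for `api_key`, and the AI21 key in `X-Provider-Key` as stated at the top of this page; the function name `build_chat_request` is hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(vapi_key: str, provider_key: str, model: str,
                       messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request.

    Assumption: VerticalAPI accepts the gateway key as a Bearer token and
    the AI21 key in X-Provider-Key.
    """
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {vapi_key}",
            "X-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_chat_request("vapi_...", "...", "jamba-1.6-mini",
                             [{"role": "user", "content": "Hello"}])
    # To send it: urllib.request.urlopen(req), then json.load the response.
    print(req.get_method(), req.full_url)
```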

Four reasons developers route AI21 Jamba through us

Zero token markup

You pay AI21 Jamba directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

AI21 Jamba alongside OpenAI, Anthropic, Gemini and 12 more providers, all behind the same OpenAI-compatible endpoint and SDK and switchable per request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare AI21 Jamba to other providers on identical prompts.
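The dashboards report p50/p95 for you. If you also time requests client-side (for example, to compare Jamba and another provider on the same prompts), the sketch below computes the same two figures with the nearest-rank method. This page does not say which percentile definition VerticalAPI uses, so small differences from the dashboard are expected; the function names are hypothetical.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into sorted samples
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """Summarize client-side latencies (milliseconds) as p50 and p95."""
    return {
        "p50": nearest_rank_percentile(latencies_ms, 50),
        "p95": nearest_rank_percentile(latencies_ms, 95),
    }
```

Nearest-rank always returns an observed sample, which keeps the numbers easy to trace back to individual requests.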

Observability built in

Every AI21 Jamba call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where AI21 Jamba shines

  • Long-document QA at low cost
  • Structured JSON output
  • Self-hosting (open weights)
  • Compliance-friendly fine-tunes
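For structured JSON output, it is worth validating the reply before using it. The sketch below assumes the request asked for JSON (for example with `response_format={"type": "json_object"}`, on the assumption that VerticalAPI forwards it to Jamba's JSON mode) and that the response follows the OpenAI chat completion shape, with the text in `choices[0].message.content`. The function name and the example keys are hypothetical.

```python
import json


def parse_json_reply(completion: dict, required_keys: set[str]) -> dict:
    """Extract and validate a JSON object from an OpenAI-style chat completion.

    Raises ValueError (json.JSONDecodeError is a subclass) if the reply is
    not a JSON object or lacks any of the required keys.
    """
    content = completion["choices"][0]["message"]["content"]
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    # A hand-written response dict standing in for a real API reply.
    sample = {"choices": [{"message": {
        "role": "assistant",
        "content": '{"party": "Acme", "term_months": 24}',
    }}]}
    print(parse_json_reply(sample, {"party", "term_months"}))
```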

Frequently asked questions

What is Jamba and which models does AI21 offer?

Jamba is AI21 Labs' LLM family, available both as open weights and through a hosted API, built on a hybrid Mamba (state-space model) + Transformer architecture with Mixture-of-Experts layers. The 2026 lineup is Jamba 1.6 Large (the flagship, 94B active parameters) and Jamba 1.6 Mini (12B active), the two model IDs listed above. Both support a 256K context window, tool use, JSON mode and streaming. Open-weight checkpoints are available on Hugging Face under the non-commercial Jamba Open Model License.
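To see why a state-space layer behaves differently from attention at long context, here is a deliberately simplified illustration. It is not Mamba: it is a fixed (non-selective) diagonal linear recurrence, whereas Mamba makes its parameters input-dependent and computes the recurrence with a hardware-aware parallel scan, and Jamba interleaves such layers with attention and MoE layers. The key property it does share is a fixed-size state, so each new token costs the same no matter how long the sequence is, while attention's key/value cache grows with every token. The function name is hypothetical.

```python
def diagonal_ssm_scan(x: list[float], a: list[float], b: list[float],
                      c: list[float]) -> list[float]:
    """Run a toy diagonal linear state-space recurrence over a 1-D input.

    For each step t, element-wise over the state dimension:
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    The state h has len(a) entries regardless of sequence length.
    """
    state = [0.0] * len(a)
    outputs = []
    for x_t in x:
        # Update the fixed-size state from the previous state and this input.
        state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
        # Read out one scalar output from the current state.
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # An impulse decays geometrically through a 1-dimensional state.
    print(diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

With `a=[0.5]`, an impulse input yields the outputs 1.0, 0.5, 0.25: the past is summarized in the state rather than re-read token by token.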

How much does Jamba cost in 2026?

Jamba 1.6 Large via AI21 Studio costs roughly $2 per 1M input tokens and $8 per 1M output tokens; Jamba 1.6 Mini is around $0.20 / $0.40. Pricing on AWS Bedrock and Azure matches AI21's list price. The Mamba layers make long-context workloads (>32K tokens) effectively cheaper than on Transformer-only competitors. Via VerticalAPI BYOK you pay AI21 directly, with zero token markup.

How do I use Jamba via VerticalAPI BYOK?

Create a key at studio.ai21.com (or use AWS Bedrock / Azure AI Foundry), paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. VerticalAPI translates OpenAI chat-completion requests into calls to Jamba's chat endpoint and preserves the full 256K context, tool calls and streaming. Billing stays with the underlying provider.
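With the OpenAI SDK, streaming is just `stream=True`. Raw HTTP clients have to read the stream themselves; the sketch below reassembles the text from an OpenAI-style server-sent-events stream. It assumes VerticalAPI relays the standard OpenAI streaming format (one `data: {json}` line per chunk, ending with `data: [DONE]`) and keeps only the first choice; the function name is hypothetical.

```python
import json
from typing import Iterable


def assemble_stream(lines: Iterable[str]) -> str:
    """Concatenate the text deltas of choice 0 from an OpenAI-style SSE stream."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ':' comments and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # ignore extra choices when n > 1
            text = choice.get("delta", {}).get("content")
            if text:  # role-only and final chunks carry no content
                parts.append(text)
    return "".join(parts)
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read as they arrive.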

What is Jamba best for compared to alternatives?

Jamba wins on long-context economics: at 100K–256K context the Mamba layers are 2–3× cheaper and faster than equivalent Transformer-only models (GPT-4o, Claude). Ideal for long contracts, financial filings, codebase analysis and long-form RAG. Compared with Gemini 2.5 Pro (2M-token context), Jamba has a shorter context window but a lower per-token cost. Not the right pick for short-context agentic coding, where Claude leads.
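Before sending a long document, it helps to check that the prompt plus the output you reserve actually fits the window. The sketch below assumes "256K" means 256,000 tokens (confirm the exact limit for your model and provider) and uses a rough 4-characters-per-token heuristic that is not Jamba's tokenizer; for exact counts, use the real tokenizer or the usage field of a response. Both function names are hypothetical.

```python
import math


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 256_000) -> bool:
    """Return True if the prompt plus reserved output fits the context window.

    Assumption: 256K is taken as 256,000 tokens; pass the exact limit if known.
    """
    return prompt_tokens + max_output_tokens <= context_window


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from character count (heuristic only)."""
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    document = "x" * 1_000_000  # stand-in for a ~1M-character filing
    tokens = rough_token_count(document)
    print(tokens, fits_in_context(tokens, 4_000))
```

Passing `context_window=2_000_000` gives the same check for a 2M-token model, which is the case where Jamba's window runs out first.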

Where is Jamba hosted / data privacy?

Jamba runs on AI21's AWS infrastructure, plus AWS Bedrock, Azure AI Foundry, Vertex AI and Snowflake Cortex. Inputs and outputs are not used to train models. Enterprise tiers include zero data retention, SOC 2 and HIPAA. Via VerticalAPI BYOK your AI21 or hyperscaler contract terms remain intact.

Limitations and trade-offs

  • Quality on coding (SWE-Bench) and complex reasoning trails Claude Sonnet 4.5 and GPT-5.
  • Open-weight checkpoints are released under a non-commercial license — not freely usable in SaaS.
  • Ecosystem of fine-tunes and community tooling is much smaller than Llama.
  • 256K context is below Gemini 2.5 Pro's 2M for the largest-document tasks.
  • No native multimodal support: text only as of 2026.

Where Jamba is heading

  1. Jamba 2 expected to extend context and improve quality.
  2. More efficient open-weight releases targeting commercial use.
  3. Multimodal Jamba variants exploring vision input.
  4. Wider sovereign cloud availability via hyperscaler partners.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • Jamba 1.6 Large vs Claude Sonnet 4.5 for 200K-context RAG?
  • Is Mamba really faster than Transformer at long context?
  • Best provider for Jamba — direct AI21, AWS Bedrock or Azure?
  • Can I self-host Jamba open weights for commercial use?
  • Jamba Mini vs Mistral Small 3 for cheap enterprise workloads?