Together AI via VerticalAPI

Together AI's open-weights catalog (Llama, Qwen, DeepSeek, Mixtral) via VerticalAPI's OpenAI-compatible endpoint. BYOK, zero markup, fine-tuning friendly.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: <together-key>

Together AI models routed by VerticalAPI

Pass the model ID below as model in any OpenAI-compatible request. New Together AI models are typically supported within 24h of release.

Model IDNameContextPricing (provider)
meta-llama/Llama-3.3-70B-Instruct-Turbo Llama 3.3 70B Turbo 128K $0.88 per 1M tok
Qwen/Qwen2.5-72B-Instruct-Turbo Qwen2.5 72B Turbo 32K $1.20 per 1M tok
deepseek-ai/DeepSeek-V3 DeepSeek V3 64K $1.25 per 1M tok
mistralai/Mixtral-8x22B-Instruct-v0.1 Mixtral 8x22B 64K $1.20 per 1M tok

Pricing reflects Together AI's rates — you pay Together AI directly. VerticalAPI adds zero markup on tokens.

5-line Together AI call via VerticalAPI

Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.

together-ai_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "..."}
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # Together AI
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Four reasons developers route Together AI through us

Zero token markup

You pay Together AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Together AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Together AI to other providers on identical prompts.

Observability built in

Every Together AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where Together AI shines

open-weights variety fine-tuned models DeepSeek reasoning Qwen multilingual

Frequently asked questions

What is Together AI and what models do they offer?

Together AI is an open-source AI cloud hosting 200+ models. The 2026 catalog includes Llama 3.3 70B and 405B, Llama 3.1 8B, Mixtral 8x7B and 8x22B, Qwen 2.5 (7B–72B), DeepSeek V3 and R1, Gemma 2, plus image (FLUX.1, Stable Diffusion 3) and code (DeepSeek Coder, Qwen Coder) models. Together also offers fine-tuning, dedicated endpoints and an OpenAI-compatible API.

How much does Together AI cost in 2026?

Llama 3.3 70B Instruct Turbo is around $0.88 per 1M tokens (input and output). Llama 405B Instruct Turbo is roughly $3.50/$3.50. Llama 8B is ~$0.18/$0.18. Mixtral 8x22B is around $1.20/$1.20. DeepSeek R1 is competitive on reasoning at low cost. FLUX.1 schnell is $0.0003 per image. Fine-tuning is per-token + storage. Via VerticalAPI BYOK you pay Together directly at list with zero markup.

How do I use Together AI via VerticalAPI BYOK?

Create a key at api.together.xyz/settings/api-keys, paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. Together is OpenAI-compatible, so VerticalAPI passes through, adding unified logging and automatic fallback to Groq, Fireworks or DeepInfra if Together is saturated. Billing stays on your Together invoice.

What is Together AI best for compared to alternatives?

Together wins on breadth of open-weight catalog (200+ models in one API), fine-tuning support, and competitive pricing on Llama 3.3 and DeepSeek. Compared to Groq or Cerebras it is slower (GPU-based, not LPU/WSE) but offers many more models. Compared to DeepInfra and Fireworks it is roughly at price parity with broader catalog. Not a fit for frontier closed models — pick OpenAI, Anthropic or Google directly.

Where is Together AI hosted / data privacy?

Together runs on its own GPU clusters across US datacenters. API data is not used to train models. Enterprise contracts include zero data retention, SOC 2 Type II and HIPAA. Dedicated endpoints provide isolated inference. Via VerticalAPI BYOK your Together contract terms remain intact.

Limitations and trade-offs

  • Inference speed is lower than Groq or Cerebras for the same Llama 70B model (GPU vs specialized chips).
  • No frontier closed models (no GPT-5, no Claude Opus) — open weights only.
  • Geographic coverage is US-centric — higher RTT for European apps.
  • Quality on coding benchmarks (SWE-Bench) trails Claude Sonnet 4.5 even at Llama 405B.
  • Fine-tuning storage and dedicated endpoint pricing can add hidden monthly costs.

Where Together AI is heading

  1. Continued addition of new open-weight releases (Llama 4, Qwen 3, DeepSeek next-gen) within days of release.
  2. Faster inference via hardware upgrades and new Turbo quantization variants.
  3. Expanded fine-tuning (LoRA, full SFT) and dedicated multi-tenant endpoints.
  4. EU region launch for sovereignty-conscious customers.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • Together AI vs Fireworks AI — which is cheaper for Llama 3.3 70B?
  • Best provider for Llama 405B production inference?
  • Is DeepSeek R1 on Together a viable o1 alternative?
  • How does Together fine-tuning compare to OpenAI fine-tuning?
  • Together AI vs Groq — quality vs speed tradeoff?