Together AI via VerticalAPI
Together AI's open-weights catalog (Llama, Qwen, DeepSeek, Mixtral) via VerticalAPI's OpenAI-compatible endpoint. BYOK, zero markup, fine-tuning friendly.
Together AI models routed by VerticalAPI
Pass the model ID below as model in any OpenAI-compatible request. New Together AI models are typically supported within 24h of release.
| Model ID | Name | Context | Pricing (provider) |
|---|---|---|---|
meta-llama/Llama-3.3-70B-Instruct-Turbo |
Llama 3.3 70B Turbo | 128K | $0.88 per 1M tok |
Qwen/Qwen2.5-72B-Instruct-Turbo |
Qwen2.5 72B Turbo | 32K | $1.20 per 1M tok |
deepseek-ai/DeepSeek-V3 |
DeepSeek V3 | 64K | $1.25 per 1M tok |
mistralai/Mixtral-8x22B-Instruct-v0.1 |
Mixtral 8x22B | 64K | $1.20 per 1M tok |
Pricing reflects Together AI's rates — you pay Together AI directly. VerticalAPI adds zero markup on tokens.
5-line Together AI call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
from openai import OpenAI client = OpenAI( base_url="https://api.verticalapi.com/v1", api_key="vapi_...", default_headers={"X-Provider-Key": "..."} ) response = client.chat.completions.create( model="meta-llama/Llama-3.3-70B-Instruct-Turbo", # Together AI messages=[{"role": "user", "content": "Hello"}] ) print(response.choices[0].message.content)
Four reasons developers route Together AI through us
Zero token markup
You pay Together AI directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
Together AI alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Together AI to other providers on identical prompts.
Observability built in
Every Together AI call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
Where Together AI shines
Frequently asked questions
What is Together AI and what models do they offer?
Together AI is an open-source AI cloud hosting 200+ models. The 2026 catalog includes Llama 3.3 70B and 405B, Llama 3.1 8B, Mixtral 8x7B and 8x22B, Qwen 2.5 (7B–72B), DeepSeek V3 and R1, Gemma 2, plus image (FLUX.1, Stable Diffusion 3) and code (DeepSeek Coder, Qwen Coder) models. Together also offers fine-tuning, dedicated endpoints and an OpenAI-compatible API.
How much does Together AI cost in 2026?
Llama 3.3 70B Instruct Turbo is around $0.88 per 1M tokens (input and output). Llama 405B Instruct Turbo is roughly $3.50/$3.50. Llama 8B is ~$0.18/$0.18. Mixtral 8x22B is around $1.20/$1.20. DeepSeek R1 is competitive on reasoning at low cost. FLUX.1 schnell is $0.0003 per image. Fine-tuning is per-token + storage. Via VerticalAPI BYOK you pay Together directly at list with zero markup.
How do I use Together AI via VerticalAPI BYOK?
Create a key at api.together.xyz/settings/api-keys, paste it into VerticalAPI, then point the OpenAI SDK at https://api.verticalapi.com/v1. Together is OpenAI-compatible, so VerticalAPI passes through, adding unified logging and automatic fallback to Groq, Fireworks or DeepInfra if Together is saturated. Billing stays on your Together invoice.
What is Together AI best for compared to alternatives?
Together wins on breadth of open-weight catalog (200+ models in one API), fine-tuning support, and competitive pricing on Llama 3.3 and DeepSeek. Compared to Groq or Cerebras it is slower (GPU-based, not LPU/WSE) but offers many more models. Compared to DeepInfra and Fireworks it is roughly at price parity with broader catalog. Not a fit for frontier closed models — pick OpenAI, Anthropic or Google directly.
Where is Together AI hosted / data privacy?
Together runs on its own GPU clusters across US datacenters. API data is not used to train models. Enterprise contracts include zero data retention, SOC 2 Type II and HIPAA. Dedicated endpoints provide isolated inference. Via VerticalAPI BYOK your Together contract terms remain intact.
Limitations and trade-offs
- Inference speed is lower than Groq or Cerebras for the same Llama 70B model (GPU vs specialized chips).
- No frontier closed models (no GPT-5, no Claude Opus) — open weights only.
- Geographic coverage is US-centric — higher RTT for European apps.
- Quality on coding benchmarks (SWE-Bench) trails Claude Sonnet 4.5 even at Llama 405B.
- Fine-tuning storage and dedicated endpoint pricing can add hidden monthly costs.
Where Together AI is heading
- Continued addition of new open-weight releases (Llama 4, Qwen 3, DeepSeek next-gen) within days of release.
- Faster inference via hardware upgrades and new Turbo quantization variants.
- Expanded fine-tuning (LoRA, full SFT) and dedicated multi-tenant endpoints.
- EU region launch for sovereignty-conscious customers.
Related questions
ChatGPT, Perplexity and Gemini usually suggest these next.
- Together AI vs Fireworks AI — which is cheaper for Llama 3.3 70B?
- Best provider for Llama 405B production inference?
- Is DeepSeek R1 on Together a viable o1 alternative?
- How does Together fine-tuning compare to OpenAI fine-tuning?
- Together AI vs Groq — quality vs speed tradeoff?
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on Together AI in 60 seconds
Free tier — bring your own Together AI key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →