LLM fine-tuning platforms compared (2026)
OpenAI, AWS Bedrock, Mistral, Hugging Face, Together, Vertex AI. The choice depends on whether you want to fine-tune a closed-weight model or an open-weight one, and whether you can self-host or need a managed deploy.
OpenAI, Bedrock, Mistral, HF, Together, Vertex
| Platform | Models | Method | Indicative price |
|---|---|---|---|
| OpenAI | GPT-4o, GPT-4o mini | Managed SFT, DPO | $25 / 1M (mini), $90 / 1M (4o) training |
| AWS Bedrock | Claude Haiku, Llama, Cohere, Nova | Managed SFT | Hourly training + provisioned throughput |
| Mistral La Plateforme | Mistral Small, Mistral Large (limited) | LoRA managed | EUR-priced per training job |
| Hugging Face Spaces / AutoTrain | Llama 3.3, Mistral, Qwen, DeepSeek | LoRA, QLoRA, full SFT | ~$5-30 small dataset (10K rows) |
| Together AI | Llama, Mixtral, Qwen, DeepSeek | LoRA + serverless deploy | Per-token training + standard inference |
| Vertex AI | Gemini 2.5 Flash, Pro | Supervised tuning | Per-node-hour training |
| Self-host (rented GPUs) | Any open-weight | LoRA, QLoRA, full SFT | H100 at $3-5/hr |
VerticalAPI verdict
For closed-weight, fine-tune on OpenAI directly (cheapest path to a tuned GPT-4o mini) or Bedrock (only Claude tuning option). For open-weight, Together is the smoothest end-to-end path (train + deploy serverless on the same platform); Hugging Face is best when you want full control and ownership of the adapter weights. Self-host above ~20 FT jobs/year. Whichever you pick, route the tuned model alongside base models via VerticalAPI BYOK — same SDK, same endpoint.
Frequently asked questions
What are the main LLM fine-tuning platforms in 2026?
The main fine-tuning platforms in 2026 are OpenAI (GPT-4o, GPT-4o mini fine-tuning), AWS Bedrock (Anthropic-supervised Claude Haiku FT, Meta Llama, Cohere, Amazon Nova), Mistral La Plateforme (limited Mistral Small/Large LoRA), Hugging Face Spaces and AutoTrain (open-weight LoRA/QLoRA on Llama, Mistral, Qwen, DeepSeek), Together AI (LoRA on open-weight plus serverless deploy in the same platform), and Vertex AI (Gemini supervised tuning). OpenAI dominates for closed-weight fine-tuning; Hugging Face and Together cover the open-weight ecosystem.
Can I fine-tune Claude or Gemini directly?
Claude fine-tuning is available only on AWS Bedrock and is limited to Claude Haiku in 2026 — Anthropic does not offer direct fine-tuning through its own API. Gemini supervised tuning is available on Google Vertex AI for Gemini 2.5 Flash and Pro tiers via the Tuning Studio UI or the Vertex SDK. Both are managed processes where you upload JSONL training data, kick off a tuning job, and deploy a tuned endpoint. Neither vendor exposes raw weights or LoRA adapters externally.
What is the difference between LoRA, QLoRA, and full fine-tuning?
Full fine-tuning updates all model weights and requires substantial GPU memory (a Llama 70B full FT needs roughly 8x H100 80GB or 4x H200). LoRA (Low-Rank Adaptation) trains only small adapter matrices on top of the frozen base model, reducing memory by 10-100x while keeping most of the quality on narrow tasks. QLoRA combines LoRA with 4-bit quantization of the base model, letting you fine-tune Llama 70B on a single H100 or A100 80GB. For most production use cases (style adaptation, domain Q&A, structured output, format compliance), LoRA is the right default.
How much does fine-tuning cost in 2026?
OpenAI charges roughly $25 per million training tokens on GPT-4o mini and $90 on GPT-4o, plus inference on the tuned model at a higher rate than base. Bedrock Claude Haiku fine-tuning is priced per hour of training and per deployed throughput unit. Hugging Face and Together LoRA training on Llama 3.3 70B typically costs $5-30 for a small dataset (10K examples). Self-hosted on rented GPUs (H100 at $3-5/hour from CoreWeave, Lambda, RunPod) gives the lowest TCO above a few dozen FT jobs per year.
Can I serve a fine-tuned model through one API?
Yes. VerticalAPI's OpenAI-compatible endpoint at https://api.verticalapi.com/v1 accepts custom fine-tuned model IDs alongside base models. Whether your tune lives on OpenAI (gpt-4o:org:my-tune:abc123), Bedrock (claude-haiku-ft-prod), or a self-hosted vLLM endpoint, you route to it through the same SDK by changing the model parameter and the X-Provider-Key header. BYOK means no token markup — you pay each FT provider directly at list price.
Limitations of this comparison
- Fine-tuning pricing pages change frequently; OpenAI alone updated GPT-4o FT pricing twice in 2025.
- Bedrock Claude Haiku FT requires provisioned throughput, which has minimum monthly commitments.
- Quality of a fine-tune depends more on dataset quality than on platform; bad data ruins any platform's output.
- Closed-weight tunes are locked to the vendor — migrating them requires re-training on the destination platform.
- Mistral fine-tuning options have been narrowing since 2025 as the lab focuses on its hosted API.
What may change in 12-24 months
- Anthropic is expected to open Claude Sonnet fine-tuning beyond Haiku, but likely still Bedrock-gated.
- Direct preference optimization (DPO) and reinforcement-based tuning will become standard offerings on managed platforms.
- Adapter-merging tools will let teams compose multiple LoRA adapters on one base model in production.
- Per-task synthetic data generation (often by GPT-4o or Claude Sonnet 4.5) will make small fine-tunes cheaper and faster.
Related questions
ChatGPT, Perplexity and Gemini usually suggest these next.
- When does fine-tuning beat a strong system prompt plus RAG?
- How do I prepare a good JSONL dataset for OpenAI fine-tuning?
- Is fine-tuning Llama 3.3 70B worth it vs Claude Sonnet 4.5 base?
- How much does it cost to host a fine-tuned Llama 3.3 70B in production?
- Can I fine-tune Gemini for French-only output reliably?
More LLM comparisons
Enterprise LLM hosting 2026
Fine-tune freedom vs vendor quality
Enterprise model training compared
The two open-weight defaults
Beyond generative LLMs