Databricks Mosaic via VerticalAPI

Databricks Mosaic AI inference (DBRX, Llama 3.3, Mixtral) via VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK) with your Databricks personal access token (PAT): zero markup, and your data stays in your workspace.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK headers: X-Provider-Key: dapi... (your Databricks PAT) plus x-databricks-host

Databricks Mosaic models routed by VerticalAPI

Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New Databricks Mosaic models are typically supported within 24h of release.

Model ID                               | Name                       | Context | Pricing (provider)
databricks-dbrx-instruct               | DBRX Instruct              | 32K     | Databricks DBU pricing
databricks-meta-llama-3-3-70b-instruct | Llama 3.3 70B (Databricks) | 128K    | Databricks DBU pricing
databricks-mixtral-8x7b-instruct       | Mixtral 8x7B (Databricks)  | 32K     | Databricks DBU pricing

Pricing reflects Databricks Mosaic's rates — you pay Databricks Mosaic directly. VerticalAPI adds zero markup on tokens.

A minimal Databricks Mosaic call via VerticalAPI

Drop-in compatible with the OpenAI SDK. Works with the OpenAI Python client, Node, Go and curl, or anything else that speaks HTTP.

databricks-mosaic_quickstart.py Python
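from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # VerticalAPI gateway key
    default_headers={
        "X-Provider-Key": "dapi...",  # your Databricks PAT (BYOK)
        # Workspace host header named above; the value here is only a placeholder
        "x-databricks-host": "https://<your-workspace-host>",
    },
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",  # Databricks Mosaic
    messages=[{"role": "user", "content": "Hello"}],
)
# Print the assistant's reply text
print(response.choices[0].message.content)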

Four reasons developers route Databricks Mosaic through us

Zero token markup

You pay Databricks Mosaic directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

Databricks Mosaic alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare Databricks Mosaic to other providers on identical prompts.
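
If you want to sanity-check the dashboard's p50/p95 figures against your own measurements, one option is to time requests on the client and compute the percentiles yourself. The sketch below assumes you have already collected per-request latencies in milliseconds; the helper name `latency_percentiles` and the sample values are made up for illustration, and VerticalAPI may compute its percentiles differently.

```python
import statistics


def latency_percentiles(latencies_ms):
    """Return (p50, p95) using linear interpolation between sorted samples."""
    # n=100 yields 99 cut points; index 49 is the 50th percentile, 94 the 95th.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Illustrative, made-up latencies (not real measurements).
sample_latencies_ms = [100, 200, 300, 400, 500]
p50, p95 = latency_percentiles(sample_latencies_ms)
print(f"p50={p50:.0f} ms, p95={p95:.0f} ms")
```

Latencies could be gathered by wrapping each request in `time.perf_counter()` calls; running the same prompts against two providers gives two lists to compare.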

Observability built in

Every Databricks Mosaic call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where Databricks Mosaic shines

  • Data residency in your Databricks workspace
  • Unity Catalog governance
  • Fine-tuned models on customer data
  • Lakehouse-integrated agents

Frequently asked questions

What is Databricks Mosaic and what models do they offer?

Databricks Mosaic AI (formerly MosaicML, acquired in 2023) is the LLM and AI stack inside Databricks. The 2026 catalog on Foundation Model APIs includes Llama 3.3 70B, Llama 3.1 8B and 405B, Mixtral 8x7B, DBRX Instruct, Meta-Llama-3 variants and BGE / GTE embeddings. Mosaic AI adds Vector Search, RAG Studio, AI Gateway, model serving, fine-tuning and the Agent Framework — all integrated with Unity Catalog governance.

How much does Databricks Mosaic cost in 2026?

Foundation Model APIs are billed in DBUs (Databricks Units), which translate to dollars depending on workspace tier: roughly $1–$2 per 1M input tokens and $3–$6 per 1M output tokens for Llama 3.3 70B on serverless pay-per-token, with provisioned-throughput discounts for committed capacity. Underlying compute and storage are billed on top. Via VerticalAPI BYOK you pay Databricks directly with zero token markup.
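
To turn the rough per-token ranges above into a budget figure, a back-of-the-envelope calculation is usually enough. The sketch below hard-codes the approximate Llama 3.3 70B pay-per-token ranges quoted in this answer; the function name `estimate_cost_range` and the token volumes are hypothetical, and the result ignores compute, storage and any provisioned-throughput discount.

```python
# Approximate USD per 1M tokens (low, high) for Llama 3.3 70B, as quoted above.
INPUT_USD_PER_M = (1.0, 2.0)
OUTPUT_USD_PER_M = (3.0, 6.0)


def estimate_cost_range(input_tokens, output_tokens):
    """Return a (low, high) USD estimate for the given token volumes."""
    millions_in = input_tokens / 1_000_000
    millions_out = output_tokens / 1_000_000
    low = millions_in * INPUT_USD_PER_M[0] + millions_out * OUTPUT_USD_PER_M[0]
    high = millions_in * INPUT_USD_PER_M[1] + millions_out * OUTPUT_USD_PER_M[1]
    return low, high


# Hypothetical workload: 2M input tokens and 500K output tokens.
low, high = estimate_cost_range(2_000_000, 500_000)
print(f"Estimated token cost: ${low:.2f} to ${high:.2f}")
```

Actual charges depend on your workspace tier and DBU rate, so treat the output as an order-of-magnitude check only.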

How do I use Databricks Mosaic via VerticalAPI BYOK?

Generate a Personal Access Token in your Databricks workspace, paste it into VerticalAPI with the workspace URL, then point the OpenAI SDK at https://api.verticalapi.com/v1. VerticalAPI translates OpenAI chat completions into Databricks' Foundation Model API. Billing stays inside your Databricks workspace under standard DBU consumption.
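
For clients that do not use the OpenAI SDK, the same request can be assembled with plain HTTP. The sketch below builds the request with Python's standard library without sending it. It assumes the gateway key travels as a standard `Authorization: Bearer` header (which is how the OpenAI SDK sends `api_key`) and that `x-databricks-host` carries the workspace URL; the function name `build_chat_request` and the placeholder host are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(vapi_key, databricks_pat, workspace_host, model, prompt):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {vapi_key}",  # VerticalAPI gateway key
        "X-Provider-Key": databricks_pat,       # Databricks PAT (BYOK)
        "x-databricks-host": workspace_host,    # assumed to hold the workspace URL
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request(
    vapi_key="vapi_...",
    databricks_pat="dapi...",
    workspace_host="https://<your-workspace-host>",  # placeholder
    model="databricks-meta-llama-3-3-70b-instruct",
    prompt="Hello",
)
```

With real credentials in place, the request can be sent via `urllib.request.urlopen(request)` and the JSON response parsed with `json.load`.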

What is Databricks Mosaic best for compared to alternatives?

Databricks Mosaic wins for enterprises already on Databricks: data and models in one platform, Unity Catalog governance, native RAG against Lakehouse tables, and the Agent Framework on production data. Compared to AWS Bedrock or Vertex AI, it has a narrower third-party model catalog but is uniquely strong on data-native AI workflows. It is not the cheapest path for raw token inference.

Where is Databricks Mosaic hosted / data privacy?

Databricks runs on AWS, Azure and GCP across 50+ regions. Mosaic Foundation Model APIs respect workspace region. Inputs and outputs are not used to train models. Enterprise tier includes HIPAA, PCI, FedRAMP, EU residency, customer-managed keys and PrivateLink. Via VerticalAPI BYOK your Databricks workspace remains the data controller.

Limitations and trade-offs

  • Pricing in DBUs is more expensive per token than dedicated open-weight hosts like DeepInfra or Together.
  • Catalog is narrower than AWS Bedrock or Vertex AI for third-party models.
  • Authentication via PAT requires a Databricks workspace and Unity Catalog setup — significant onboarding.
  • Latency from outside the Databricks region can be higher than a dedicated inference host.
  • Best leveraged when data already lives in Databricks — overkill for non-data-platform use cases.

Where Databricks Mosaic is heading

  1. Deeper integration of Mosaic Agent Framework with Genie and AI/BI dashboards.
  2. More frontier-class third-party models added to Foundation Model APIs (Claude, Mistral expansion).
  3. Improved fine-tuning and pretraining tooling targeting domain-specific LLMs.
  4. Stronger multi-cloud parity across AWS, Azure and GCP workspaces.

Related questions

ChatGPT, Perplexity and Gemini usually suggest these next.

  • Databricks Mosaic vs AWS Bedrock for enterprise LLM serving?
  • Is DBRX still competitive in 2026?
  • How does Mosaic AI Agent Framework compare to LangGraph or AutoGen?
  • Best way to do RAG on Lakehouse data with Mosaic?
  • Mosaic Foundation Model APIs vs running Llama on your own Databricks cluster?