NVIDIA NIM via VerticalAPI

NVIDIA NIM (NVIDIA Inference Microservices) for Llama, Mistral, Phi via VerticalAPI's OpenAI-compatible endpoint. BYOK with your NGC API key, zero markup, TensorRT-optimized.

Endpoint: https://api.verticalapi.com/v1/chat/completions  ·  BYOK header: X-Provider-Key: nvapi-...

NVIDIA NIM models routed by VerticalAPI

Pass the model ID below as the model parameter in any OpenAI-compatible request. New NVIDIA NIM models are typically supported within 24h of release.

Model ID  ·  Name  ·  Context  ·  Pricing (provider)
meta/llama-3.3-70b-instruct  ·  Llama 3.3 70B (NIM)  ·  128K  ·  NGC subscription pricing
mistralai/mistral-large-2  ·  Mistral Large 2 (NIM)  ·  128K  ·  NGC subscription pricing
microsoft/phi-3.5-moe-instruct  ·  Phi 3.5 MoE (NIM)  ·  128K  ·  NGC pricing (efficient)

Pricing reflects NVIDIA NIM's rates — you pay NVIDIA NIM directly. VerticalAPI adds zero markup on tokens.
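For clients without an SDK, the same call is a single JSON POST to the endpoint above. A minimal sketch of the request body and headers; the key values are placeholders as in the quickstart, and the bearer-token header mirrors how the OpenAI SDK sends its api_key:

```python
import json

# Headers for the VerticalAPI gateway: the Authorization bearer token is the
# VerticalAPI gateway key, and X-Provider-Key carries your own NGC key (BYOK).
headers = {
    "Authorization": "Bearer vapi_...",  # VerticalAPI gateway key (placeholder)
    "X-Provider-Key": "nvapi-...",       # your NGC API key (placeholder)
    "Content-Type": "application/json",
}

# Standard OpenAI-compatible chat body; only the model ID selects NVIDIA NIM.
body = json.dumps({
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
})
print(body)
```

POST this body with these headers to https://api.verticalapi.com/v1/chat/completions and the response follows the standard OpenAI chat-completions shape.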

5-line NVIDIA NIM call via VerticalAPI

Drop-in replacement for the OpenAI API: keep your existing SDK and change only the base URL and headers. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.

nvidia-nim_quickstart.py Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # VerticalAPI gateway key
    default_headers={"X-Provider-Key": "nvapi-..."}  # your NGC key (BYOK)
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",  # NVIDIA NIM
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Four reasons developers route NVIDIA NIM through us

Zero token markup

You pay NVIDIA NIM directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

NVIDIA NIM alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare NVIDIA NIM to other providers on identical prompts.
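The p50/p95 figures are plain percentiles over per-request latencies. A back-of-the-envelope sketch of that computation, using a simple nearest-rank method and made-up sample data:

```python
def percentile(samples, p):
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ordered = sorted(samples)
    index = int(p * (len(ordered) - 1))  # index-based rank, no interpolation
    return ordered[index]

# Hypothetical per-request latencies in milliseconds.
latencies_ms = [120, 95, 340, 110, 99, 480, 130, 105, 900, 125]

p50 = percentile(latencies_ms, 0.50)  # typical request
p95 = percentile(latencies_ms, 0.95)  # tail latency
print(p50, p95)
```

Running the same prompt through two providers and comparing these two numbers is the quickest way to decide where a workload should live.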

Observability built in

Every NVIDIA NIM call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Where NVIDIA NIM shines

TensorRT-optimized inference  ·  DGX Cloud deployment  ·  on-prem NIM containers  ·  NeMo fine-tunes

Common questions about NVIDIA NIM on VerticalAPI

Can NIM be self-hosted?

Yes. NIM ships as Docker containers; VerticalAPI can route to either the hosted NVIDIA endpoint or your self-hosted NIM via the dashboard's endpoint override field.
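With the endpoint override, the only client-side difference is the base URL: pointed at a local NIM container, the same OpenAI-compatible request shape works unchanged. A sketch, assuming the NIM container's default port of 8000 (verify against your container's docs):

```python
# Hosted route vs. a self-hosted NIM container: same OpenAI-compatible API,
# different base URL. The localhost port is an assumption (NIM's documented
# default); adjust it to match your deployment.
HOSTED_BASE_URL = "https://api.verticalapi.com/v1"
SELF_HOSTED_BASE_URL = "http://localhost:8000/v1"  # assumed default NIM port

def chat_url(base_url: str) -> str:
    """Build the chat-completions URL for either deployment."""
    return base_url.rstrip("/") + "/chat/completions"

print(chat_url(SELF_HOSTED_BASE_URL))
```

Set the dashboard's endpoint override to the self-hosted base URL and VerticalAPI routes to your container instead of NVIDIA's hosted endpoint.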

Does VerticalAPI add markup on NIM?

No — same zero-markup policy. You pay NVIDIA directly for the NGC subscription or self-hosted licensing.