NVIDIA NIM via VerticalAPI

Updated May 04, 2026 · By VerticalAPI Team

Call NVIDIA NIM (NVIDIA Inference Microservices) models such as Llama, Mistral, and Phi through VerticalAPI's OpenAI-compatible endpoint. Bring your own key (BYOK) with your NGC API key: zero markup, TensorRT-optimized inference.


Endpoint: https://api.verticalapi.com/v1/chat/completions · BYOK header: X-Provider-Key: nvapi-...

Supported models

NVIDIA NIM models routed by VerticalAPI

Pass a model ID from the table below as the model parameter in any OpenAI-compatible request. New NVIDIA NIM models are typically supported within 24 hours of release.

Model ID	Name	Context (tokens)	Pricing (provider)
`meta/llama-3.3-70b-instruct`	Llama 3.3 70B (NIM)	128K	NGC subscription pricing
`mistralai/mistral-large-2`	Mistral Large 2 (NIM)	128K	NGC subscription pricing
`microsoft/phi-3.5-moe-instruct`	Phi 3.5 MoE (NIM)	128K	NGC pricing — efficient

Pricing reflects NVIDIA's rates: you pay NVIDIA directly with your own key, and VerticalAPI adds zero markup on tokens.

Quickstart

A minimal NVIDIA NIM call via VerticalAPI

Drop-in replacement for the OpenAI API endpoint: change the base URL and add the BYOK header. Works with the OpenAI Python client, Node, Go, curl, or anything else that speaks HTTP.

nvidia-nim_quickstart.py (Python)

from openai import OpenAI

# Point the OpenAI client at VerticalAPI instead of the default OpenAI endpoint.
client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    default_headers={"X-Provider-Key": "nvapi-..."},  # your NVIDIA NGC API key (BYOK)
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",  # NVIDIA NIM model ID
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
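
For clients that cannot use the OpenAI SDK, the same call can be sketched with only the Python standard library. This is a minimal sketch, not an official client. It assumes the gateway accepts the VerticalAPI key as an `Authorization: Bearer` token (which is how the OpenAI SDK transmits `api_key`) and that the response follows the standard OpenAI chat-completions shape. The helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://api.verticalapi.com/v1/chat/completions"


def build_chat_request(gateway_key, provider_key, model, prompt):
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: same Bearer scheme the OpenAI SDK uses for api_key.
            "Authorization": f"Bearer {gateway_key}",
            # BYOK header from this page: your NVIDIA NGC API key.
            "X-Provider-Key": provider_key,
        },
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible response body.
    return body["choices"][0]["message"]["content"]
```

To use it, build a request with `build_chat_request("vapi_...", "nvapi-...", "meta/llama-3.3-70b-instruct", "Hello")` and pass the result to `send_chat_request`. Keeping construction separate from sending makes the headers and payload easy to inspect before any network call is made.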

Why use NVIDIA NIM via VerticalAPI

Four reasons developers route NVIDIA NIM through us

Zero token markup

You pay NVIDIA NIM directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.

One key, every provider

NVIDIA NIM alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
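
Per-request switching could look roughly like the sketch below, which builds the keyword arguments for `client.chat.completions.create` from a small routing table. The NVIDIA entry uses a model ID from this page. The OpenAI entry (its model ID and key placeholder) and the names `PROVIDERS` and `request_kwargs` are illustrative assumptions, not documented VerticalAPI values. `extra_headers` is the OpenAI Python SDK's per-request header option and takes precedence over `default_headers` set on the client.

```python
# Hypothetical routing table: one model ID and one BYOK key per provider.
PROVIDERS = {
    "nvidia-nim": {"model": "meta/llama-3.3-70b-instruct", "key": "nvapi-..."},
    "openai": {"model": "gpt-4o-mini", "key": "sk-..."},  # assumed model ID
}


def request_kwargs(provider, prompt):
    """Return kwargs for client.chat.completions.create for one provider."""
    cfg = PROVIDERS[provider]
    return {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
        # Override the BYOK header for this request only.
        "extra_headers": {"X-Provider-Key": cfg["key"]},
    }
```

With an OpenAI client already pointed at the VerticalAPI base URL, a call would then be `client.chat.completions.create(**request_kwargs("nvidia-nim", "Hello"))`, and switching providers means changing only the first argument.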

Latency & cost monitoring

Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare NVIDIA NIM to other providers on identical prompts.
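
For readers unfamiliar with the p50/p95 figures, here is a minimal sketch of how such percentiles can be computed from a list of request latencies using the nearest-rank method. It is not how VerticalAPI's dashboards are implemented, which this page does not describe, and the latency samples are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (not real measurements).
latencies_ms = [120, 135, 150, 160, 180, 210, 240, 300, 450, 900]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # tail latency
print(f"p50={p50} ms, p95={p95} ms")  # p50=180 ms, p95=900 ms
```

The gap between the two numbers is the point of tracking both: a single slow outlier barely moves p50 but dominates p95, which matters when comparing providers on identical prompts.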

Observability built in

Every NVIDIA NIM call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.

Best for

Where NVIDIA NIM shines

TensorRT-optimized inference · DGX Cloud deployment · on-prem NIM containers · NeMo fine-tunes

FAQ

Common questions about NVIDIA NIM on VerticalAPI

Can NIM be self-hosted?

Yes. NIM ships as Docker containers; VerticalAPI can route to either the hosted NVIDIA endpoint or your self-hosted NIM via the dashboard's endpoint override field.

Does VerticalAPI add markup on NIM?

No. The same zero-markup policy applies: you pay NVIDIA directly for the NGC subscription or self-hosted licensing.

Switch providers

All supported LLM providers

Same endpoint, same SDK — just change the model and the BYOK header.

Ship on NVIDIA NIM in 60 seconds

Free tier — bring your own NVIDIA NIM key, zero markup, OpenAI-compatible endpoint.

Get your VerticalAPI key →