NVIDIA NIM via VerticalAPI
Route NVIDIA NIM (NVIDIA Inference Microservices) models such as Llama, Mistral, and Phi through VerticalAPI's OpenAI-compatible endpoint. Bring your own NGC API key (BYOK), pay zero token markup, and get TensorRT-optimized inference.
NVIDIA NIM models routed by VerticalAPI
Pass the model ID below as the `model` field in any OpenAI-compatible request. New NVIDIA NIM models are typically supported within 24 hours of release.
| Model ID | Name | Context | Pricing (provider) |
|---|---|---|---|
| `meta/llama-3.3-70b-instruct` | Llama 3.3 70B (NIM) | 128K | NGC subscription pricing |
| `mistralai/mistral-large-2` | Mistral Large 2 (NIM) | 128K | NGC subscription pricing |
| `microsoft/phi-3.5-moe-instruct` | Phi 3.5 MoE (NIM) | 128K | NGC subscription pricing |
Pricing reflects NVIDIA's rates; you pay NVIDIA directly. VerticalAPI adds zero markup on tokens.
5-line NVIDIA NIM call via VerticalAPI
Drop-in replacement for the OpenAI SDK. Works with the OpenAI Python client, Node, Go, curl — anything that speaks HTTP.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",
    default_headers={"X-Provider-Key": "nvapi-..."},  # your NGC API key
)

response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",  # NVIDIA NIM
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
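Since the endpoint is plain HTTP, no SDK is required. A minimal stdlib-only sketch of the same request (the endpoint path and headers mirror what the OpenAI client sends; key values are placeholders):

```python
import json
import urllib.request

BASE_URL = "https://api.verticalapi.com/v1"

def build_request(model, prompt, vapi_key, provider_key):
    """Assemble the chat-completions request without any SDK."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {vapi_key}",
            "Content-Type": "application/json",
            "X-Provider-Key": provider_key,  # your NGC key (nvapi-...)
        },
        method="POST",
    )

req = build_request("meta/llama-3.3-70b-instruct", "Hello", "vapi_...", "nvapi-...")
# urllib.request.urlopen(req) would send it; inspect the target here:
print(req.full_url)  # https://api.verticalapi.com/v1/chat/completions
```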
Four reasons developers route NVIDIA NIM through us
Zero token markup
You pay NVIDIA directly with your own key. VerticalAPI's revenue is the gateway subscription, not a tax on your tokens.
One key, every provider
NVIDIA NIM alongside OpenAI, Anthropic, Gemini and 12 more — same OpenAI-compatible endpoint, same SDK, switchable per-request.
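Per-request switching can be sketched as a small lookup that picks the right BYOK header for each model. The model-ID prefixes and key values below are illustrative placeholders, not a VerticalAPI API:

```python
# Hypothetical helper: map a model ID to the BYOK provider key that
# should go in the X-Provider-Key header for that request.
PROVIDER_KEYS = {
    "meta/": "nvapi-...",     # NVIDIA NIM models
    "gpt-": "sk-...",         # OpenAI models
    "claude-": "sk-ant-...",  # Anthropic models
}

def provider_key_for(model: str) -> str:
    for prefix, key in PROVIDER_KEYS.items():
        if model.startswith(prefix):
            return key
    raise ValueError(f"no BYOK key configured for {model!r}")

print(provider_key_for("meta/llama-3.3-70b-instruct"))  # nvapi-...
```

One client object plus a per-request `extra_headers={"X-Provider-Key": ...}` is then enough to hop between providers mid-application.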
Latency & cost monitoring
Per-request token counts, p50/p95 latency and cost dashboards out of the box. Compare NVIDIA NIM to other providers on identical prompts.
Observability built in
Every NVIDIA NIM call gets a trace ID, replayable payload and audit log entry. Wire to Datadog or Sentry via OpenTelemetry.
Where NVIDIA NIM shines
Common questions about NVIDIA NIM on VerticalAPI
Can NIM be self-hosted?
Yes. NIM ships as Docker containers; VerticalAPI can route to either the hosted NVIDIA endpoint or your self-hosted NIM via the dashboard's endpoint override field.
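Because both targets speak the same OpenAI-compatible API, only the base URL changes. A minimal sketch, assuming the container's default published port (8000) and NVIDIA's hosted endpoint at the time of writing:

```python
def nim_base_url(self_hosted: bool, host: str = "localhost", port: int = 8000) -> str:
    """Return the OpenAI-compatible base URL to route NIM calls to."""
    if self_hosted:
        # NIM containers serve /v1 on whatever port you publish with `docker run -p`
        return f"http://{host}:{port}/v1"
    return "https://integrate.api.nvidia.com/v1"  # NVIDIA's hosted endpoint

print(nim_base_url(True))  # http://localhost:8000/v1
```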
Does VerticalAPI add markup on NIM?
No — same zero-markup policy. You pay NVIDIA directly for the NGC subscription or self-hosted licensing.
All supported LLM providers
Same endpoint, same SDK — just change the model and the BYOK header.
Ship on NVIDIA NIM in 60 seconds
Free tier — bring your own NVIDIA NIM key, zero markup, OpenAI-compatible endpoint.
Get your VerticalAPI key →