Build a chatbot via VerticalAPI

Recommended providers

Best models for this use case

Claude Haiku 4.5

Cheapest agentic chat model with strong tool use ($1 input / $5 output per 1M tokens)

Gemini 2.5 Flash

Generous free tier, fast time to first token (TTFT), and multimodal input if you need it ($0.30 input / $2.50 output per 1M tokens)

Llama 3.3 70B (Groq)

Sub-100 ms latency, so replies feel as immediate as typing; cheapest open-weights chat model

Architecture

How it fits together

Frontend (React/Next.js) → POST /api/chat → your backend → VerticalAPI /v1/chat/completions (streaming) → token-streamed response back to the frontend. Cache system prompts in Redis and enforce a per-user rate limit at your edge.
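The per-user rate limit mentioned above can be sketched as a sliding-window counter. This is a minimal in-memory illustration, not VerticalAPI functionality: the class name SlidingWindowLimiter, the default of 20 requests per 60 seconds, and the in-process dictionary are all assumptions. A production edge would typically keep the counters in a shared store such as Redis.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per user within the last `window_s` seconds."""

    def __init__(self, limit: int = 20, window_s: float = 60.0, clock=time.monotonic):
        self.limit = limit          # assumed default, tune for your traffic
        self.window_s = window_s    # assumed default window
        self.clock = clock          # injectable so tests do not need to sleep
        self._hits = defaultdict(deque)  # user_id -> timestamps of recent requests

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        hits = self._hits[user_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should answer with HTTP 429
        hits.append(now)
        return True
```

Calling limiter.allow(user_id) at the top of the /api/chat handler, before the request is forwarded, keeps rejected requests from consuming any tokens.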

Code example

Working example in Python

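from collections.abc import Iterator

from openai import OpenAI

client = OpenAI(
    base_url="https://api.verticalapi.com/v1",
    api_key="vapi_...",  # your VerticalAPI key
    default_headers={"X-Provider-Key": "sk-ant-..."},  # your provider key
)


def stream_reply(user_message: str) -> Iterator[str]:
    """Yield the assistant's reply as text fragments while it streams."""
    response = client.chat.completions.create(
        model="claude-haiku-4-5",
        messages=[
            {"role": "system", "content": "You are Acme's friendly support bot."},
            {"role": "user", "content": user_message},
        ],
        stream=True,
    )
    for chunk in response:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content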

Pricing estimate

Typical cost at production volume

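A typical chatbot serving 100K conversations/month at ~10 turns each (~500 tokens per turn) processes about 500M tokens/month. At the per-token list prices quoted above, that comes to roughly $500-2,500/month on Claude Haiku 4.5 and $150-1,250/month on Gemini 2.5 Flash, depending on the input/output mix and before any free-tier or caching savings. Llama 3.3 70B via Groq comes to roughly $300-400/month, assuming Groq's list price of about $0.59 / $0.79 per 1M tokens.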
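A back-of-the-envelope estimate can be reproduced from per-token list prices. The sketch below multiplies the stated volume by the per-1M-token prices listed for each model earlier on this page. The function name and the 80/20 input/output split are assumptions, and the calculation ignores free tiers, prompt caching, and resent conversation history, so treat the result as a rough list-price figure rather than a quote.

```python
def monthly_cost_usd(
    conversations: int,
    turns_per_conversation: int,
    tokens_per_turn: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    input_share: float = 0.8,  # assumption: 80% of tokens are input
) -> float:
    """List-price monthly cost for a given traffic volume and token split."""
    total_tokens = conversations * turns_per_conversation * tokens_per_turn
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )


# 100K conversations x 10 turns x 500 tokens = 500M tokens/month
haiku = monthly_cost_usd(100_000, 10, 500, 1.00, 5.00)
flash = monthly_cost_usd(100_000, 10, 500, 0.30, 2.50)
print(f"Claude Haiku 4.5: ${haiku:,.0f}/month")
print(f"Gemini 2.5 Flash: ${flash:,.0f}/month")
```

Changing input_share shows how sensitive the bill is to reply length, since output tokens cost several times more than input tokens on both models.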

FAQ

Common questions

Should I stream responses?

Yes. Streaming sharply reduces perceived latency because users see the first tokens while the rest of the reply is still being generated. VerticalAPI passes streaming through unchanged; the OpenAI SDK's stream=True works on every supported provider.
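When streaming, the backend usually still needs the complete reply, for example to append it to the conversation history. A minimal sketch, assuming chunks shaped like the OpenAI SDK's streaming chunks (chunk.choices[0].delta.content); the helper name relay_and_collect and the emit callback are hypothetical.

```python
from typing import Callable, Iterable


def relay_and_collect(chunks: Iterable, emit: Callable[[str], None]) -> str:
    """Forward each text fragment via `emit` and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        text = chunk.choices[0].delta.content
        if text:  # content can be None on role-only or final chunks
            emit(text)
            parts.append(text)
    return "".join(parts)
```

Here emit would be whatever writes to the client connection (for instance a server-sent-events writer), and the returned string can be stored as the assistant turn.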

How do I switch model based on user tier?

Pass model='claude-sonnet-4-5' for paid users and model='claude-haiku-4-5' for free users. The endpoint and auth stay the same; only the model field changes.
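One way to keep that choice in a single place is a small lookup. The tier names "paid" and "free" and the fallback to the free-tier model are assumptions for illustration; the model IDs are the ones named in the answer above.

```python
MODEL_BY_TIER = {
    "paid": "claude-sonnet-4-5",
    "free": "claude-haiku-4-5",
}


def model_for_tier(tier: str) -> str:
    """Return the model ID for a user tier, defaulting to the free-tier model."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER["free"])
```

The result goes straight into the request, for example model=model_for_tier(user_tier) in client.chat.completions.create.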

What about tool calling / function calling?

The standard OpenAI tools[] array works with Claude, GPT-4o, Gemini, Mistral, and Llama (via Together, Groq, or Fireworks). VerticalAPI normalizes provider differences so your tool definitions are portable.
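As an illustration, here is a tool definition in the OpenAI tools[] format together with a small dispatcher for the calls the model sends back. The get_order_status tool, its placeholder result, and the helper names are hypothetical; only the schema layout follows the OpenAI format.

```python
import json

# Tool definition in the OpenAI tools[] format (parameters are JSON Schema).
# get_order_status is a hypothetical tool used only for illustration.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]


def get_order_status(order_id: str) -> dict:
    # Placeholder result; a real bot would query an order system here.
    return {"order_id": order_id, "status": "shipped"}


HANDLERS = {"get_order_status": get_order_status}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return a JSON string for the tool message."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    result = handler(**json.loads(arguments_json))
    return json.dumps(result)
```

With the OpenAI SDK, TOOLS is passed as tools=TOOLS. For each entry in message.tool_calls, run_tool_call(call.function.name, call.function.arguments) produces the content of a follow-up message with role "tool" and the matching tool_call_id.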