Build autonomous agents via VerticalAPI

Recommended providers

Best models for this use case

Claude Sonnet 4.5

Leading agentic-coding benchmark results, strong tool use, and prompt caching that cuts cost

View Claude Sonnet 4.5 integration →

GPT-4o

Strong function calling, structured output schemas, broad ecosystem

View GPT-4o integration →

Cerebras Llama 3.3 70B

Sub-100ms tool-call latency for real-time agents

View Cerebras Llama 3.3 70B integration →

Architecture

How it fits together

User goal → planner LLM (Claude Sonnet) emits a tool call → tool runtime executes it (search, DB, code execution) → observation goes back to the LLM → repeat until done. Persist the trajectory for replay and debugging. Use VerticalAPI's per-request trace IDs to correlate each step with the request that produced it.
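The "persist trajectory" step can be sketched as an append-only JSONL log. This is a minimal sketch under stated assumptions, not VerticalAPI tooling: `TrajectoryLog`, the file layout, and the `trace_id` field name are hypothetical, and because this page does not say how a trace ID is read from a response, the caller passes it in as a plain string.

```python
import json
import time
from pathlib import Path


class TrajectoryLog:
    """Append-only JSONL log of agent steps (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, task_id, step, kind, payload, trace_id=None):
        # One JSON object per line keeps the file easy to append to and replay.
        entry = {
            "ts": time.time(),
            "task_id": task_id,
            "step": step,
            "kind": kind,          # e.g. "llm_call", "tool_call", "observation"
            "trace_id": trace_id,  # assumed: supplied by the caller per request
            "payload": payload,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def replay(self, task_id):
        # Return the steps of one task in the order they were executed.
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            entries = [json.loads(line) for line in f if line.strip()]
        return sorted(
            (e for e in entries if e["task_id"] == task_id),
            key=lambda e: e["step"],
        )
```

Calling `record` once per LLM request and once per tool result gives a file that can be filtered by `task_id` for debugging, with the trace ID available to look up the matching request.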

Code example

Working example in Python

agent.py (Python)
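import json

from openai import OpenAI

client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the public web",
        "parameters": {
            "type": "object",
            "properties": {"q": {"type": "string"}},
            "required": ["q"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Placeholder dispatcher: replace the body with real tool implementations.
    if name == "search_web":
        return json.dumps({"query": args["q"], "results": []})
    return json.dumps({"error": f"unknown tool: {name}"})

messages = [{"role": "user", "content": "What's the weather in Aix today?"}]
for _ in range(10):  # cap the number of turns so the loop always terminates
    r = client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=messages,
        tools=tools,
    )
    msg = r.choices[0].message
    if not msg.tool_calls:
        print(msg.content)
        break
    # The assistant turn that requested the tools must precede the tool results.
    messages.append(msg)
    for tc in msg.tool_calls:
        # Arguments arrive as a JSON string, so decode them before dispatching.
        result = run_tool(tc.function.name, json.loads(tc.function.arguments))
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})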

Pricing estimate

Typical cost at production volume

An agent doing 5-10 tool calls per task with ~5K tokens of context costs roughly $0.05-0.20 per task on Claude Sonnet 4.5 and $0.03-0.10 on GPT-4o. With prompt caching enabled on a stable system prompt, repeated tasks drop to $0.01-0.03. At 100K tasks/month, that puts the monthly bill in the $1K-20K range.
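The monthly range follows directly from multiplying the per-task figures by volume. A small sketch of that arithmetic, using only the per-task estimates quoted above (the function and variable names are illustrative):

```python
def monthly_cost_range(tasks_per_month, low_per_task, high_per_task):
    """Return the (low, high) monthly spend in dollars for a per-task cost range."""
    return tasks_per_month * low_per_task, tasks_per_month * high_per_task


TASKS = 100_000
# Per-task ranges in dollars, copied from the estimates above.
scenarios = {
    "Claude Sonnet 4.5": (0.05, 0.20),
    "GPT-4o": (0.03, 0.10),
    "with prompt caching": (0.01, 0.03),
}

for name, (low, high) in scenarios.items():
    low_total, high_total = monthly_cost_range(TASKS, low, high)
    print(f"{name}: ${low_total:,.0f} to ${high_total:,.0f} per month")
```

The $1K lower bound corresponds to the cached case (100K × $0.01) and the $20K upper bound to uncached Claude Sonnet 4.5 at the high end (100K × $0.20).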

See VerticalAPI plan pricing →

FAQ

Common questions

Should I use a framework or roll my own?

For prototypes, frameworks (LangChain, CrewAI, AutoGen) are fast. For production, rolling your own loop on top of OpenAI-compatible primitives (chat.completions + tools) is easier to debug and cheaper to operate. VerticalAPI is the OpenAI-compatible layer underneath either approach.

How do I keep agent costs predictable?

Set a token budget per task in your code, log usage against VerticalAPI's trace ID, and abort the task if the budget is exceeded. Use cheaper models (Haiku 4.5, Gemini Flash) for routing and classification steps, and reserve Sonnet 4.5 for the hard planning step.
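A per-task budget can be as small as a counter that raises once the cap is crossed. The sketch below is one possible shape, with hypothetical names (`TokenBudget`, `BudgetExceeded`). It assumes each response reports `prompt_tokens` and `completion_tokens` in the OpenAI-style `usage` field, which should be verified for the model being called.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task spends more tokens than it was allotted."""


class TokenBudget:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, prompt_tokens, completion_tokens):
        # Add the usage of one request and abort once the cap is crossed.
        self.used += prompt_tokens + completion_tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")
        # Remaining allowance, useful for logging next to the trace ID.
        return self.max_tokens - self.used
```

Inside an agent loop, this would be called after every completion, for example `budget.charge(r.usage.prompt_tokens, r.usage.completion_tokens)`, with the `BudgetExceeded` exception caught at the task boundary.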

Can I build multi-agent systems?

Yes. Each sub-agent can use a different model, because VerticalAPI is just a transport. A common pattern: orchestrator on Sonnet 4.5, workers on Haiku or Llama 3.3 70B via Groq for speed.
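The orchestrator/worker pattern reduces to choosing a model per role. The following is a rough sketch rather than a prescribed design: `MODEL_FOR_ROLE`, `run_multi_agent`, and the prompts are hypothetical, and the worker model ID is a placeholder to be checked against the provider's model list. The LLM call is injected as `complete(model, prompt)` so the routing logic stays independent of any particular client.

```python
from typing import Callable

# Hypothetical role-to-model table. "claude-sonnet-4-5" is the ID used in the
# example above; the worker ID is an assumed placeholder.
MODEL_FOR_ROLE = {
    "orchestrator": "claude-sonnet-4-5",
    "worker": "claude-haiku-4-5",
}


def run_multi_agent(goal: str, complete: Callable[[str, str], str]) -> str:
    """Plan on the orchestrator model, fan out to workers, then merge."""
    plan = complete(
        MODEL_FOR_ROLE["orchestrator"],
        f"Split this goal into subtasks, one per line:\n{goal}",
    )
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # Each subtask goes to the cheaper, faster worker model.
    results = [complete(MODEL_FOR_ROLE["worker"], task) for task in subtasks]
    return complete(
        MODEL_FOR_ROLE["orchestrator"],
        "Combine these results into one answer:\n" + "\n".join(results),
    )
```

In practice `complete` would wrap a `chat.completions.create` call. Keeping it as a parameter also makes the orchestration testable with a stub.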
