Build autonomous agents via VerticalAPI

Agents — multi-step LLM workflows that plan, call tools, and observe results — are the dominant 2026 use case. The right model choice depends on the agent's depth (single-turn tool call vs deep planning) and on cost sensitivity. VerticalAPI lets you compose models per-step: cheap classification on Haiku, expensive reasoning on Opus, fast tool calls on Groq.

How it fits together

User goal → planner LLM (Claude Sonnet) → emit tool call → tool runtime executes (search, DB, code exec) → observation back to LLM → repeat until done. Persist trajectory for replay & debug. Use VerticalAPI's per-request trace IDs to correlate.

Working example in python

agent.pythonPython
from openai import OpenAI

client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the public web",
        "parameters": {"type": "object", "properties": {"q": {"type": "string"}}}
    }
}]

messages = [{"role": "user", "content": "What's the weather in Aix today?"}]
while True:
    r = client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=messages,
        tools=tools,
    )
    msg = r.choices[0].message
    if not msg.tool_calls:
        print(msg.content); break
    for tc in msg.tool_calls:
        result = run_tool(tc.function.name, tc.function.arguments)
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

Typical cost at production volume

An agent doing 5-10 tool calls per task with ~5K tokens of context costs roughly $0.05-0.20 per task on Claude Sonnet 4.5, $0.03-0.10 on GPT-4o. With prompt caching enabled on a stable system prompt, repeated tasks drop to $0.01-0.03. Volume of 100K tasks/month → $1K-20K range.

See VerticalAPI plan pricing →

Common questions

Should I use a framework or roll my own?

For prototypes, frameworks (LangChain, CrewAI, AutoGen) are fast. For production, rolling your own loop on top of OpenAI-compatible primitives (chat.completions + tools) is easier to debug and cheaper to operate. VerticalAPI is the OpenAI-compatible layer underneath either approach.

How do I keep agent costs predictable?

Set a token budget per task in your code, log it via VerticalAPI's trace, and abort if exceeded. Use cheaper models (Haiku 4.5, Gemini Flash) for routing/classification steps, reserve Sonnet 4.5 for the hard planning step.

Can I build multi-agent systems?

Yes. Each sub-agent can use a different model — VerticalAPI is just a transport. Pattern: orchestrator on Sonnet 4.5, workers on Haiku or Llama 3.3 70B via Groq for speed.