Build autonomous agents via VerticalAPI
Agents (multi-step LLM workflows that plan, call tools, and observe the results) are the dominant use case in 2026. The right model depends on how deep the agent goes (a single-turn tool call versus deep planning) and on how cost-sensitive you are. VerticalAPI lets you compose models per step: cheap classification on Haiku, expensive reasoning on Opus, and fast tool calls on Groq.
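A minimal sketch of per-step routing, assuming every step goes through the same OpenAI-compatible client and only the `model` string changes. The step kinds (`classify`, `plan`, `tool_call`) and every model ID except `claude-sonnet-4-5` are hypothetical placeholders; check VerticalAPI's model list for the real identifiers.

```python
# Hypothetical step kinds and model IDs: check VerticalAPI's model list for real names.
MODEL_FOR_STEP = {
    "classify": "claude-haiku-4-5",     # cheap, short outputs
    "plan": "claude-opus-4-1",          # expensive, deep reasoning
    "tool_call": "groq/llama-3.3-70b",  # fast turnaround
}
DEFAULT_MODEL = "claude-sonnet-4-5"


def model_for(step_kind: str) -> str:
    """Return the model ID for a kind of agent step, falling back to the default."""
    return MODEL_FOR_STEP.get(step_kind, DEFAULT_MODEL)


def request_params(step_kind: str, messages: list[dict]) -> dict:
    """Build chat.completions.create kwargs; only the model changes per step."""
    return {"model": model_for(step_kind), "messages": messages}
```

Each call then looks like `client.chat.completions.create(**request_params("classify", messages))`. Keeping the mapping in one dict makes it easy to move a step to a cheaper model without touching the agent loop.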
Best models for this use case
Claude Sonnet 4.5
Leading agentic-coding benchmark scores, strong tool use, and prompt caching to cut costs
Cerebras Llama 3.3 70B
Sub-100ms tool-call latency for real-time agents
How it fits together
User goal → planner LLM (Claude Sonnet) → emit tool call → tool runtime executes it (search, DB, code exec) → observation goes back to the LLM → repeat until done. Persist the trajectory for replay and debugging, and use VerticalAPI's per-request trace IDs to correlate the steps.
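One way to persist the trajectory, sketched under assumptions: each step is stored as a JSON Lines record, and `trace_id` is a plain string you fill in from wherever VerticalAPI exposes its per-request trace ID (this page does not say whether that is a response field or a header, so the code does not fetch it). The `Step` and `Trajectory` names are illustrative.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    kind: str                       # e.g. "llm", "tool_call", "observation"
    payload: dict[str, Any]         # must be JSON-serializable
    trace_id: Optional[str] = None  # hypothetical: copied from VerticalAPI's trace ID
    ts: float = field(default_factory=time.time)


class Trajectory:
    """Ordered log of one agent run, serializable as JSON Lines."""

    def __init__(self, path: Optional[str] = None):
        self.path = path
        self.steps: list[Step] = []

    def record(self, kind: str, payload: dict[str, Any], trace_id: Optional[str] = None) -> Step:
        step = Step(kind, payload, trace_id)
        self.steps.append(step)
        if self.path:  # append immediately so a crashed run still leaves a log
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(asdict(step)) + "\n")
        return step

    def to_jsonl(self) -> str:
        return "".join(json.dumps(asdict(s)) + "\n" for s in self.steps)

    @classmethod
    def from_jsonl(cls, text: str) -> "Trajectory":
        traj = cls()
        traj.steps = [Step(**json.loads(line)) for line in text.splitlines() if line.strip()]
        return traj
```

Passing a `path` appends each step as it happens, so a crashed run still leaves a replayable log; `from_jsonl` rebuilds the steps for a debugger or a regression test.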
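Working example in Python
import json
from openai import OpenAI
client = OpenAI(base_url="https://api.verticalapi.com/v1", api_key="vapi_...")
tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the public web",
        "parameters": {"type": "object", "properties": {"q": {"type": "string"}}, "required": ["q"]},
    },
}]
def run_tool(name, arguments):
    # Your own dispatcher, not part of the SDK. Arguments arrive as a JSON string.
    args = json.loads(arguments)
    if name == "search_web":
        # Stub: replace with a call to your search backend.
        return json.dumps({"query": args["q"], "results": []})
    return json.dumps({"error": f"unknown tool: {name}"})
messages = [{"role": "user", "content": "What's the weather in Aix today?"}]
MAX_STEPS = 10  # hard stop so the loop cannot run forever
for _ in range(MAX_STEPS):
    r = client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=messages,
        tools=tools,
    )
    msg = r.choices[0].message
    if not msg.tool_calls:
        print(msg.content)
        break
    # The assistant turn that requested the tools must precede their results.
    messages.append({
        "role": "assistant",
        "content": msg.content,
        "tool_calls": [tc.model_dump() for tc in msg.tool_calls],
    })
    for tc in msg.tool_calls:
        result = run_tool(tc.function.name, tc.function.arguments)
        messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})
else:
    print(f"Stopped after {MAX_STEPS} steps without a final answer.")
Typical cost at production volume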
An agent making 5-10 tool calls per task with ~5K tokens of context costs roughly $0.05-0.20 per task on Claude Sonnet 4.5 and $0.03-0.10 on GPT-4o. With prompt caching enabled on a stable system prompt, repeated tasks drop to $0.01-0.03. At 100K tasks/month, that works out to roughly $1K (cached) to $20K (uncached Sonnet 4.5) per month.
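The estimate above follows from simple arithmetic: each step in the loop re-sends the context, so input tokens grow with the number of tool calls. The sketch below assumes illustrative prices of $3 per million input tokens and $15 per million output tokens and 300 output tokens per step; substitute your provider's current rates. Prompt-caching discounts are not modeled.

```python
def estimate_task_cost(
    steps: int,
    context_tokens: int,
    output_tokens_per_step: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Rough per-task cost, assuming each step re-sends about context_tokens of prompt."""
    input_tokens = steps * context_tokens
    output_tokens = steps * output_tokens_per_step
    return (input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok) / 1_000_000


def monthly_cost(per_task_usd: float, tasks_per_month: int) -> float:
    """Scale a per-task cost to a monthly bill."""
    return per_task_usd * tasks_per_month


# Illustrative inputs: 5 steps, 5K-token context, 300 output tokens per step, $3 / $15 per MTok.
per_task = estimate_task_cost(5, 5_000, 300, 3.0, 15.0)
print(f"${per_task:.4f} per task")  # $0.0975 per task
low, high = monthly_cost(0.05, 100_000), monthly_cost(0.20, 100_000)
print(f"${low:,.0f} to ${high:,.0f} per month")  # $5,000 to $20,000 per month
```

The monthly line reproduces the uncached Sonnet 4.5 range quoted above: 100K tasks at $0.05-0.20 each.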
Common questions
Should I use a framework or roll my own?
For prototypes, frameworks (LangChain, CrewAI, AutoGen) are fast. For production, rolling your own loop on top of OpenAI-compatible primitives (chat.completions + tools) is easier to debug and cheaper to operate. VerticalAPI is the OpenAI-compatible layer underneath either approach.
How do I keep agent costs predictable?
Set a token budget per task in your code, log usage against VerticalAPI's per-request trace ID, and abort the task if the budget is exceeded. Use cheaper models (Haiku 4.5, Gemini Flash) for routing and classification steps, and reserve Sonnet 4.5 for the hard planning step.
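A minimal budget guard, assuming each response carries the standard OpenAI-compatible `usage` object with a `total_tokens` field (whether every model behind VerticalAPI populates it is an assumption worth checking). `TokenBudget` and `BudgetExceeded` are illustrative names, not part of any SDK.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task has used more tokens than its budget allows."""


class TokenBudget:
    """Tracks token usage for one agent task and aborts past a hard cap."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, usage) -> None:
        # `usage` is the response's usage object (prompt_tokens, completion_tokens, total_tokens).
        self.used += usage.total_tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)
```

In the agent loop, call `budget.charge(r.usage)` right after each `client.chat.completions.create(...)` and stop the task when `BudgetExceeded` is raised; logging `budget.used` next to the trace ID makes overruns easy to find.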
Can I build multi-agent systems?
Yes. Each sub-agent can use a different model — VerticalAPI is just a transport. Pattern: orchestrator on Sonnet 4.5, workers on Haiku or Llama 3.3 70B via Groq for speed.
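A sketch of the orchestrator/worker pattern, assuming both roles share one OpenAI-compatible client and differ only by model. The `complete(model, prompt)` callable is injected so the control flow can be tested offline; `run_multi_agent`, `make_complete`, and any worker model ID you pass in (for example a Llama 3.3 70B ID on Groq) are hypothetical.

```python
from typing import Callable

# complete(model, prompt) -> text; injected so the pattern can be tested without network calls.
Complete = Callable[[str, str], str]


def run_multi_agent(goal: str, complete: Complete, orchestrator: str, worker: str) -> str:
    # 1. The orchestrator splits the goal into one subtask per line.
    plan = complete(orchestrator, f"Break this goal into independent subtasks, one per line:\n{goal}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    # 2. Each subtask runs on the cheaper, faster worker model.
    results = [complete(worker, task) for task in subtasks]
    # 3. The orchestrator merges the worker outputs into a final answer.
    merged = "\n".join(f"- {t}: {r}" for t, r in zip(subtasks, results))
    return complete(orchestrator, f"Goal: {goal}\nSubtask results:\n{merged}\nWrite the final answer.")


def make_complete(client) -> Complete:
    """Adapter for an OpenAI-compatible client such as the VerticalAPI one."""
    def complete(model: str, prompt: str) -> str:
        r = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return r.choices[0].message.content or ""
    return complete
```

With a real client: `run_multi_agent(goal, make_complete(client), "claude-sonnet-4-5", WORKER_MODEL)`, where `WORKER_MODEL` is whatever ID VerticalAPI assigns to the worker model. Workers run sequentially here; a production version might fan them out concurrently.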
Other use cases
I want to build a customer-facing chatbot
I want to build retrieval-augmented generation (RAG)
I want consistent tool-calling behavior across multiple LLM providers
I want to send images, audio or video to an LLM