By the end of section 1.4 you can build a workflow: a fixed sequence of steps that include LLM calls. By the end of section 1.5 you can measure quality. This last section of the chapter is about when one call is enough, when a workflow is enough, and when you actually need a real agent.
Three architectures
Single call
input → LLM → output
Stateless. One step. The right choice when the task fits in one shot:
- Classify a ticket
- Summarize an email
- Translate a string
- Extract structured data from text
Most v0 products are here, and most should stay here longer than they think. A single call traces cleanly, evaluates simply, and fails predictably.
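For concreteness, a minimal sketch of the single-call shape using the OpenAI SDK; the model name and the label set are illustrative, not prescribed:

```python
from openai import OpenAI

client = OpenAI()

def classify_ticket(ticket: str) -> str:
    # One stateless call: prompt in, label out. Nothing to orchestrate.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; route through your gateway in production
        messages=[
            {
                "role": "system",
                "content": "Classify the support ticket as one of: "
                "billing, bug, feature_request, other. "
                "Reply with the label only.",
            },
            {"role": "user", "content": ticket},
        ],
    )
    return completion.choices[0].message.content.strip()
```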
Workflow
input → step 1 → step 2 → step 3 → ... → output
A fixed sequence of steps, with LLM calls at some of them. Order is decided by you, in code. The right choice when the task has known sub-tasks:
- Look up the customer's order, then write a reply
- Retrieve from KB, generate, verify, escalate if low confidence
- Parse the input, classify, route, respond
Most production AI products that look like "agents" are workflows. Workflows are easier to debug because the steps are predictable.
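As a sketch, the second example above in plain Python; `retrieve_from_kb`, `generate_reply`, and `score_confidence` are hypothetical helpers standing in for your retrieval, generation, and verification steps:

```python
CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff; tune against your evals

def answer_ticket(ticket: str) -> dict:
    # The step order is fixed in code; the LLM only fills in individual steps.
    docs = retrieve_from_kb(ticket)                     # hypothetical: KB retrieval
    draft = generate_reply(ticket, docs)                # hypothetical: LLM generation
    confidence = score_confidence(ticket, draft, docs)  # hypothetical: verifier score
    if confidence < CONFIDENCE_THRESHOLD:
        return {"action": "escalate", "draft": draft}   # low confidence: hand to a human
    return {"action": "send", "reply": draft}
```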
Agent
input → LLM decides next action → LLM acts → LLM decides → ... → done
An LLM-driven loop where the model decides its own next action each turn until the task is complete. The right choice when the next step is genuinely a function of the model's reasoning, not of the request's shape:
- Research a prospect across the web and write a personalized outreach
- Debug a failing test by reading code, running diagnostics, and proposing fixes
- Plan a multi-day travel itinerary by checking flights, hotels, and weather
Agents are powerful but harder to bound, eval, and trace. The decision rule:
Do not reach for an agent if a workflow does the job.
Most products do not need true agents. The honest order is: try a single call first. If that is not enough, try a workflow. If that is not enough, try an agent. Most teams skip directly to step three because "agent" sounds modern. They regret it.
What is tool use
Tool use (also called function calling) is what lets an agent do things in the real world. The flow:
- The LLM is given a list of available tools (functions) with their parameter schemas.
- During generation, the LLM can output a "tool call" instead of a regular response: "call `search_kb` with `query='refund policy'`."
- The surrounding code runs the function and feeds the result back into the conversation.
- The LLM continues, maybe calling more tools, until it produces a final answer.
Tool use is the bridge between a chatty model and a system that actually does things.
A tool definition (OpenAI's function-calling format):
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_kb",
            "description": "Search the help center for relevant articles.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                },
                "required": ["query"],
            },
        },
    },
]
```

The model decides when to call which tool based on the user's request. Your code runs the tool. The result feeds back as a tool message in the next call.
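The round trip in code, roughly; this assumes a `search_kb` Python function that implements the schema above (one is defined in the next section):

```python
import json

from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "What is your refund policy?"}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,  # the schema defined above
)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool call in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = search_kb(**args)  # your code runs the tool, not the model
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second call: the model now sees the tool result and writes the final answer.
    final = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)
```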
Building a small agent
Here is a tiny agent in Python with one tool. The Respan tracing decorators make the structure visible: a @workflow is the request boundary, an @agent is the LLM-decision boundary, a @tool is an external function call, and a @task is a deterministic step.
```python
from respan import Respan
from respan.decorators import workflow, task, agent, tool
from openai import OpenAI

Respan()  # initialize tracing
client = OpenAI()

@tool(name="search_kb")
def search_kb(query: str):
    # Retrieval over your help center (vector search, etc.)
    return [{"title": "Refund policy", "body": "Returns within 30 days..."}]

@task(name="think")
def think(prompt: str):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

@agent(name="support_agent")
def support_agent(customer_question: str):
    context = search_kb(customer_question)
    return think(f"Context: {context}\n\nQuestion: {customer_question}")

@workflow(name="handle_ticket")
def handle_ticket(ticket: str):
    return support_agent(ticket)

print(handle_ticket("How long do I have to return a product?"))
```

This is technically still a workflow (the order is fixed: search, then think). A true agent would let the model decide whether to search again, ask a follow-up question, or call another tool, in a loop. The decorators stay the same; the code inside the agent function gets a loop and a tool-call dispatcher.
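A minimal sketch of that loop, reusing `client`, the `tools` schema, and `search_kb` from above; the step cap anticipates the guardrails below:

```python
import json

@agent(name="support_agent_loop")
def support_agent_loop(customer_question: str, max_steps: int = 5):
    messages = [{"role": "user", "content": customer_question}]
    for _ in range(max_steps):  # iteration cap: see "Bounding agent behavior"
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=tools,  # the search_kb schema from earlier
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool call means the model chose to finish
        messages.append(msg)
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            # Tool-call dispatcher: one tool here, a name-to-function table in practice.
            result = search_kb(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    return "Escalating to a human: step limit reached."
```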
When to add a framework
LangChain, LlamaIndex, OpenAI Agents SDK, CrewAI, AutoGen, and similar frameworks provide primitives that help with agent loops, memory, tool routing, and multi-agent coordination.
The honest rule: try plain Python first. Add a framework only when you hit the same abstraction pain point for the second time (e.g., you want automatic retries, branching, or memory, and writing them yourself again feels redundant).
A LangChain wrapper around what should be a 200-line workflow is the most common over-engineering pattern in the industry. It hides the prompt, makes debugging harder, and adds dependencies you do not control. Chapter 2 covers framework choice in more depth.
Bounding agent behavior
Once you let the LLM decide its own actions, you need guardrails. Common ones:
- Action authorization: the LLM does not authorize refunds; deterministic code checks if the proposed refund is within policy and rejects out-of-policy actions.
- Bounded authority: the agent can authorize refunds up to $X without escalation. Above $X, it must escalate.
- Iteration cap: the loop hard-stops after N steps, so a confused model cannot run forever.
- Eval-driven safety: an evaluator scores every output before it goes to the customer; failed outputs are blocked or re-generated.
These are architectural patterns, not framework features. They sit at the application layer regardless of which agent framework you use.
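A sketch of the first two patterns at that layer; the dollar limit and the helpers (`within_refund_policy`, `escalate_to_human`, `issue_refund`) are hypothetical:

```python
MAX_AUTONOMOUS_REFUND = 50.00  # illustrative bounded-authority limit

def execute_refund(order_id: str, amount: float):
    # Action authorization: deterministic code checks policy, never the LLM.
    if not within_refund_policy(order_id, amount):  # hypothetical policy check
        raise PermissionError("Proposed refund is out of policy")
    # Bounded authority: above the limit, a human must approve.
    if amount > MAX_AUTONOMOUS_REFUND:
        return escalate_to_human(order_id, amount)  # hypothetical escalation path
    return issue_refund(order_id, amount)           # hypothetical payment call
```

The iteration cap appeared as `max_steps` in the agent loop sketch above; eval-driven safety is the same evaluator pattern from section 1.5, run inline before anything reaches the customer.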
What you have at the end of Chapter 1
Putting all six sections together:
- Every LLM call routes through a gateway with logging, fallbacks, and caching (1.2)
- Every prompt is a versioned artifact in a registry (1.3)
- Every request produces a trace tree you can replay later (1.4)
- Every change in quality is caught by an evaluator before it ships (1.5)
- Every architectural decision (single call, workflow, or agent) is grounded in what tracing and evals tell you (1.6, this section)
The team that has all five layers operates production AI. The team that has none operates a v0 that is one bad week from a public reversal.
Next: choosing the right stack
Chapter 2: Choose the Right Stack covers the tools you might layer on top of these six sections (RAG, agent frameworks, fine-tuning, memory, web search) and when each is actually worth adding.
Chapter 3: Build a Customer Support Agent is the worked example that uses every layer in this chapter on one real product.
Or back to the Chapter 1 hub.
