You’ve built a chatbot. It answers questions. It summarizes documents. It’s useful — but it’s passive. It waits for input, generates text, and stops. It doesn’t do anything.
Now imagine something different: you tell an AI “find all products expiring within 48 hours, check their stock levels, and create markdown recommendations for anything with more than 10 units remaining.” And it does it. Not by generating a paragraph about how you could do it — but by actually querying your systems, reasoning about the results, and taking action.
That’s an AI agent. And it’s the most important shift happening in how we build with LLMs.
Related: If you’re new to LLMs, start with my Simple Guide to LLMs. If you’ve already built a basic LLM app, this is your next step.
What Is an AI Agent?
An AI agent is an LLM that can reason, plan, and act — not just generate text.
The simplest way to understand it: a chatbot is a calculator. You give it input, it gives you output. An agent is an employee. You give it a goal, and it figures out the steps, uses the tools available, and delivers the result.
Here’s the key difference:
| | Chatbot | Agent |
|---|---|---|
| Input | A question or prompt | A goal or task |
| Process | Generate a response | Reason → Plan → Act → Observe → Repeat |
| Output | Text | Text + actions (API calls, database writes, file creation) |
| Memory | Stateless (or limited context) | Maintains state across steps |
| Tools | None — just the model | Functions, APIs, databases, other agents |
A chatbot answers “what products are expiring soon?” with a helpful paragraph. An agent answers by querying your product database, checking stock levels, cross-referencing promotion schedules, and returning a prioritized list with recommended actions.
Why Agents Matter Now
Three things converged to make agents practical:
1. Function Calling / Tool Use
Modern LLMs (GPT-4, Claude, Gemini) can be given a list of available tools — functions with typed parameters — and decide when and how to call them. This isn’t prompt hacking. It’s a native capability:
{
  "tools": [
    {
      "name": "get_stock",
      "description": "Get current stock level for a product in a store",
      "parameters": {
        "product_id": { "type": "integer" },
        "store_number": { "type": "integer" }
      }
    },
    {
      "name": "get_price",
      "description": "Get current selling price for a product",
      "parameters": {
        "product_id": { "type": "integer" }
      }
    }
  ]
}
The LLM sees these tools, understands what they do from the descriptions, and decides which to call based on the user’s request. You write the tool implementations. The LLM decides the orchestration.
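To make the split between "you write the implementations" and "the LLM decides the orchestration" concrete, here is a minimal sketch of the implementation side. The in-memory data, the prices, and the execute_tool dispatcher are assumptions for illustration; in a real system these functions would call your inventory and pricing services.

```python
# Hypothetical in-memory data standing in for real inventory and pricing systems.
STOCK = {(45678, 1234): 15, (45679, 1234): 22}
PRICES = {45678: 1.99, 45679: 2.49}


def get_stock(product_id: int, store_number: int) -> int:
    """Return the current stock level for a product in a store (0 if unknown)."""
    return STOCK.get((product_id, store_number), 0)


def get_price(product_id: int) -> float:
    """Return the current selling price for a product."""
    return PRICES[product_id]


# Registry mapping the tool names the model sees to the code that runs them.
TOOL_REGISTRY = {"get_stock": get_stock, "get_price": get_price}


def execute_tool(name: str, arguments: dict):
    """Run the tool the model asked for; unknown names become an error result."""
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**arguments)
```

The model only ever produces a tool name and a dictionary of arguments. Everything that touches your systems stays in code you control.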
2. Reasoning Models
Models have gotten significantly better at multi-step reasoning. They can break a complex goal into sub-tasks, execute them in order, handle failures, and adjust their plan based on intermediate results. This is the difference between “answer this question” and “solve this problem.”
3. Context Windows
With context windows now reaching 100K-1M+ tokens, agents can hold an entire conversation history, tool results, and intermediate reasoning in a single session. They don’t forget what they were doing halfway through.
How Agents Work: The Reasoning Loop
Every agent follows the same core loop, regardless of framework:
Reason → Plan → Act → Observe → Repeat
This is sometimes called the ReAct pattern (Reason + Act). Let’s see it in action.
A Concrete Example: The Markdown Agent
A store team leader asks: “What should I mark down today?”
Here’s what the agent does internally:
Step 1 — Reason: “I need to find products that are expiring soon and have high stock. Let me check expiry dates first.”
Step 2 — Act: Calls get_expiring_products(store_number=1234, days_ahead=2)
Step 3 — Observe: Returns 14 products with expiry dates within 48 hours.
Step 4 — Reason: “Now I need stock levels for these 14 products to see which ones actually need markdowns.”
Step 5 — Act: Calls get_stock(product_id, store_number) for each of the 14 products.
Step 6 — Observe: 6 products have stock > 10 units. 8 products have stock ≤ 10 (will likely sell through).
Step 7 — Reason: “For the 6 high-stock products, I should check if any are already on promotion — no point marking down a product that’s already in bonus.”
Step 8 — Act: Calls get_promotions(product_id) for the 6 products.
Step 9 — Observe: 2 are already on promotion. 4 remain.
Step 10 — Reason: “I have 4 products that need markdowns. I should recommend 25% or 35% based on stock levels and shelf position.”
Step 11 — Return result:
“4 products need markdowns today:
- Yoghurt (spar 45678): 15 units, aisle 3 (high traffic) → 25%
- Salad mix (spar 45679): 22 units, aisle 7 (low traffic) → 35%
- …”
The team leader asked one question. The agent made 20+ API calls, reasoned about the results, and delivered an actionable recommendation. No bespoke endpoint. No pre-built dashboard. The agent composed the answer from the tools available.
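To see where the "20+ API calls" figure comes from, here is a rough sketch that replays the walkthrough with a call counter. It hard-codes the sequence the agent happened to choose, so it is not an agent itself. The product ids, stock figures, and promotion set are made-up data shaped to match the numbers above (14 expiring, 6 with high stock, 2 on promotion), and the 48-hour window is passed as days_ahead=2.

```python
from collections import Counter

# Hypothetical data shaped to match the walkthrough: 14 expiring products,
# 6 with more than 10 units in stock, 2 of those already on promotion.
EXPIRING = list(range(1, 15))  # product ids 1..14
STOCK = {pid: (20 if pid <= 6 else 5) for pid in EXPIRING}
ON_PROMOTION = {1, 2}

calls = Counter()  # how many times each tool was called


def get_expiring_products(store_number, days_ahead):
    calls["get_expiring_products"] += 1
    return EXPIRING


def get_stock(product_id, store_number):
    calls["get_stock"] += 1
    return STOCK[product_id]


def get_promotions(product_id):
    calls["get_promotions"] += 1
    return product_id in ON_PROMOTION


def markdown_candidates(store_number):
    """Expiring products with more than 10 units that are not already promoted."""
    expiring = get_expiring_products(store_number, days_ahead=2)
    high_stock = [p for p in expiring if get_stock(p, store_number) > 10]
    return [p for p in high_stock if not get_promotions(p)]


candidates = markdown_candidates(store_number=1234)
print(candidates, dict(calls), sum(calls.values()))
```

One expiry lookup, 14 stock lookups, and 6 promotion lookups add up to 21 tool calls for 4 recommendations.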
The Agent Stack: What You Need to Build One
Layer 1: The Brain (LLM)
The LLM does the reasoning. You need a model that supports tool use / function calling:
- Claude (Anthropic) — strong at multi-step reasoning, large context
- GPT-4 (OpenAI) — mature function calling, wide tool ecosystem
- Gemini (Google) — large context window, multimodal
Layer 2: The Tools
Tools are functions the agent can call. Each tool has:
- A name (what it’s called)
- A description (what it does — the LLM reads this to decide when to use it)
- Parameters (typed inputs)
- An implementation (your code that actually executes the action)
tools = [
    {
        "name": "get_stock",
        "description": "Returns current stock quantity for a product in a specific store",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "integer", "description": "Product identifier"},
                "store_number": {"type": "integer", "description": "Store identifier"}
            },
            "required": ["product_id", "store_number"]
        }
    }
]
The quality of your tool descriptions matters enormously. The LLM decides which tool to use based on the description — not the implementation. Think of it as writing documentation for a new team member.
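Typed parameters only help if something checks them before your implementation runs. Below is a minimal sketch of that check, assuming a small subset of JSON Schema (required plus flat properties with a type). The validate_arguments helper and the JSON_TYPES table are hypothetical names, and store_number is typed as an integer to match the earlier JSON definition. A production system would more likely use a full schema validation library.

```python
# Maps JSON Schema type names to the Python types accepted for them.
JSON_TYPES = {"integer": int, "string": str, "number": (int, float), "boolean": bool}

GET_STOCK_SCHEMA = {
    "type": "object",
    "properties": {
        "product_id": {"type": "integer"},
        "store_number": {"type": "integer"},
    },
    "required": ["product_id", "store_number"],
}


def validate_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
            continue
        expected = JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean types
        wrong_bool = isinstance(value, bool) and spec["type"] != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{name} should be of type {spec['type']}")
    return problems
```

Feeding the list of problems back to the model as the tool result usually gives it enough information to correct the call on the next step.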
Layer 3: The Orchestration Loop
Something needs to manage the reason-act-observe cycle. You can build this yourself or use a framework:
- Build your own: A simple while loop that sends messages to the LLM, checks for tool calls, executes them, and feeds results back. 50-100 lines of code.
- LangChain / LangGraph: Full framework with built-in agent patterns, memory, and tool management.
- Claude Agent SDK / OpenAI Agents SDK: Provider-specific SDKs optimized for their models.
- CrewAI / AutoGen: Multi-agent frameworks where multiple agents collaborate.
For your first agent, build your own loop. It’s simpler than you think:
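def run_agent(llm, tools, execute_tool, user_goal):
    # `llm` is a placeholder client, not a specific SDK
    messages = [{"role": "user", "content": user_goal}]
    while True:
        response = llm.chat(messages=messages, tools=tools)
        if response.has_tool_calls():
            # Keep the assistant turn so the model sees which tools it requested
            messages.append({"role": "assistant", "content": response.content})
            for tool_call in response.tool_calls:
                result = execute_tool(tool_call.name, tool_call.arguments)
                messages.append({"role": "tool", "content": result})
        else:
            # No more tool calls: the agent is done
            return response.content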
That’s the entire agent loop. Everything else is details.
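If you want to try the loop without an API key, one option is to swap the model for a scripted stand-in. The sketch below is an assumption-heavy toy: ScriptedLLM, Response, and ToolCall are hypothetical classes that mimic the shape used in the loop (chat, has_tool_calls, tool_calls), not any provider's SDK, and the tool returns a canned answer. It also records the assistant turn before the tool results, which most real chat APIs expect.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Response:
    content: str = ""
    tool_calls: list = field(default_factory=list)

    def has_tool_calls(self) -> bool:
        return bool(self.tool_calls)


class ScriptedLLM:
    """Stand-in for a real model: replays a fixed list of responses."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def chat(self, messages, tools):
        return next(self._responses)


def execute_tool(name, arguments):
    # Hypothetical tool implementation with a canned answer
    if name == "get_stock":
        return "15"
    return f"unknown tool: {name}"


def run_agent(llm, user_goal, tools=()):
    messages = [{"role": "user", "content": user_goal}]
    while True:
        response = llm.chat(messages=messages, tools=tools)
        if not response.has_tool_calls():
            return response.content, messages
        # Record the assistant turn, then one tool message per requested call
        messages.append({"role": "assistant", "content": response.content})
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call.name, tool_call.arguments)
            messages.append({"role": "tool", "content": result})


llm = ScriptedLLM([
    Response(tool_calls=[ToolCall("get_stock", {"product_id": 45678, "store_number": 1234})]),
    Response(content="Product 45678 has 15 units in stock."),
])
answer, history = run_agent(llm, "How much stock does product 45678 have?")
print(answer)
```

The first scripted response asks for a tool, the second gives the final answer, so the loop runs exactly two model calls before returning.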
Layer 4: Memory and State
For simple tasks, the conversation history is enough. For longer-running agents, you need:
- Short-term memory: The current conversation / task context (managed by the context window)
- Long-term memory: Facts learned across sessions (stored in a database or vector store)
- Working memory: Intermediate results from tool calls that inform the next step
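One way to picture the three kinds of memory is as three containers with different lifetimes. This is a minimal sketch under that assumption. The AgentMemory class and its method names are hypothetical, and a real long-term store would be a database or vector store rather than a JSON string.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    short_term: list = field(default_factory=list)  # conversation messages for this task
    working: dict = field(default_factory=dict)     # intermediate tool results by key
    long_term: dict = field(default_factory=dict)   # facts that survive across sessions

    def remember_result(self, key, value):
        """Store an intermediate tool result for later steps of the same task."""
        self.working[key] = value

    def end_session(self) -> str:
        """Drop task-scoped state and return long-term facts serialized as JSON."""
        self.short_term.clear()
        self.working.clear()
        return json.dumps(self.long_term)


memory = AgentMemory()
memory.short_term.append({"role": "user", "content": "What should I mark down today?"})
memory.remember_result("expiring_product_ids", [45678, 45679])
memory.long_term["preferred_markdown_steps"] = [25, 35]
saved = memory.end_session()
```

After end_session, only the long-term facts remain, which is the part you would persist and load again at the start of the next session.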
Agents vs. Chains vs. Pipelines
These terms get confused. Here’s the distinction:
| Pattern | Control Flow | Example |
|---|---|---|
| Pipeline | Fixed sequence: step A → step B → step C | RAG: embed query → retrieve docs → generate answer |
| Chain | Fixed sequence with conditional branches | If sentiment is negative → escalate, else → auto-reply |
| Agent | Dynamic — LLM decides the next step at runtime | “Figure out why this shelf is empty” (unknown steps) |
Pipelines and chains are predetermined. Agents are emergent — the LLM decides what to do based on what it observes.
Use a pipeline when you know the steps in advance. Use an agent when you don’t.
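The difference in control flow is easy to see side by side. In this rough sketch, choose_next_step is a hypothetical stand-in for the LLM's decision, and the tool names are made up for the empty-shelf example. The point is only where the decision about the next step lives.

```python
def pipeline(query, steps):
    """Fixed sequence: every step runs, in order, every time."""
    value = query
    for step in steps:
        value = step(value)
    return value


def chain(message, is_negative):
    """Fixed sequence with a branch that was written ahead of time."""
    return "escalate" if is_negative(message) else "auto-reply"


def agent(goal, choose_next_step, tools, max_steps=10):
    """Dynamic: a decision function (the LLM in practice) picks each step at runtime."""
    observations = []
    for _ in range(max_steps):
        tool_name = choose_next_step(goal, observations)
        if tool_name is None:  # the decision function says the goal is met
            break
        observations.append(tools[tool_name]())
    return observations


def choose_next_step(goal, observations):
    # Hypothetical policy: look at deliveries only if the stock check shows zero
    if not observations:
        return "check_stock"
    if observations == ["stock: 0"]:
        return "check_deliveries"
    return None


tools = {
    "check_stock": lambda: "stock: 0",
    "check_deliveries": lambda: "delivery delayed",
}
trace = agent("Figure out why this shelf is empty", choose_next_step, tools)
```

In the pipeline and the chain, the programmer fixed the path when writing the code. In the agent, the path depends on what each observation turns out to be.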
Common Pitfalls
1. Too Many Tools
If you give an agent 50 tools, it struggles to pick the right one. Start with 5-10 well-described tools. You can always add more.
2. Vague Tool Descriptions
“Gets data” is useless. “Returns the current stock quantity for a specific product in a specific store, including last restock date” is actionable. The LLM reads these descriptions to make decisions — treat them like API documentation.
3. No Guardrails
An agent with a delete_product tool and no confirmation step is a disaster waiting to happen. Always separate read tools from write tools. Add confirmation steps for destructive actions. Limit the agent’s scope.
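A minimal sketch of that separation might look like the following. The tool sets, the guarded_execute name, and the confirm callback are assumptions for illustration. In practice confirm could be a prompt to a human, an approval queue, or a policy check.

```python
READ_TOOLS = {"get_stock", "get_price"}
WRITE_TOOLS = {"delete_product"}


def guarded_execute(name, arguments, registry, confirm):
    """Run read tools directly; run write tools only after `confirm` approves."""
    if name in READ_TOOLS:
        return registry[name](**arguments)
    if name in WRITE_TOOLS:
        if confirm(name, arguments):
            return registry[name](**arguments)
        return {"error": f"{name} was not confirmed and did not run"}
    # Anything not explicitly listed is out of scope for this agent
    return {"error": f"{name} is not an allowed tool"}


deleted = []  # records which products the hypothetical delete tool removed


def delete_product(product_id):
    deleted.append(product_id)
    return "deleted"


registry = {
    "get_stock": lambda product_id, store_number: 15,
    "delete_product": delete_product,
}

refused = guarded_execute("delete_product", {"product_id": 45678}, registry,
                          confirm=lambda name, args: False)
```

Returning the refusal as a tool result, rather than raising, lets the agent explain to the user why the action did not happen.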
4. Unbounded Loops
Without a maximum step count, an agent can loop forever if it gets confused. Always set a limit: “if you haven’t solved this in 15 steps, stop and explain where you’re stuck.”
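In code, the limit is a for loop instead of a while True. The sketch below assumes a hypothetical step callable that performs one reason-act-observe iteration and returns either the final answer or None when it needs another turn.

```python
def run_bounded(step, max_steps=15):
    """Call `step()` until it returns a final answer or the step budget runs out."""
    for steps_taken in range(1, max_steps + 1):
        answer = step()
        if answer is not None:
            return {"status": "done", "steps": steps_taken, "answer": answer}
    # Budget exhausted: report it instead of looping forever
    return {"status": "step_limit_reached", "steps": max_steps, "answer": None}


# A step function that never finishes, to show the limit kicking in
stuck = run_bounded(lambda: None, max_steps=3)
```

When the status is step_limit_reached, a sensible follow-up is one last model call asking the agent to summarize where it got stuck.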
5. Ignoring Cost
Each iteration of the loop is an LLM API call. An agent that takes 20 steps to answer a question costs at least 20x what a single prompt costs, and usually more, because every call resends the growing conversation history. Monitor token usage and optimize tool descriptions to reduce unnecessary steps.
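A back-of-the-envelope estimate shows how quickly this adds up. The sketch assumes every call resends the full history, that each step adds a fixed number of tokens (tool call plus result), and that there is no prompt caching. The token figures are illustrative, not measurements.

```python
def agent_input_tokens(steps, prompt_tokens, tokens_added_per_step):
    """Total input tokens when every call resends the whole history so far."""
    return sum(prompt_tokens + i * tokens_added_per_step for i in range(steps))


single_prompt = agent_input_tokens(steps=1, prompt_tokens=1000, tokens_added_per_step=500)
twenty_steps = agent_input_tokens(steps=20, prompt_tokens=1000, tokens_added_per_step=500)
print(single_prompt, twenty_steps, twenty_steps / single_prompt)
```

Under these assumptions, 20 steps consume 115,000 input tokens against 1,000 for a single prompt, so the multiplier is 115x rather than 20x. Trimming tool results before they go back into the history is often the cheapest fix.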
When to Use Agents (and When Not To)
Use an agent when:
- The number of steps isn’t known in advance
- The task requires reasoning about intermediate results
- Different inputs might require different tool combinations
- You want to handle novel questions without building new endpoints
Don’t use an agent when:
- The workflow is fixed and predictable (use a pipeline)
- Latency matters more than flexibility (agents are slower)
- The task is simple enough for a single prompt
- You can’t tolerate non-deterministic behavior
Final Thoughts: Agents Are the Interface Layer
The most important mental model shift: agents are not a replacement for your backend. They’re an interface layer on top of it.
Your APIs, databases, and services stay the same. The agent is the layer that understands natural language, reasons about which services to call, and orchestrates the workflow dynamically.
This means the quality of your agent depends directly on the quality of your tools — which depends on the quality of your APIs and how well they’re described.
And that leads to an interesting question: what if your entire API surface was already typed, self-describing, and machine-readable? What if the agent didn’t need hand-written tool descriptions at all?
That’s where the semantic layer comes in — and that’s the next post.
This is part of a series on building with AI. Previously: How to Integrate AI into Existing Applications. Next: Graph-as-Semantic-Layer: Why Your API Architecture Is Your AI Strategy.
