The Difference Between a Chatbot and an AI Agent

A useful working definition of "AI agent" — and why the distinction matters when you're picking what to pay for.

The word "agent" got loose in 2024 and has been stamped on every landing page since. Most things called agents are not agents. Most things called chatbots are also not chatbots.

Here's the clean way to tell them apart, and why it matters when you're shopping.

The split, in one line

A chatbot answers. An agent acts.

A chatbot turns a question into a response. An agent turns an objective into a sequence of actions in the world — calling tools, editing files, sending requests, navigating pages — and reports back on what happened.
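To make the split concrete, here is a minimal Python sketch. Everything in it is hypothetical: `chatbot`, `run_agent`, the scripted `scripted_policy`, and the in-memory `files` dict stand in for a real model and real tools. In a real agent a model would pick each next action; here a fixed script does, so only the shape of the loop is being illustrated.

```python
from typing import Callable, Optional

Action = tuple[str, str]     # (tool name, argument)
Step = tuple[str, str, str]  # (tool name, argument, result)


def chatbot(question: str) -> str:
    # Hypothetical chatbot: question in, text out, nothing in the world changes.
    return f"To do that, you would edit the file. ({question})"


def run_agent(
    objective: str,
    choose_next: Callable[[str, list[Step]], Optional[Action]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> list[Step]:
    # Hypothetical agent loop: objective in, a sequence of tool calls out.
    history: list[Step] = []
    for _ in range(max_steps):
        action = choose_next(objective, history)  # decide using what already happened
        if action is None:                         # policy reports the objective is met
            break
        name, arg = action
        result = tools[name](arg)                  # act: side effects happen here
        history.append((name, arg, result))        # keep state so steps are not repeated
    return history


# Toy "world" and tools, assumed for illustration only.
files = {"config.txt": "debug=true"}

def read_file(path: str) -> str:
    return files[path]

def write_file(spec: str) -> str:
    path, content = spec.split("|", 1)
    files[path] = content
    return "ok"

def scripted_policy(objective: str, history: list[Step]) -> Optional[Action]:
    # Stand-in for a model: read first, then write, then stop.
    if len(history) == 0:
        return ("read", "config.txt")
    if len(history) == 1 and history[0][2] == "debug=true":
        return ("write", "config.txt|debug=false")
    return None


print(chatbot("turn off debug mode"))  # text only; files is untouched
steps = run_agent("turn off debug mode", scripted_policy,
                  {"read": read_file, "write": write_file})
print(steps)
print(files)
```

The chatbot call leaves `files` exactly as it found it; the agent run leaves behind both a changed file and a step-by-step record of how it got there.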

That's the only line that matters. Everything else is decoration.

What follows from the split

Once you frame it that way, the differences fall out of the definition:

Tool use. A chatbot doesn't need tools. An agent's whole point is calling them.
Multi-step planning. A chatbot handles a single turn. An agent has to chain decisions across many steps.
State and memory. A chatbot is mostly stateless. An agent has to keep track of what it already did so it doesn't loop.
Failure surface. A chatbot can be wrong. An agent can be wrong AND change something. The blast radius is bigger.
Evaluation. A chatbot is judged on response quality. An agent is judged on outcome quality plus side effects.

That last one is the one buyers underestimate. An agent that produces a good answer but also accidentally deletes a database row is not a good agent. The grader has to inspect the world the agent touched, not just the words it wrote.
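One way to grade both halves is to snapshot the world before and after a run and diff the two. The sketch below assumes the world can be represented as a flat key-value dict and that each task declares which keys the agent is allowed to change; `grade`, `expected`, and `allowed` are hypothetical names, not a description of any particular evaluation harness.

```python
def grade(
    before: dict[str, str],
    after: dict[str, str],
    expected: dict[str, str],
    allowed: set[str],
) -> dict[str, object]:
    # Outcome: did the keys the task cares about end up with the right values?
    outcome_ok = all(after.get(k) == v for k, v in expected.items())
    # Side effects: every key that was added, removed, or modified.
    changed = {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}
    collateral = sorted(changed - allowed)  # changes the task did not ask for
    return {"outcome_ok": outcome_ok, "collateral": collateral,
            "passed": outcome_ok and not collateral}


before = {"orders/7": "open", "users/1": "alice", "users/2": "bob"}
# The agent closed the order (good) but also deleted users/2 (bad).
after = {"orders/7": "closed", "users/1": "alice"}

report = grade(before, after, expected={"orders/7": "closed"}, allowed={"orders/7"})
print(report)
```

A grader that only looked at the answer would stop at `outcome_ok`; the `collateral` check is what catches the deleted row and fails the run.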

Why this matters when you're picking what to pay for

A lot of "AI agent" SKUs are chatbots dressed up. They take a question, return text, and call it agentic because the text contains a plan. That's a chatbot.

When you evaluate, ask:

Does it actually call tools, or does it just describe what it would do?
Does it carry state across steps, or does each turn start cold?
Can I see, audit, and approve every action it took?
What happens when a step fails — does it retry, ask, or hide it?
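The last two questions describe a concrete mechanism: every action passes through a gate that can ask for approval, retries a bounded number of times, and writes a log entry whether the action succeeded, failed, or was blocked. The sketch below is a minimal, hypothetical version of that gate; `AuditedExecutor`, the `approve` callback, and `max_retries` are illustrative names, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ActionRecord:
    tool: str
    arg: str
    status: str       # "ok", "failed", or "rejected"
    attempts: int
    detail: str = ""  # tool result on success, error text on failure


@dataclass
class AuditedExecutor:
    tools: dict[str, Callable[[str], str]]
    approve: Callable[[str, str], bool]  # hypothetical approval hook (e.g. a human)
    max_retries: int = 1
    log: list[ActionRecord] = field(default_factory=list)

    def execute(self, tool: str, arg: str) -> ActionRecord:
        if not self.approve(tool, arg):
            return self._record(ActionRecord(tool, arg, "rejected", 0))
        last_error = ""
        for attempt in range(1, self.max_retries + 2):  # first try plus retries
            try:
                result = self.tools[tool](arg)
                return self._record(ActionRecord(tool, arg, "ok", attempt, result))
            except Exception as exc:  # keep the error instead of swallowing it
                last_error = repr(exc)
        return self._record(
            ActionRecord(tool, arg, "failed", self.max_retries + 1, last_error))

    def _record(self, record: ActionRecord) -> ActionRecord:
        self.log.append(record)  # every action lands in the audit log, failures included
        return record


calls = {"send": 0}

def flaky_send(msg: str) -> str:
    calls["send"] += 1
    if calls["send"] == 1:
        raise TimeoutError("network blip")  # fails once, then succeeds
    return f"sent {msg}"

def delete_row(row_id: str) -> str:
    return f"deleted {row_id}"

def always_down(_: str) -> str:
    raise ConnectionError("service unavailable")


executor = AuditedExecutor(
    tools={"send": flaky_send, "delete": delete_row, "sync": always_down},
    approve=lambda tool, arg: tool != "delete",  # assumed policy: deletions need sign-off
)
executor.execute("send", "hello")
executor.execute("delete", "42")
executor.execute("sync", "all")
for r in executor.log:
    print(r.tool, r.status, r.attempts, r.detail)
```

The design point is that the `failed` and `rejected` paths still write to the log, so a failure is visible in the audit trail rather than hidden behind a clean-looking final answer.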

If those questions don't have crisp answers, the product is closer to a chatbot than its marketing claims.

The framing we use here

Every score on AgentVerdict is grounded in tasks where the agent actually has to do something — fix a bug, navigate a site, fill a form, build a workflow. We don't score conversational quality. We score outcomes and what got broken on the way.

That's what makes a verdict worth anything.