Agent vs workflow — the distinction that saves your production system
In late 2024, Anthropic published "Building Effective Agents," an engineering post that got widely quoted for one piece of advice: don't build an agent when a workflow will do. It's still the single best advice in this field.
Here's the distinction:
- A workflow is a predetermined pipeline. The LLM runs at specific steps; the control flow is code. Predictable, debuggable, cheap.
- An agent is open-ended. The LLM decides what to do next, which tool to call, when to stop. Flexible, creative, expensive, occasionally wildly wrong.
| Dimension | Workflow | Agent |
|---|---|---|
| Control flow | Hardcoded by you | Decided by the LLM at runtime |
| Predictability | High — same input → same path | Low — may loop, may skip steps, may go wild |
| Debuggability | Inspect each step's I/O | Good luck |
| Cost per run | Known in advance | Unbounded |
| Best for | Known pipelines you can flowchart | Open-ended tasks that need improvisation |
| Production systems | ~90% of them | The rare exception |
Rule of thumb: if you can describe the task as a flowchart, build a workflow. If the task requires dynamic planning that you can't flowchart in advance, build an agent.
Almost every production LLM system is a workflow. Agents are the exception, not the default.
The three workflow patterns you'll actually use
Once you start building, you'll notice the same three shapes show up over and over. Learn them and you've got 80% of real-world LLM pipelines covered.
1. Prompt chaining
Split a big task into a sequence of small, reliable LLM calls. Each step's output feeds the next.
Cheaper and more reliable than asking one giant prompt to do everything. Easier to debug because you can inspect each intermediate step.
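A minimal sketch of a chain, assuming a hypothetical `call_llm` helper (here a canned placeholder; in practice it would wrap your provider's SDK):

```python
def call_llm(prompt: str) -> str:
    # Placeholder: echoes a marker instead of calling a real API.
    # Swap in your actual LLM client here.
    return f"[llm: {prompt.splitlines()[0][:40]}]"

def summarize_for_exec(document: str) -> str:
    # Step 1: a small, focused extraction call.
    key_points = call_llm(f"List the key points of:\n{document}")
    # Step 2: step 1's output feeds step 2's prompt.
    summary = call_llm(f"Summarize in one paragraph:\n{key_points}")
    # Step 3: final polish. Every intermediate value is inspectable,
    # which is exactly what makes chains easy to debug.
    return call_llm(f"Rewrite for an executive audience:\n{summary}")
```

Each variable (`key_points`, `summary`) is a checkpoint you can log and eyeball when a run goes wrong.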
2. Routing
A first LLM call classifies the input, then routes it to a specialized second prompt.
Huge for customer support, help desks, email triage. You get specialized handling without juggling one monster prompt.
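A sketch of routing under the same assumption of a placeholder `call_llm`; the category names and prompt templates are illustrative, not from any real product:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: pretends the classifier always answers "billing".
    # A real call would hit your LLM API.
    return "billing" if prompt.startswith("Classify") else f"[reply to: {prompt[:35]}]"

SPECIALIZED_PROMPTS = {
    "billing": "You are a billing specialist. Resolve this:\n{msg}",
    "technical": "You are a support engineer. Debug this:\n{msg}",
    "other": "You are a general assistant. Answer this:\n{msg}",
}

def route(message: str) -> str:
    # Call 1: a cheap classification into a few known buckets.
    label = call_llm(f"Classify as billing/technical/other:\n{message}").strip().lower()
    # The control flow stays in code: unknown labels fall back safely.
    template = SPECIALIZED_PROMPTS.get(label, SPECIALIZED_PROMPTS["other"])
    # Call 2: the specialized prompt handles the message.
    return call_llm(template.format(msg=message))
```

Note the fallback on an unrecognized label: the classifier can be wrong, but the router never crashes.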
3. Parallelization
Fan out, aggregate.
Or: ask the same question 5 times, vote. Huge quality boost for anything where consistency matters — classification, evaluation, extraction.
Three LLM calls at 1000 tokens each usually beats one LLM call at 3000 tokens. The model is better at small focused jobs than one sprawling one.
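The fan-out-and-vote variant can be sketched like this, again with a placeholder `call_llm` standing in for a real (nondeterministic) model call:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Placeholder: a real call at temperature > 0 would occasionally
    # disagree with itself, which is what voting smooths out.
    return "positive"

def classify_by_vote(text: str, n: int = 5) -> str:
    prompt = f"Classify sentiment as positive/negative/neutral:\n{text}"
    # Fan out: n independent calls run concurrently.
    with ThreadPoolExecutor(max_workers=n) as pool:
        votes = list(pool.map(call_llm, [prompt] * n))
    # Aggregate: majority vote discards occasional bad answers.
    return Counter(votes).most_common(1)[0][0]
```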
A fourth pattern you'll outgrow fast: naive loops
Sometimes people write:
```python
while not done:
    response = llm(prompt + history)
    done = check_if_done(response)
```
This is a dumb-but-sometimes-useful pattern. It shades into "agent" territory the moment the LLM itself decides whether it's done. If you find yourself writing one, ask: can I replace this with a workflow where the control flow is code, not LLM output? Usually yes.
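One way to move the loop's exit condition out of the LLM and into code is to terminate on a deterministic check, such as whether the output parses. A sketch, with a placeholder `call_llm`:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder; imagine a model that sometimes emits invalid JSON.
    return '{"name": "Ada"}'

def extract_json(text: str, max_attempts: int = 3) -> dict:
    # The exit condition is a deterministic check in code (does the
    # output parse?), plus a hard iteration cap. The LLM never
    # decides when to stop.
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(f"Extract the person's name as JSON:\n{text}")
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```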
Where workflows meet MCP
Here's why you learned MCP first: workflow steps are often MCP tool calls.
A realistic production pipeline:
1. User uploads a PDF (code step).
2. LLM extracts entities from the PDF (prompt step).
3. MCP — look up the entities in your database (`postgres.query`, tool step).
4. MCP — notify the account manager (`slack.send`, tool step).
5. LLM drafts a response email (prompt step).
6. MCP — save the draft (`gmail.create_draft`, tool step).
7. Return success to the user (code step).
Steps 1, 7 are code. Steps 2, 5 are LLM prompts. Steps 3, 4, 6 are MCP tool calls. That is a workflow — a predefined sequence stitching together code + prompts + MCP calls.
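The pipeline above, stitched together in one function. `call_llm` and the `FakeMCP` wrapper are hypothetical placeholders, not a real SDK; a real version would call an MCP client for the three tool steps:

```python
def call_llm(prompt: str) -> str:
    # Placeholder for the two prompt steps.
    return f"[llm: {prompt.splitlines()[0][:30]}]"

class FakeMCP:
    # Stands in for real MCP tool calls (postgres.query, slack.send,
    # gmail.create_draft in the example above).
    def call(self, tool: str, **args) -> dict:
        return {"tool": tool, "ok": True}

def handle_upload(pdf_text: str, mcp: FakeMCP) -> dict:
    entities = call_llm(f"Extract entities:\n{pdf_text}")       # step 2: prompt
    records = mcp.call("postgres.query", entities=entities)     # step 3: tool
    mcp.call("slack.send", text=f"New upload: {entities}")      # step 4: tool
    draft = call_llm(f"Draft a reply about:\n{records}")        # step 5: prompt
    mcp.call("gmail.create_draft", body=draft)                  # step 6: tool
    return {"status": "success"}                                # step 7: code
```

The whole control flow is plain Python: no step is chosen at runtime by the model.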
What workflows are not
A few common mistakes to avoid:
- Not chains of thought. "Chain of thought" is a prompting technique inside a single LLM call. A workflow is multiple LLM calls.
- Not agents. Agents decide the control flow at runtime. Workflows have their control flow hardcoded.
- Not state machines in disguise. If your workflow has 40 states with complex transitions, you actually built a state machine. That's fine, but know what you have.
How to pick a framework
For Python, the main options in 2026:
- LangGraph — state machines plus LLM nodes, built by the LangChain folks. Great for workflows that include agents as sub-steps. You'll use this in the next lesson.
- LlamaIndex Workflows — event-driven workflows with a very clean API. Good for RAG-heavy pipelines.
- Pydantic AI — lightweight, typed, less orchestration-heavy.
- DSPy — research-flavored, optimizes prompts programmatically. Advanced.
For TypeScript: Vercel AI SDK has decent workflow-ish primitives; most teams roll their own or use Inngest for long-running jobs.
If you're just starting: pick LangGraph. It's the most flexible, has the biggest ecosystem, and you'll recognize its patterns in every other framework anyway.
Reliability checklist
Whatever framework you use, a production workflow needs:
- Retries with backoff — LLM APIs fail; tool APIs fail more.
- Timeouts — don't let a stuck call hang forever.
- Fallbacks — cheap model first, expensive model on failure.
- Observability — log every step's input/output; you'll need it.
- Idempotency — so retries don't double-send emails or double-charge cards.
- Human-in-the-loop gates — for anything destructive or high-stakes.
None of these are optional. The difference between a demo and a product is these six pieces of plumbing.
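The first two checklist items, retries with backoff and a bounded number of attempts, fit in a few lines of standard-library Python. A minimal sketch:

```python
import random
import time

def call_with_retries(fn, *, attempts: int = 4, base_delay: float = 0.5):
    # Exponential backoff with jitter: roughly 0.5s, 1s, 2s between
    # tries, then give up and re-raise. In production, catch only the
    # exception types you know are transient (timeouts, 429s, 5xx).
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

Wrap each LLM or tool step in `call_with_retries`, and pair it with per-call timeouts so a hung request can't stall the whole pipeline.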
What to take away
- Workflows = predefined pipelines; agents = dynamic control flow. Prefer workflows.
- Three workflow patterns cover most real systems: chaining, routing, parallelization.
- Workflows glue together code, prompts, and MCP tool calls.
- LangGraph is the default Python framework right now.
- Reliability is plumbing, not cleverness — retries, timeouts, observability.
Next: LangGraph: Stateful AI Workflows — pick up the framework and build one.