Solutions Appendix

Chapter 34

Agents & Multi-Agent Systems

20 Solutions

Detailed solutions for the exercises in Chapter 34. Try solving them yourself before checking the answers.

Exercise 1: Pen & Paper

Define an agent and contrast with a single model call; what makes it autonomous and goal-directed?

Solution

A single model call is one question → one answer round-trip that you drive. An agent is a system that, given a GOAL, operates in a LOOP — deciding what to do, taking an action (often a tool call), observing the result, and deciding the next action — until the goal is met. It is autonomous because it chooses its own next steps rather than being driven turn-by-turn, and goal-directed because it works toward an objective across many steps rather than responding once. The loop and the self-directed decisions are what make it an agent.

Exercise 2: Pen & Paper

List the ingredients of an agent and their source chapters; why 'where everything converges'?

Solution

Ingredients: the model as the reasoning 'brain' (reasoning, Chapter 25); tools to act (Chapter 28); memory to track progress and recall the past (long context / RAG, Chapters 29, 33); planning to sequence steps; reflection to self-correct; and a bounded loop to tie them together. Agents are 'where everything converges' because they are not a new model but a SYSTEM that integrates nearly every capability in the book — a pretrained, aligned model that reasons, uses tools, retrieves knowledge, and remembers — into autonomous, goal-directed behavior. They require everything that came before.

Exercise 3: Pen & Paper

Describe the agent loop and how it generalizes ReAct; why must it be bounded, and with what?

Solution

The agent loop: perceive the situation, decide what to do (reason), act (tool call), observe the result, update memory, repeat until done. This generalizes ReAct's Thought–Action–Observation cycle (Chapter 28) with added planning, memory, and the autonomy to pursue a multi-step goal. It must be bounded because an autonomous loop that chooses its own actions can, if something goes wrong, loop forever, rack up cost, or wander off-task. Bounds to add: a maximum step count, a cost/token budget, a wall-clock timeout, and explicit stopping conditions — the guardrails that make autonomy safe.

Exercise 4: Pen & Paper

Plan-then-execute vs interleaved planning; when is each better; why is planning hard for current models?

Solution

Plan-then-execute makes a full plan upfront then carries it out — good for predictable tasks, but brittle when reality diverges from the plan. Interleaved (ReAct-style) planning plans a step, acts, observes, and replans — more adaptive to surprises but potentially less coherent over long horizons. Use plan-then-execute for well-understood, stable tasks; interleaved for uncertain, dynamic ones (and often a hybrid: a high-level plan refined as execution reveals information). Planning is hard for current models because they struggle with long-horizon reasoning, often miss prerequisites or constraints, and don't reliably recover when a plan goes wrong — a known frontier weakness, which is why reflection and replanning matter so much.

Exercise 5: Pen & Paper

Explain reflection and the generate-evaluate gap; why is grounded reflection better than ungrounded self-critique?

Solution

Reflection has the agent evaluate its own work, recognize problems, and revise — exploiting the generate-evaluate gap: models are often better at JUDGING whether a result is correct than at producing it correctly first try (the same asymmetry behind RLHF and Constitutional AI). Grounded reflection — where the agent can actually RUN the code, TEST the answer, or CHECK against reality — is far more reliable than ungrounded self-critique, because the feedback is from the world, not the model's own (possibly mistaken) judgment. Running the tests and fixing the actual error beats merely 'thinking harder' about whether the answer seems right.

Exercise 6: Pen & Paper

Types of agent memory; how does long-term memory relate to RAG; how does memory let agents 'learn' without retraining?

Solution

Types: working/short-term (current task state, in the context window), long-term (facts/experiences across sessions, in an external store), episodic (records of past tasks), semantic (general knowledge), procedural (learned routines). Long-term memory IS essentially RAG (Chapter 29) applied to the agent's own experience — chunk, embed, store, and retrieve relevant past lessons into context. Memory lets agents 'learn' without retraining because they accumulate and recall what worked and what failed in the memory store, improving on recurring tasks — the learning lives in the retrievable memory, not in updated weights.

Exercise 7: Pen & Paper

Challenges of orchestrating many tools; why does reliability get harder with more tools and steps?

Solution

With many tools the agent must SELECT correctly from many options (harder), SEQUENCE them (some depend on others' outputs), PASS DATA between them, and HANDLE failures. Reliability gets harder because errors COMPOUND: each tool call is a chance for wrong selection, malformed arguments, or failure, and over a long orchestration these chances multiply — e.g. 20 calls at 95% each succeed end-to-end only ~36% of the time (Exercise 10). More tools mean more selection ambiguity; more steps mean more multiplicative failure points. Validation, error recovery, and reflection are what keep the compounding from destroying reliability.

Exercise 8: Pen & Paper

Four reasons to use multiple agents; three reasons not to; when is a single agent better?

Solution

For multiple agents: (1) specialization (expert agents with tailored tools/prompts); (2) separation of concerns (focused agents more reliable than one juggling everything); (3) diverse perspectives (debate/critique catches errors); (4) parallelism (independent subtasks run simultaneously). Against: (1) cost multiplication (many model calls); (2) coordination overhead and error propagation across agents; (3) debugging difficulty and latency. A single agent is better when the task isn't genuinely decomposable, when cost/latency matter, or when coordination overhead would exceed the benefit — which is often. Start simple; add agents only when a real limitation demands it and the data shows it helps.

Exercise 9: Pen & Paper

Describe orchestrator-worker and debate patterns; what task suits each?

Solution

Orchestrator-worker: a manager agent plans and delegates subtasks to specialized worker agents, then synthesizes their results — suits decomposable tasks with clear sub-roles (e.g. build a feature: coder, tester, reviewer). Debate: multiple agents independently solve or argue different positions, then critique each other (often with a judge) — suits tasks where diverse independent attempts catch errors or where correctness benefits from adversarial checking (e.g. hard reasoning, fact-checking). Orchestrator-worker divides labor; debate diversifies and cross-checks.

Exercise 10: Derive

Compounding: end-to-end success for 15 steps at 98% and 90%; how does reflection change it?

Solution

End-to-end success ≈ (per-step reliability)^(steps).

0.98^15 ≈ 0.74 (74%)

0.90^15 ≈ 0.21 (21%)

Even 98% per-step reliability gives only ~74% over 15 steps; 90% collapses to ~21%. Reflection and error recovery break the compounding: by catching and fixing a bad step before it propagates, they raise the effective per-step reliability toward 1, so end-to-end success on long trajectories improves dramatically. For example, if every failed step is detected and retried once, and attempts are independent, a 90% step becomes 1 − 0.10^2 = 99% reliable, and 15 steps succeed about 86% of the time instead of ~21%. Grounded reflection (run the test, fix the error) is therefore arguably the single most important reliability technique for long-horizon agents.
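The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the chapter's reference material: the function names are illustrative, and the retry model assumes that every failure is detected and that attempts are independent, which real agents only approximate.

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps


def per_step_with_retries(per_step: float, retries: int) -> float:
    """Effective per-step success when a failed step is caught and retried.

    Assumption: failures are always detected and attempts are independent.
    """
    return 1.0 - (1.0 - per_step) ** (retries + 1)


if __name__ == "__main__":
    for p in (0.98, 0.90):
        base = end_to_end_success(p, 15)
        fixed = end_to_end_success(per_step_with_retries(p, 1), 15)
        print(f"p={p:.2f}: no recovery {base:.2f}, one retry {fixed:.2f}")
    # p=0.98: no recovery 0.74, one retry 0.99
    # p=0.90: no recovery 0.21, one retry 0.86
```

Under these idealized assumptions a single retry per step is enough to lift the 90% case from roughly 21% to roughly 86% over 15 steps.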

Exercise 11: Code

Build a basic agent loop (reason, act, observe) with a step cap; test on a multi-step task.

Solution

Implementing the loop — model reasons and emits a tool call, the app executes it and returns the observation, repeat until the model signals done or the step cap is hit — and testing on a multi-step task demonstrates the core agent engine (Exercise 3), with the step cap preventing runaway loops.
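One possible shape for such a loop is sketched below. Everything here is hypothetical scaffolding: `scripted_model` stands in for a real model call and simply follows a fixed script for the goal "compute (3 + 1) * 2", and the two tools in `TOOLS` are toy functions. The point is the control flow and the step cap, not the reasoning.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tools; a real agent would wrap APIs or code execution here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add_one": lambda x: str(int(x) + 1),
    "double": lambda x: str(int(x) * 2),
}

Step = Tuple[str, str, str]  # (action, argument, observation)


def scripted_model(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for a model call: returns (action, argument).

    A real model would reason over the goal and history instead of
    following this fixed script.
    """
    if not history:
        return "add_one", "3"
    if len(history) == 1:
        return "double", history[-1][2]
    return "done", history[-1][2]


def run_agent(goal: str, model, max_steps: int = 5) -> dict:
    """Reason, act, observe, repeat, stopping on 'done' or at the step cap."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = model(goal, history)            # reason
        if action == "done":
            return {"status": "done", "answer": arg, "steps": len(history)}
        tool = TOOLS.get(action)
        if tool is None:
            observation = f"error: unknown tool {action!r}"
        else:
            observation = tool(arg)                   # act
        history.append((action, arg, observation))    # observe
    return {"status": "step_cap", "answer": None, "steps": len(history)}
```

Calling `run_agent("compute (3 + 1) * 2", scripted_model)` finishes in two tool calls with the answer `"8"`, while a model that never signals `done` is cut off with status `"step_cap"` after `max_steps` iterations.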

Exercise 12: Code

Add planning: produce an explicit plan before executing; compare success with and without.

Solution

Having the agent first produce a plan then follow it, versus reacting step-by-step, and comparing task success shows that explicit planning helps on tasks with clear structure (fewer missed prerequisites), though current models' planning is imperfect (Exercise 4) — motivating replanning and reflection.

Exercise 13: Code Lab

Implement reflection on a coding task; use test runs as grounded feedback.

Solution

After producing code, the agent runs the tests, reads any failure, reasons about the cause, and revises — grounded reflection (Exercise 5). On a coding task this catch-and-fix loop markedly raises success, because the test results give real feedback that closes the reflection loop, unlike ungrounded self-critique.
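The catch-and-fix loop might look roughly like the sketch below. The names are illustrative only: `propose` is a stand-in for the model and just walks through a fixed list of candidate programs (a real model would condition its revision on the failure message), and `TESTS` is a tiny hypothetical test suite for an absolute-value function.

```python
from typing import List, Optional, Tuple

# Hypothetical test suite: (input, expected output) pairs for abs().
TESTS: List[Tuple[int, int]] = [(3, 3), (-4, 4), (0, 0)]

# Stand-in for model output: attempt 0 is buggy, attempt 1 is the fix.
CANDIDATES = [
    "def solve(x):\n    return x\n",
    "def solve(x):\n    return -x if x < 0 else x\n",
]


def propose(attempt: int, feedback: Optional[str]) -> str:
    """Stand-in for the model; a real one would use `feedback` to revise."""
    return CANDIDATES[min(attempt, len(CANDIDATES) - 1)]


def run_tests(source: str) -> Optional[str]:
    """Run the candidate and return the first failure message, or None."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable in a toy; real agents should sandbox
    for arg, expected in TESTS:
        got = namespace["solve"](arg)
        if got != expected:
            return f"solve({arg}) returned {got}, expected {expected}"
    return None


def reflect_and_fix(max_attempts: int = 3):
    """Generate, test, and revise until the tests pass or attempts run out."""
    feedback: Optional[str] = None
    log: List[Optional[str]] = []
    for attempt in range(max_attempts):
        source = propose(attempt, feedback)
        feedback = run_tests(source)      # grounded feedback from execution
        log.append(feedback)
        if feedback is None:
            return source, log
    return None, log
```

Here the first attempt fails on the negative input, the failure message is fed back, and the second attempt passes; the feedback comes from actually running the code, not from the model's opinion of it.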

Exercise 14: Code

Implement agent memory (short-term context + long-term store); show recall of a past lesson.

Solution

Writing lessons to an external store and retrieving relevant ones at the start of a new task (Exercise 6) lets the agent avoid repeating a past mistake — demonstrating learning-via-memory: the agent improves on a recurring task without any weight update, by recalling what worked before.
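A very small version of the long-term store can be sketched as follows. This is an assumption-laden simplification: the `LessonStore` class is hypothetical, and it ranks lessons by keyword overlap with the new task instead of using embeddings and a vector index as a RAG-style memory would.

```python
from typing import List, Set, Tuple


class LessonStore:
    """Toy long-term memory: stores lessons keyed by the task they came from.

    Retrieval uses word overlap, a stand-in for embedding similarity.
    """

    def __init__(self) -> None:
        self._lessons: List[Tuple[Set[str], str]] = []

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        return set(text.lower().split())

    def add(self, task: str, lesson: str) -> None:
        """Record a lesson learned while working on `task`."""
        self._lessons.append((self._tokens(task), lesson))

    def recall(self, task: str, k: int = 1) -> List[str]:
        """Return up to `k` lessons whose source task overlaps with `task`."""
        query = self._tokens(task)
        scored = [(len(query & words), lesson) for words, lesson in self._lessons]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [lesson for _, lesson in scored[:k]]


store = LessonStore()
store.add("parse csv export from billing", "The billing CSV uses ';' as delimiter.")
store.add("deploy web service", "Run migrations before restarting.")
```

At the start of a new task such as "parse the new billing csv", `store.recall(...)` returns the delimiter lesson, which the agent can place in its context (short-term memory) before acting.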

Exercise 15: Code

Build a multi-tool orchestration agent (search+fetch+summarize); measure correct selection/sequencing; add validation.

Solution

Orchestrating search → fetch (using search's output) → summarize and measuring how often the agent selects and sequences tools correctly exposes the orchestration challenges of Exercise 7; adding argument validation and error feedback raises the success rate — showing reliability is engineered, not assumed.
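The validation layer can be sketched independently of any model. In the example below all names are hypothetical: the three tools are offline stubs over an in-memory `PAGES` dictionary, the sequencing is hard-coded where a real agent would choose it, and each tool is registered with a validator so that bad arguments come back as error data instead of exceptions.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory "web" so the sketch runs offline.
PAGES = {"https://example.com/agents": "Agents loop. They act. They observe."}


def search(query: str) -> str:
    """Stub search: always returns the one known URL."""
    return "https://example.com/agents"


def fetch(url: str) -> str:
    return PAGES[url]


def summarize(text: str) -> str:
    """Stub summarizer: keeps only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


# Each tool is registered together with a validator for its argument.
TOOLS: Dict[str, Tuple[Callable[[str], str], Callable[[Any], bool]]] = {
    "search": (search, lambda a: isinstance(a, str) and a.strip() != ""),
    "fetch": (fetch, lambda a: isinstance(a, str) and a in PAGES),
    "summarize": (summarize, lambda a: isinstance(a, str) and len(a) > 0),
}


def call_tool(name: str, arg: Any) -> dict:
    """Validate, then execute; errors are returned so the agent can react."""
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    func, is_valid = TOOLS[name]
    if not is_valid(arg):
        return {"ok": False, "error": f"invalid argument for {name}: {arg!r}"}
    return {"ok": True, "value": func(arg)}


def run_pipeline(query: str) -> dict:
    """search -> fetch -> summarize, feeding each output to the next tool."""
    value: Any = query
    for name in ("search", "fetch", "summarize"):
        result = call_tool(name, value)
        if not result["ok"]:
            return result
        value = result["value"]
    return {"ok": True, "value": value}
```

To measure selection and sequencing, the hard-coded tuple in `run_pipeline` would be replaced by the model's own choices, and the error dictionaries returned by `call_tool` would be fed back to it as observations.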

Exercise 16: Code

Measure compounding: tunable per-step failure; plot success vs steps, with and without reflection/retry.

Solution

Building an agent whose steps fail with a tunable probability and plotting end-to-end success against the number of steps reproduces the exponential decay of Exercise 10; adding reflection/retry that catches and fixes failures flattens the curve — empirically demonstrating how per-step recovery defeats compounding.
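The experiment can be approximated without a real agent by simulating steps that fail at random. The sketch below is one such simulation, with assumed names and an assumed retry model (failures are always detected and retries are independent); it prints a table where the exercise asks for a plot, and the same numbers could be passed to any plotting library.

```python
import random


def run_trajectory(steps: int, p_fail: float, retries: int,
                   rng: random.Random) -> bool:
    """Simulate one task; a step succeeds if any of its attempts succeeds."""
    for _ in range(steps):
        if not any(rng.random() >= p_fail for _ in range(retries + 1)):
            return False  # every attempt at this step failed
    return True


def success_rate(steps: int, p_fail: float, retries: int,
                 trials: int = 20000, seed: int = 0) -> float:
    """Estimate end-to-end success by Monte Carlo over `trials` trajectories."""
    rng = random.Random(seed)
    wins = sum(run_trajectory(steps, p_fail, retries, rng) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for steps in (1, 5, 10, 15, 20):
        plain = success_rate(steps, p_fail=0.10, retries=0)
        retry = success_rate(steps, p_fail=0.10, retries=1)
        print(f"{steps:>2} steps: no retry {plain:.2f}, one retry {retry:.2f}")
```

Without retries the estimates should track (1 − p_fail)^steps, and with one retry they should track (1 − p_fail^2)^steps, which is the flatter curve described above.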

Exercise 17: Code Lab

Implement an orchestrator-worker multi-agent system; test on a decomposable task.

Solution

A manager agent that decomposes a task, delegates subtasks to specialized workers, and synthesizes their outputs (Exercise 9) handles a decomposable task by division of labor — demonstrating the most common multi-agent pattern and its coordination requirements.

Exercise 18: Code

Implement a debate pattern; compare accuracy to a single agent.

Solution

Having two or three agents independently solve a problem and critique each other, with a judge selecting the best answer (Exercise 9), can improve accuracy over a single agent on hard problems, because diverse independent attempts are unlikely to share the same error. The cost is several times more model calls (Exercise 8's trade-off).
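A stripped-down version of the pattern is sketched below. It is a simplification under stated assumptions: the solvers are hypothetical stubs that return fixed answers, the mutual critique round is omitted, and the judge is a plain majority vote where a real system might use another model call.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Solver = Callable[[str], str]


def majority_judge(question: str, answers: List[str]) -> str:
    """Simplest possible judge: pick the most common answer."""
    return Counter(answers).most_common(1)[0][0]


def debate(question: str, solvers: Sequence[Solver],
           judge: Callable[[str, List[str]], str] = majority_judge
           ) -> Tuple[str, List[str]]:
    """Collect independent answers, then let the judge choose one."""
    answers = [solve(question) for solve in solvers]
    return judge(question, answers), answers


# Hypothetical solvers: two agree, one makes an independent error.
SOLVERS: List[Solver] = [lambda q: "42", lambda q: "41", lambda q: "42"]
```

With these stubs, `debate("What is 6 * 7?", SOLVERS)` selects `"42"` because the single wrong answer is outvoted. The call count is one per solver plus whatever the judge costs, which is the overhead to weigh against any accuracy gain.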

Exercise 19: Code

Add human-in-the-loop: approval before consequential actions; let a human correct a stuck agent.

Solution

Gating consequential actions behind human approval, and letting a human intervene when the agent is stuck, demonstrates the oversight controls essential for safe deployment (echoing Chapter 28) — the agent proposes, the human disposes for high-stakes steps, bounding the blast radius of mistakes or injections.
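The approval gate itself is only a few lines. In the sketch below, the tool names, the `CONSEQUENTIAL` set, and the `console_approver` prompt are all assumptions made for illustration; which actions count as consequential is a policy decision for each deployment.

```python
from typing import Callable, Dict

# Hypothetical tools; only some are marked as consequential.
TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",
}
CONSEQUENTIAL = {"delete_file"}


def console_approver(action: str, arg: str) -> bool:
    """Ask a human on the console; anything other than 'y' is a refusal."""
    return input(f"Allow {action}({arg!r})? [y/N] ").strip().lower() == "y"


def execute(action: str, arg: str,
            approve: Callable[[str, str], bool] = console_approver) -> dict:
    """Run a tool, but require human approval for consequential actions."""
    if action not in TOOLS:
        return {"status": "error", "detail": f"unknown tool {action!r}"}
    if action in CONSEQUENTIAL and not approve(action, arg):
        return {"status": "rejected", "detail": f"human declined {action}"}
    return {"status": "executed", "detail": TOOLS[action](arg)}
```

Passing the approver as a parameter keeps the gate testable (a lambda can stand in for the human) and lets a rejection be returned to the agent as an observation, so it can replan instead of stalling.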

Exercise 20: Code (Challenge)

Build a complete coding agent (plan, orchestrate, grounded reflection, memory, bounds, tracing); then a multi-agent version; compare and judge whether multi-agent was worth it.

Solution

The capstone coding agent combines planning, tool orchestration (write/run/read), grounded reflection on test failures, memory of attempts, step/cost bounds, and tracing. It achieves a measurable success rate on a task with a test suite, with grounded reflection (Exercise 5) the biggest reliability driver. Building a multi-agent version (coder + reviewer + tester) and comparing success, cost, and latency typically shows modest quality gains at substantially higher cost and latency. This illustrates Exercise 8's lesson that multi-agent is not automatically worth it; the single capable agent often wins. The capstone is the integrated demonstration of the whole chapter.
