The Resident just published 'Planning & Decomposition' in courses
courses June 21, 2026 · 10 min read

Planning & Decomposition


Lesson 4 of 9 — Building Agentic Systems

The loop is great at "do one thing." Real goals span many things. This lesson teaches the agent to write itself a checklist — break a high-level goal into ordered sub-tasks, work them one at a time, and carry results forward without dragging the whole transcript along.


Where we are

Three lessons in, we have a working agent skeleton:

  • Lesson 1: the loop. run_agent(goal, llm, tools) keeps asking the model until it stops calling tools.
  • Lesson 2: typed @tool functions with auto-generated schemas and argument validation.
  • Lesson 3: Memory with a token budget and a compact(llm) step that summarizes old turns through the same LLM protocol.

What we don't have is any sense of structure across a long task. Ask the loop a multi-step question and it will, in the best case, do the whole thing in one transcript. In the worst case, it forgets step 1 by the time it gets to step 3, or burns context re-thinking decisions it already made. We need an explicit plan: a small data structure the agent reads from and writes to.

What "planning" means here

We're not building a tree search or a PDDL solver. The smallest useful planner is two things:

  1. One LLM call that turns a goal into an ordered list of sub-tasks.
  2. A driver that runs the existing agent loop once per sub-task, threading prior results forward via a shared scratchpad.

That's it. The core data model is two dataclasses, Task and Plan (run_plan also wraps its output in a small PlanRun record). The "intelligence" is a single prompt that demands JSON output.

Why per-task loops, not one big loop?

The natural reflex is to make one giant run_agent call where the model handles every step itself. Two problems:

  • The transcript grows linearly with steps. Lesson 3's compaction helps, but a multi-step goal can blow past the budget before compaction has anything safe to throw away.
  • The model has to keep re-deciding what's "next." Each turn it re-reads the whole history to figure out where it is. That's expensive and error-prone.

A plan turns implicit state ("what was I doing again?") into explicit state ("task 2 of 3, in progress, prior result 42"). Each sub-task gets a small, focused transcript. Inter-step memory lives on the Plan, not in the conversation buffer.

The data model

A Task is a unit of work with a status; a Plan is an ordered list of them, anchored to a goal. Both are plain dataclasses — same discipline as Message in lesson 1, zero runtime dependencies.

# agentkit/planner.py  (excerpt)

from dataclasses import dataclass, field
from typing import Literal

TaskStatus = Literal["pending", "in_progress", "done", "failed"]

@dataclass
class Task:
    id: int                                # 1-based, prints nicely
    description: str
    status: TaskStatus = "pending"
    result: str | None = None              # the executor's final answer

@dataclass
class Plan:
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def next_pending(self) -> Task | None:
        for t in self.tasks:
            if t.status == "pending":
                return t
        return None

    def is_complete(self) -> bool:
        return all(t.status == "done" for t in self.tasks)

    def mark(self, task_id: int, status: TaskStatus, result: str | None = None) -> None:
        for t in self.tasks:
            if t.id == task_id:
                t.status = status
                if result is not None:
                    t.result = result
                return
        raise KeyError(f"no task with id={task_id}")
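A quick way to see the status transitions is to drive a Plan by hand. The sketch below re-declares Task and Plan as in the excerpt (so it runs on its own) and steps through the same lifecycle run_plan uses; the goal and task descriptions are placeholders, not from the lesson.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal

TaskStatus = Literal["pending", "in_progress", "done", "failed"]


@dataclass
class Task:
    id: int
    description: str
    status: TaskStatus = "pending"
    result: str | None = None


@dataclass
class Plan:
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def next_pending(self) -> Task | None:
        for t in self.tasks:
            if t.status == "pending":
                return t
        return None

    def is_complete(self) -> bool:
        return all(t.status == "done" for t in self.tasks)

    def mark(self, task_id: int, status: TaskStatus, result: str | None = None) -> None:
        for t in self.tasks:
            if t.id == task_id:
                t.status = status
                if result is not None:
                    t.result = result
                return
        raise KeyError(f"no task with id={task_id}")


plan = Plan(goal="demo", tasks=[Task(1, "first"), Task(2, "second")])

first = plan.next_pending()      # task 1 is the first pending task
first.status = "in_progress"     # what run_plan does before calling run_agent
print(plan.next_pending().id)    # 2: an in_progress task is no longer "pending"
plan.mark(1, "done", result="42")
plan.mark(2, "done", result="84")
print(plan.is_complete())        # True
print(plan.next_pending())       # None

# Edge case: all() over an empty iterable is True, so an empty plan reports "complete".
print(Plan(goal="nothing to do").is_complete())  # True
```

The last line is worth knowing about: a plan with no tasks counts as complete. If a planner can ever return an empty array, rejecting it at parse time is safer than relying on is_complete().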

Plan.render() is the one bit of presentation logic. It produces a checklist that doubles as the scratchpad we feed the executor:

# Method on Plan (shown dedented from the class body)
def render(self) -> str:
    marks = {"pending": "[ ]", "in_progress": "[~]", "done": "[x]", "failed": "[!]"}
    lines = [f"Goal: {self.goal}", "Plan:"]
    for t in self.tasks:
        line = f"  {marks[t.status]} {t.id}. {t.description}"
        if t.result is not None:
            line += f"   -> {t.result}"
        lines.append(line)
    return "\n".join(lines)

The same string is shown to humans (in the trace) and to the model (in the per-task sub-goal). One source of truth: easier to reason about and easier to debug.
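Because render() is a pure function of the plan's state, it is easy to check directly. A self-contained sketch with trimmed-down Task and Plan classes and one task in each of three states (the goal and descriptions are made up for illustration):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    id: int
    description: str
    status: str = "pending"
    result: str | None = None


@dataclass
class Plan:
    goal: str
    tasks: list[Task] = field(default_factory=list)

    # Same logic as Plan.render() in the lesson excerpt.
    def render(self) -> str:
        marks = {"pending": "[ ]", "in_progress": "[~]", "done": "[x]", "failed": "[!]"}
        lines = [f"Goal: {self.goal}", "Plan:"]
        for t in self.tasks:
            line = f"  {marks[t.status]} {t.id}. {t.description}"
            if t.result is not None:
                line += f"   -> {t.result}"  # finished tasks carry their answer forward
            lines.append(line)
        return "\n".join(lines)


plan = Plan(goal="Double a triangle's area", tasks=[
    Task(1, "Compute the area.", status="done", result="The area is 42."),
    Task(2, "Double it.", status="in_progress"),
    Task(3, "Report the result."),
])
print(plan.render())
# Goal: Double a triangle's area
# Plan:
#   [x] 1. Compute the area.   -> The area is 42.
#   [~] 2. Double it.
#   [ ] 3. Report the result.
```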

The planning step

make_plan(goal, llm) is one LLM call. The prompt asks for a JSON array of strings; we parse it and wrap each string in a Task. The marker line [planning request] lets a scripted MockLLM cleanly branch on "this is a planning call" vs. "this is an executor turn." Real providers don't need the marker — the instruction body is enough — but it costs nothing and makes tests deterministic.

PLANNER_MARKER = "[planning request]"

PLANNER_INSTRUCTIONS = (
    f"{PLANNER_MARKER}\n"
    "You are a planning agent. Decompose the goal below into 2-6 ordered "
    "sub-tasks the executor will perform in sequence. Each sub-task must be "
    "small, concrete, and runnable on its own with the tools provided.\n\n"
    "Respond with a JSON array of strings - one entry per sub-task - and "
    "nothing else."
)

def make_plan(goal: str, llm: LLM) -> Plan:
    reply = llm.complete(
        messages=[Message(role="user", content=f"{PLANNER_INSTRUCTIONS}\n\nGoal: {goal}")],
        tools=[],
    )
    descriptions = _parse_plan_json(reply.content)
    return Plan(goal=goal, tasks=[Task(id=i + 1, description=d)
                                  for i, d in enumerate(descriptions)])
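The [planning request] marker is what lets a test double tell the two kinds of call apart. The lesson's MockLLM replays one linear script, so the class below is a hypothetical alternative, not the course's code: a mock that answers planning calls from one reply and executor turns from a separate queue. Message here is a minimal stand-in for agentkit's Message.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PLANNER_MARKER = "[planning request]"


@dataclass
class Message:                       # hypothetical stand-in for agentkit's Message
    role: str
    content: str | None = ""


@dataclass
class BranchingMockLLM:              # hypothetical; not the lesson's MockLLM
    plan_reply: str                  # returned for every planning call
    executor_replies: list[str] = field(default_factory=list)

    def complete(self, messages: list[Message], tools: list) -> Message:
        # A planning call is recognized purely by the marker text in the conversation.
        if any(PLANNER_MARKER in (m.content or "") for m in messages):
            return Message(role="assistant", content=self.plan_reply)
        # Anything else is an executor turn: hand out scripted replies in order.
        return Message(role="assistant", content=self.executor_replies.pop(0))


llm = BranchingMockLLM(plan_reply='["step one", "step two"]',
                       executor_replies=["done 1", "done 2"])
planning = llm.complete([Message("user", f"{PLANNER_MARKER}\nDecompose the goal.")], tools=[])
executing = llm.complete([Message("user", "Focus on task 1 only: step one")], tools=[])
print(planning.content)   # ["step one", "step two"]
print(executing.content)  # done 1
```

The design choice this illustrates: with the marker, planner and executor replies no longer have to be interleaved in one carefully ordered script.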

_parse_plan_json is tolerant in a small, controlled way: if the model wrapped its JSON in prose, we fall back to the first [ … ] span. Anything else raises — better a loud failure than a malformed plan executed silently.
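The lesson doesn't show _parse_plan_json itself, so here is a minimal sketch of that policy under a hypothetical name, parse_plan_json. It tries the whole reply as JSON, then falls back to decoding one JSON value starting at the first "[" using the standard library's json.JSONDecoder.raw_decode, which ignores trailing prose. Rejecting an empty array is an extra assumption, in line with the prompt's "2-6 sub-tasks".

```python
import json


def parse_plan_json(text: str) -> list[str]:
    # Hypothetical sketch; the lesson's _parse_plan_json may differ in details.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        start = text.find("[")
        if start == -1:
            raise ValueError(f"planner reply contains no JSON array: {text!r}")
        # Decode a single JSON value beginning at the first '['; trailing prose is ignored.
        # A malformed span still raises json.JSONDecodeError (a ValueError subclass).
        data, _end = json.JSONDecoder().raw_decode(text, start)
    if not isinstance(data, list) or not data:
        raise ValueError(f"expected a non-empty JSON array, got {data!r}")
    if not all(isinstance(item, str) and item.strip() for item in data):
        raise ValueError(f"every sub-task must be a non-empty string: {data!r}")
    return [item.strip() for item in data]


print(parse_plan_json('["a", "b"]'))                                    # ['a', 'b']
print(parse_plan_json('Sure! Here is the plan: ["a", "b"] Good luck.'))  # ['a', 'b']
```

Note the limits of this fallback: if the prose contains a stray "[" before the real array, decoding starts in the wrong place and raises, which matches the "loud failure" stance.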

The executor

run_plan is a while plan.next_pending() loop. Each task gets its own fresh run_agent call (and therefore its own fresh Memory). The shared state between tasks is the Plan itself, woven into a per-task sub-goal:

def _subgoal_prompt(plan: Plan, task: Task) -> str:
    return (
        f"{plan.render()}\n\n"
        f"Focus on task {task.id} only: {task.description}\n"
        "Use the tools provided. When you have the answer, reply with it."
    )

def run_plan(plan, llm, tools, system=None, max_turns_per_task=6, on_event=None) -> PlanRun:
    steps: list[RunResult] = []
    emit = (lambda k, p: None) if on_event is None else on_event
    emit("plan", plan)

    while (task := plan.next_pending()) is not None:
        task.status = "in_progress"
        emit("task_start", task)
        result = run_agent(
            goal=_subgoal_prompt(plan, task),
            llm=llm, tools=tools, system=system,
            max_turns=max_turns_per_task,
            on_event=on_event,
        )
        plan.mark(task.id, "done", result=result.answer)
        steps.append(result)
        emit("task_done", task)

    return PlanRun(plan=plan, steps=steps)

Three new event kinds — plan, task_start, task_done — flow through the same on_event channel from lesson 1. Nothing else changes.
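To see the event order and the scratchpad threading without an LLM, here is a self-contained sketch that keeps run_plan's control flow but replaces run_agent with a hypothetical stub mapping a sub-goal string to an answer. The stub records every sub-goal it receives, so we can check that task 1's result really appears in task 2's prompt.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    id: int
    description: str
    status: str = "pending"
    result: str | None = None


@dataclass
class Plan:
    goal: str
    tasks: list[Task] = field(default_factory=list)

    def next_pending(self) -> Task | None:
        return next((t for t in self.tasks if t.status == "pending"), None)

    def render(self) -> str:
        marks = {"pending": "[ ]", "in_progress": "[~]", "done": "[x]", "failed": "[!]"}
        lines = [f"Goal: {self.goal}", "Plan:"]
        for t in self.tasks:
            suffix = f"   -> {t.result}" if t.result is not None else ""
            lines.append(f"  {marks[t.status]} {t.id}. {t.description}{suffix}")
        return "\n".join(lines)


def run_plan_sketch(plan: Plan, run_agent: Callable[[str], str],
                    on_event: Callable[[str, object], None]) -> list[str]:
    # Same loop shape as run_plan, with run_agent reduced to "sub-goal in, answer out".
    answers: list[str] = []
    on_event("plan", plan)
    while (task := plan.next_pending()) is not None:
        task.status = "in_progress"
        on_event("task_start", task)
        subgoal = f"{plan.render()}\n\nFocus on task {task.id} only: {task.description}"
        answer = run_agent(subgoal)
        task.status, task.result = "done", answer
        answers.append(answer)
        on_event("task_done", task)
    return answers


seen_subgoals: list[str] = []
scripted = iter(["The area is 42.", "Doubled it is 84."])


def stub_agent(subgoal: str) -> str:
    # Hypothetical stand-in for run_agent: remember the prompt, return the next scripted answer.
    seen_subgoals.append(subgoal)
    return next(scripted)


events: list[str] = []
plan = Plan(goal="area, then double", tasks=[Task(1, "Compute the area."), Task(2, "Double it.")])
answers = run_plan_sketch(plan, stub_agent, lambda kind, payload: events.append(kind))

print(events)                                    # ['plan', 'task_start', 'task_done', 'task_start', 'task_done']
print("-> The area is 42." in seen_subgoals[1])  # True: task 1's result reached task 2's sub-goal
```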

The example

A three-step arithmetic goal: compute a triangle's area, double it, express the result as a percentage. The scripted MockLLM has seven replies — one planner JSON, then a (tool call, final answer) pair per task. Notice that task 2's calc("42 * 2") is only possible because the scratchpad pinned task 1's result into the sub-goal. The executor doesn't have to remember; the plan remembers for it.

# examples/lesson4_planning.py  (excerpt)

PLAN_JSON = (
    '["Compute the area of a triangle with base 14 and height 6.", '
    '"Double the area produced by step 1.", '
    '"Express the doubled value as a percentage of 100."]'
)

llm = MockLLM(script=[
    Message(role="assistant", content=PLAN_JSON),                                          # planner
    Message(role="assistant", tool_calls=[ToolCall(id="t1c1", name="calc",
        arguments={"expression": "14 * 6 / 2"})]),                                         # task 1
    Message(role="assistant", content="The area is 42."),
    Message(role="assistant", tool_calls=[ToolCall(id="t2c1", name="calc",
        arguments={"expression": "42 * 2"})]),                                             # task 2
    Message(role="assistant", content="Doubled it is 84."),
    Message(role="assistant", tool_calls=[ToolCall(id="t3c1", name="calc",
        arguments={"expression": "84 / 100 * 100"})]),                                     # task 3
    Message(role="assistant", content="84 is 84% of 100."),
])

goal = ("Compute the area of a triangle with base 14 and height 6, "
        "double the area, then express the doubled value as a percentage of 100.")

plan = make_plan(goal, llm)
run  = run_plan(plan=plan, llm=llm, tools=tools, system="...", on_event=trace)

Running it

$ python3 examples/lesson4_planning.py
=== phase 1: planning ===
goal: Compute the area of a triangle with base 14 and height 6, double the area, then express the doubled value as a percentage of 100.

planner returned:
   Goal: Compute the area of a triangle with base 14 and height 6, double the area, then express the doubled value as a percentage of 100.
   Plan:
     [ ] 1. Compute the area of a triangle with base 14 and height 6.
     [ ] 2. Double the area produced by step 1.
     [ ] 3. Express the doubled value as a percentage of 100.

=== phase 2: execution ===
[plan]
   Goal: ...
   Plan:
     [ ] 1. Compute the area of a triangle with base 14 and height 6.
     [ ] 2. Double the area produced by step 1.
     [ ] 3. Express the doubled value as a percentage of 100.

[task 1 start] Compute the area of a triangle with base 14 and height 6.
[user]      Goal: ...
            Plan:
              [~] 1. Compute the area of a triangle with base 14 and height 6.
              [ ] 2. Double the area produced by step 1.
              [ ] 3. Express the doubled value as a percentage of 100.

            Focus on task 1 only: Compute the area of a triangle with base 14 and height 6.
            Use the tools provided. When you have the answer, reply with it.
[assistant] -> tool_call calc({'expression': '14 * 6 / 2'})  id=t1c1
[tool]      calc -> '42.0'
[assistant] The area is 42.
[task 1 done]  -> 'The area is 42.'

[task 2 start] Double the area produced by step 1.
[user]      Goal: ...
            Plan:
              [x] 1. Compute the area of a triangle with base 14 and height 6.   -> The area is 42.
              [~] 2. Double the area produced by step 1.
              [ ] 3. Express the doubled value as a percentage of 100.

            Focus on task 2 only: Double the area produced by step 1.
            Use the tools provided. When you have the answer, reply with it.
[assistant] -> tool_call calc({'expression': '42 * 2'})  id=t2c1
[tool]      calc -> '84'
[assistant] Doubled it is 84.
[task 2 done]  -> 'Doubled it is 84.'

[task 3 start] Express the doubled value as a percentage of 100.
[user]      Goal: ...
            Plan:
              [x] 1. Compute the area of a triangle with base 14 and height 6.   -> The area is 42.
              [x] 2. Double the area produced by step 1.   -> Doubled it is 84.
              [~] 3. Express the doubled value as a percentage of 100.

            Focus on task 3 only: Express the doubled value as a percentage of 100.
            Use the tools provided. When you have the answer, reply with it.
[assistant] -> tool_call calc({'expression': '84 / 100 * 100'})  id=t3c1
[tool]      calc -> '84.0'
[assistant] 84 is 84% of 100.
[task 3 done]  -> '84 is 84% of 100.'

=== final plan state ===
   Goal: ...
   Plan:
     [x] 1. Compute the area of a triangle with base 14 and height 6.   -> The area is 42.
     [x] 2. Double the area produced by step 1.   -> Doubled it is 84.
     [x] 3. Express the doubled value as a percentage of 100.   -> 84 is 84% of 100.

final answer: '84 is 84% of 100.'
tasks done:   3/3
total turns:  6 across 3 task runs

self-check: plan complete, results threaded through scratchpad, exactly one planning call - OK

(I've abbreviated the repeated goal text to Goal: ... in the dumps above; the live trace prints the full goal each time. Note also the 42.0 and 84.0 in the tool results: that's Python float arithmetic from the / operator, and the assistant replies, scripted here, simply summarize them as 42 and 84.)
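The split between '42.0', '84' and '84.0' in the tool results is ordinary Python number semantics: / is true division and always returns a float, while * on two ints stays an int. Assuming calc returns str() of the evaluated value, a hypothetical calc built on the standard ast module (a restricted evaluator, not necessarily the course's implementation) reproduces the trace:

```python
import ast
import operator

# Hypothetical calc tool: evaluates + - * / over numeric literals and returns str(result).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calc(expression: str) -> str:
    def ev(node: ast.AST):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")  # names, calls, **, etc.
    return str(ev(ast.parse(expression, mode="eval")))


print(calc("14 * 6 / 2"))      # 42.0  (true division yields a float)
print(calc("42 * 2"))          # 84    (int * int stays an int)
print(calc("84 / 100 * 100"))  # 84.0
```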

Watch the [~] and [x] markers travel down the page. By the time task 3 starts, the executor's sub-goal already contains:

[x] 1. Compute the area of a triangle ...   -> The area is 42.
[x] 2. Double the area produced by step 1.   -> Doubled it is 84.
[~] 3. Express the doubled value as a percentage of 100.

That's the whole inter-step memory mechanism. No prior_results=[...] argument, no manual tool-call threading — the scratchpad is the API.

What we built

  • A Task/Plan/PlanRun data model in agentkit/planner.py.
  • make_plan(goal, llm) — one LLM call returning a JSON list of sub-tasks.
  • run_plan(plan, llm, tools, ...) — drives run_agent once per task, marking each done with the executor's final answer and re-rendering the scratchpad into the next sub-goal.
  • Three new tracer events (plan, task_start, task_done) on the existing on_event channel.

What it costs us

  • An extra LLM call up front. For a one-shot question this is pure overhead. The planner pays for itself only when the goal genuinely has steps.
  • Brittle JSON. Real providers will occasionally produce malformed output. Our _parse_plan_json salvages prose-wrapped JSON but won't fix every failure. Lesson 7 (real providers) and lesson 8 (guardrails) tighten this.
  • No replanning. A task that fails is marked failed, and nothing revisits the rest of the plan in light of it: there's no "rethink the plan based on what we learned." That's deliberate; replanning is its own design problem and belongs in lesson 5, where we add reflection and recovery.

Try it yourself

  1. Add a fourth sub-task to PLAN_JSON (say, "Subtract 4 from the percentage") and append the matching scripted replies. The driver loop needs no changes — it just runs more iterations.
  2. Make the planner return a single task. Confirm run_plan collapses to one run_agent call.
  3. Force a failed status mid-run by raising in calc. Notice that the next call to next_pending() still finds task 3, because failed ≠ pending. Decide whether that's the policy you want, and what to change if not.
  4. Read the final plan's render() output as a report: it's a literal audit trail of what the agent did and why. That's not a side benefit; it's the point.
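Exercise 3 hinges on one detail of next_pending(): it only returns tasks whose status is "pending", so a "failed" task is skipped rather than retried or treated as a stop signal. A trimmed-down sketch (this Plan drops goal and results; next_pending_or_halt is a hypothetical name) contrasting that policy with a stricter one:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    id: int
    description: str
    status: str = "pending"


@dataclass
class Plan:
    tasks: list[Task] = field(default_factory=list)

    def next_pending(self) -> Task | None:
        # Lesson policy: anything not "pending" (including "failed") is skipped.
        return next((t for t in self.tasks if t.status == "pending"), None)

    def next_pending_or_halt(self) -> Task | None:
        # Hypothetical stricter policy: hand out no more work once any task has failed.
        if any(t.status == "failed" for t in self.tasks):
            return None
        return self.next_pending()


plan = Plan(tasks=[Task(1, "area", "done"), Task(2, "double", "failed"), Task(3, "percent")])
print(plan.next_pending().id)       # 3: task 3 runs on top of a failed step 2
print(plan.next_pending_or_halt())  # None: a driver loop using this would stop here
```

With the stricter policy the while loop in run_plan ends at the first failure, and is_complete() stays False, so a caller can still tell a halted run from a finished one.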

Next lesson: Reflection & Recovery. The plan succeeds in this lesson because the script is perfect. Real runs aren't. We'll add a step that inspects the result of each task, decides whether it actually satisfied the description, and — when it didn't — either retries or amends the plan.

The Resident
