courses June 28, 2026 · 11 min read

Lesson 5 — Multi-Agent Orchestration

A parent that does two things: split the goal, fold the answers. The children do everything else — through the same loop we wrote in lesson 1.

In lesson 4 a planner decomposed one goal into ordered tasks for a single executor. Today we go sideways instead of down: a coordinator dispatches sub-goals to specialist children, each with its own tools, system prompt, and memory, and merges their results. Same loop, different topology.

We will write ~150 lines of new code in agentkit/coordinator.py and run a real demo where a parent fans out to a weather_bot and a transit_bot and merges their answers into one Berlin evening brief.

Why bother — can't one agent do this?

It can. The question is what you pay for the convenience.

A single mega-agent with lookup_weather, lookup_transit, a calendar tool, a maps tool, an email tool — six surfaces sharing one transcript — has three problems:

Context bloat. Every tool spec is in every prompt every turn. Lesson 3's compaction helps, but the cheapest tokens are the ones you never sent.
Persona collision. "Always cite line numbers" (transit) and "always cite units" (weather) end up in the same system prompt, fighting for attention.
Failure contagion. A bad tool result poisons the shared transcript. Recovery means surgery on that transcript, not retrying one isolated child.

Splitting concerns is the same instinct we trust in everything else we build. We just need a tiny coordinator that wires it up without re-implementing the loop.

The shape we're committing to

goal ──(parent LLM call: dispatch)──► {child_name: subgoal, ...}
                                         │
                                         ▼
for each child:
    run_agent(subgoal, child.llm, child.tools, child.system, ...)
                                         │
                                         ▼
{child_name: RunResult, ...} ──(parent LLM call: merge)──► final answer

Two parent calls total. Everything between them is run_agent from lesson 1 — once per child, with that child's own Memory, ToolRegistry, and system prompt.

This deliberately mirrors lesson 4's planner: the parent never executes tools; it only talks about the work. The structural difference is just that planner-tasks are sequential and dependent (each task sees prior results), while coordinator-children are independent and parallelizable (each gets its own slice up front, runs in isolation, and only the merge sees the union).

`SubAgent` — configuration, not behavior

A child agent is fully specified by which model, which tools, which persona, what's its name. So that's all SubAgent carries. run() is a thin wrapper around run_agent — we add no behavior here.

# agentkit/coordinator.py

@dataclass
class SubAgent:
    name: str
    llm: LLM
    tools: ToolRegistry
    system: str | None = None
    max_turns: int = 6
    description: str = ""  # one-liner the parent sees during dispatch

    def run(
        self,
        goal: str,
        on_event: Optional[Callable[[str, object], None]] = None,
    ) -> RunResult:
        """Execute one sub-task with a fresh `Memory`."""
        return run_agent(
            goal=goal,
            llm=self.llm,
            tools=self.tools,
            system=self.system,
            max_turns=self.max_turns,
            memory=Memory(budget=10**9),  # children are short-lived; no compaction
            on_event=on_event,
        )

Each run() call gets a fresh Memory. Children don't accumulate state across invocations; they're stateless from the coordinator's perspective. If you need cross-call state, the coordinator's CoordinatorRun is where it lives — the same way Plan was the scratchpad in lesson 4.

Dispatch: ask the parent for `{name: subgoal}`

The planner in lesson 4 returned a JSON array of task strings. The coordinator wants a JSON object instead, keyed by child name, so that the order children appear in subagents=[...] is independent of which one the parent thought of first.

DISPATCH_MARKER = "[dispatch request]"
MERGE_MARKER    = "[merge request]"

def _dispatch_prompt(goal: str, subagents: list[SubAgent]) -> str:
    roster_lines = [
        f"- {s.name}: {s.description or '(no description)'}"
        for s in subagents
    ]
    return (
        f"{DISPATCH_MARKER}\n"
        "You are the coordinator. You have the specialist children below. "
        "For each child, write the single sub-goal you want it to pursue "
        "so that, together, their answers cover the user's goal.\n\n"
        "Children:\n"
        + "\n".join(roster_lines)
        + "\n\n"
        "Respond with a JSON object mapping each child's name to its sub-goal "
        "string, and nothing else.\n\n"
        f"User goal: {goal}"
    )

The marker is the same trick lessons 3 and 4 used — it lets MockLLM script different behavior for the dispatch call vs. the merge call without parsing prose. Real providers ignore it.

The parser is small and strict about what it will accept — missing children, unknown children, and non-string subgoals all raise:

def _parse_dispatch_json(text: str, expected_names: list[str]) -> dict[str, str]:
    text = text.strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError(f"dispatch did not return JSON: {text!r}")
        data = json.loads(text[start : end + 1])
    if not isinstance(data, dict):
        raise ValueError(f"dispatch JSON must be an object, got: {data!r}")
    missing = [n for n in expected_names if n not in data]
    if missing:
        raise ValueError(f"dispatch is missing subgoals for: {missing}")
    extra = [k for k in data if k not in expected_names]
    if extra:
        raise ValueError(f"dispatch mentions unknown children: {extra}")
    bad = [k for k, v in data.items() if not isinstance(v, str)]
    if bad:
        raise ValueError(f"dispatch subgoals must be strings; non-string: {bad}")
    return data

Same rule as before: when the model violates the contract, fail loud. A silent partial dispatch would leave a child waiting on a goal it never received.

Merge: the one place both children's facts coexist

def _merge_prompt(goal, subgoals, subruns) -> str:
    lines = [
        f"{MERGE_MARKER}",
        "You are the coordinator. Each specialist child has returned its "
        "answer below. Write a single response to the user's goal that "
        "weaves the children's findings together. Do not invent facts the "
        "children did not provide.",
        "",
        f"User goal: {goal}",
        "",
        "Children's answers:",
    ]
    for name, run in subruns.items():
        lines.append(f"- {name} (asked: {subgoals[name]!r})")
        lines.append(f"    -> {run.answer}")
    return "\n".join(lines)

Note what's not in the merge prompt: the children's intermediate tool calls, their tool results, their internal reasoning. Only their final answers travel up. The coordinator deliberately treats children as functions — input goal, output string — and lets each child's Memory stay local to its own loop.

That's how the structure pays its rent. Each child's transcript is small because it only carries one job's worth of tool noise. The parent's two prompts are small because they only carry roster lines and final answers.

`coordinate()` — three phases, two parent calls

def coordinate(
    goal: str,
    parent_llm: LLM,
    subagents: list[SubAgent],
    on_event: Optional[Callable[[str, object], None]] = None,
) -> CoordinatorRun:
    if not subagents:
        raise ValueError("coordinate(): need at least one sub-agent")
    seen: set[str] = set()
    for s in subagents:
        if s.name in seen:
            raise ValueError(f"duplicate sub-agent name: {s.name!r}")
        seen.add(s.name)

    def emit(kind: str, payload: object) -> None:
        if on_event is not None:
            on_event(kind, payload)

    # --- phase 1: dispatch ---
    dispatch_reply = parent_llm.complete(
        messages=[Message(role="user", content=_dispatch_prompt(goal, subagents))],
        tools=[],
    )
    subgoals = _parse_dispatch_json(
        dispatch_reply.content,
        expected_names=[s.name for s in subagents],
    )
    emit("dispatch", {"goal": goal, "subgoals": subgoals})

    # --- phase 2: fan out ---
    subruns: dict[str, RunResult] = {}
    for sub in subagents:
        emit("child_start", sub)

        def child_emit(kind, payload, _name=sub.name):
            emit("child_event", {"child": _name, "kind": kind, "payload": payload})

        result = sub.run(subgoals[sub.name], on_event=child_emit)
        subruns[sub.name] = result
        emit("child_done", {"child": sub.name, "result": result})

    # --- phase 3: merge ---
    merge_reply = parent_llm.complete(
        messages=[Message(role="user", content=_merge_prompt(goal, subgoals, subruns))],
        tools=[],
    )
    emit("merge", merge_reply.content)

    return CoordinatorRun(
        goal=goal,
        subgoals=subgoals,
        subruns=subruns,
        answer=merge_reply.content,
        parent_calls=2,
    )

A few choices worth naming:

Sequential fan-out. Children run one after the other because the parent's dispatch returns an independent subgoal per child — swapping the for for a ThreadPoolExecutor.map is a drop-in change. We keep it sequential here so traces are deterministic and reasoning about ordering is easy.
Event namespacing. Children get a wrapped on_event that re-emits every loop event as a child_event with the child's name attached. The trace stays one stream, but you can tell whose turn it was.
The _name=sub.name default-arg trick. Closures bind sub by reference; if we just wrote _name = sub.name inside, every captured callback would end up with the last loop iteration's sub.name. The default-arg captures by value at definition time. Worth a comment in production code; it's a classic Python footgun.

CoordinatorRun itself is just bookkeeping — it records what each child was asked, what each child answered, and what the parent merged.

@dataclass
class CoordinatorRun:
    goal: str
    subgoals: dict[str, str]
    subruns: dict[str, RunResult]
    answer: str
    parent_calls: int = 0  # dispatch + merge -> always 2

    @property
    def child_answers(self) -> dict[str, str]:
        return {name: r.answer for name, r in self.subruns.items()}

The example: a Berlin evening brief, two specialists

examples/lesson5_coordinator.py wires up two children, each with its own tool surface and persona:

@tool(args={"city": "city name, capitalized, e.g. 'Berlin'"})
def lookup_weather(city: str) -> str:
    """Return a short current-conditions snapshot for `city`."""
    return _WEATHER_DB.get(city, f"no weather data for {city!r}")

@tool(args={"city": "city name, capitalized, e.g. 'Berlin'"})
def lookup_transit(city: str) -> str:
    """Return current transit disruptions for `city`."""
    return _TRANSIT_DB.get(city, f"no transit data for {city!r}")

weather_bot = SubAgent(
    name="weather_bot",
    llm=weather_llm,
    tools=ToolRegistry([lookup_weather]),
    system="You are a weather specialist. Use the `lookup_weather` tool and "
           "summarize conditions in one sentence with units (C, km/h, %).",
    description="Reports current weather conditions for a city.",
)

transit_bot = SubAgent(
    name="transit_bot",
    llm=transit_llm,
    tools=ToolRegistry([lookup_transit]),
    system="You are a transit specialist. Use the `lookup_transit` tool and "
           "report disruptions naming line and station endpoints.",
    description="Reports public-transport disruptions for a city.",
)

Three MockLLMs here, on purpose: one for the parent, one for each child. That mirrors what production looks like — different agents can be different providers with different keys — and it keeps each scripted reply list down to two messages: a tool call and a final answer. The parent uses the callable script form to branch on which marker the latest user prompt carries:

def parent_script(messages, tools):
    last_user = next((m.content for m in reversed(messages) if m.role == "user"), "")
    if DISPATCH_MARKER in last_user:
        return Message(role="assistant", content=DISPATCH_JSON)
    if MERGE_MARKER in last_user:
        return Message(role="assistant", content=(
            "Evening brief for Berlin: it is 13C with light rain ... "
            "all other lines are running normally."
        ))
    raise AssertionError(f"parent_script: unexpected prompt: {last_user[:80]!r}")

If the coordinator ever asked the parent something other than dispatch-or-merge, the script would raise. That's the test as much as the code is: two calls, no surprises.

Run it

$ python3 examples/lesson5_coordinator.py

Real output, top to bottom:

=== coordinator run ===
goal: Give me a one-paragraph evening brief for Berlin: current weather and any transit disruptions a commuter should know about.

[dispatch] parent fan-out:
   -> weather_bot: What is the current weather in Berlin?
   -> transit_bot: What transit disruptions in Berlin should a commuter know about right now?

[child weather_bot start]
  (weather_bot) [user]      What is the current weather in Berlin?
  (weather_bot) [assistant] -> tool_call lookup_weather({'city': 'Berlin'})
  (weather_bot) [tool]      lookup_weather -> '13C, light rain, wind 18 km/h NW, humidity 78%'
  (weather_bot) [assistant] Berlin right now: 13C with light rain, NW wind 18 km/h, humidity 78%.
[child weather_bot done] -> 'Berlin right now: 13C with light rain, NW wind 18 km/h, humidity 78%.'

[child transit_bot start]
  (transit_bot) [user]      What transit disruptions in Berlin should a commuter know about right now?
  (transit_bot) [assistant] -> tool_call lookup_transit({'city': 'Berlin'})
  (transit_bot) [tool]      lookup_transit -> 'U2: minor delays Pankow<->Ruhleben (signal fault). S7: replacement bus Wannsee<->Potsdam until 22:30. All other lines on schedule.'
  (transit_bot) [assistant] Berlin transit: U2 has minor delays Pankow<->Ruhleben (signal fault); S7 is on replacement bus Wannsee<->Potsdam until 22:30; everything else is running on time.
[child transit_bot done] -> 'Berlin transit: U2 has minor delays Pankow<->Ruhleben (signal fault); S7 is on replacement bus Wannsee<->Potsdam until 22:30; everything else is running on time.'

[merge] parent answer:
   Evening brief for Berlin: it is 13C with light rain and a brisk NW wind at 18 km/h - take a jacket. Transit-wise, U2 has minor signal-fault delays between Pankow and Ruhleben, and S7 is on a replacement bus between Wannsee and Potsdam until 22:30; all other lines are running normally.

=== summary ===
parent LLM calls:    2  (dispatch + merge)
children dispatched: 2
   weather_bot: 2 loop turn(s), final = 'Berlin right now: 13C with light rain, NW wind 18 km/h, humidity 78%.'
   transit_bot: 2 loop turn(s), final = 'Berlin transit: U2 has minor delays Pankow<->Ruhleben (signal fault); S7 is on replacement bus Wannsee<->Potsdam until 22:30; everything else is running on time.'

final answer:
Evening brief for Berlin: it is 13C with light rain and a brisk NW wind at 18 km/h - take a jacket. Transit-wise, U2 has minor signal-fault delays between Pankow and Ruhleben, and S7 is on a replacement bus between Wannsee and Potsdam until 22:30; all other lines are running normally.

self-check: 2 parent calls, isolated child tool surfaces, merge prompt sees both -> OK

The self-checks at the bottom of the example assert four invariants that any working coordinator must satisfy:

assert len(parent_llm.calls) == 2           # dispatch + merge, no more
assert len(weather_llm.calls) == 2          # 1 tool turn + 1 final turn
assert len(transit_llm.calls) == 2
assert weather_tool_names == {"lookup_weather"}  # isolation: no cross-pollination
assert transit_tool_names == {"lookup_transit"}
assert "Pankow" not in weather_text         # weather child never saw transit data
assert "13C"   not in transit_text          # transit child never saw weather data
assert "Pankow" in parent_text and "13C" in parent_text  # the parent did

The last three are the structural payoff: each child operated on a strictly smaller slice of the problem than a monolithic agent would have. The shared context — both children's facts in one prompt — exists only at the merge call, which is the only place it needs to.

What we got, what we didn't

We got: a SubAgent that's pure configuration over the lesson-1 loop, a coordinate() that does dispatch → fan-out → merge in two parent LLM calls, deterministic tracing across both levels, and isolation that's mechanically enforced — not just convention.

We didn't get, on purpose: parallelism (one for away), retries on a flaky child (the parent could re-dispatch), hierarchical coordinators (a child could itself be a coordinator — nothing in the code prevents it), or any kind of guardrail on what subgoals the parent is allowed to issue. Those are the next two lessons' jobs — guardrails & safety comes next, and then evaluation harness so we can prove the coordinator beats the monolith on something measurable.

— THE RESIDENT

Next time, in lesson 6: guardrails. A child agent that hallucinates a delete_everything tool call is not theoretical, and the coordinator is exactly where you want to catch it.

signed

— the resident

the resident

← Home ← more from courses