June 7, 2026 · 9 min read

Lesson 2 — Tools & Schemas

Building Agentic Systems — 2 of 9

Last lesson we built the agent loop and gave it one tool whose JSON-Schema description we wrote by hand. That gets old by the third tool, and the schemas drift from the Python functions the moment somebody renames an argument. This lesson fixes both problems: a @tool decorator generates the spec from a typed Python function, and a tiny validator stands between the model's tool calls and our code — catching the three mistakes models actually make (missing field, wrong type, typo'd name) and feeding the error back to the model so it can self-correct, rather than killing the run.

We extend agentkit/tools.py (we do not touch the loop) and then drive the new dispatcher with a MockLLM script: bad call first, good call second, final answer.

1. What the model actually sees

When the loop calls LLM.complete(messages, tools), that tools argument is a list of JSON-Schema-ish dicts — the only description of your tools the model gets. The shape we standardized in lesson 1:

{
  "name": "calc",
  "description": "Evaluate a basic arithmetic expression.",
  "parameters": {
    "type": "object",
    "properties": {"expression": {"type": "string"}},
    "required": ["expression"],
    "additionalProperties": false
  }
}

Three rules: keep it accurate (the schema is a contract), keep it small (every token costs context), and put the most useful information first (the description is what the model reads when deciding whether to call you).

Writing those by hand is fine for one tool. With five tools and a refactor, the schema will drift from the function. So we generate it.
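To make the drift concrete, here is a minimal sketch of a check that compares a hand-written spec against the function it claims to describe. The add function, its stale spec, and the helper name schema_drift are hypothetical, invented for this illustration; none of them are part of agentkit.

```python
import inspect

def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Hand-written spec that went stale: someone renamed `y` to `b` in the
# function but never updated the schema.
STALE_SPEC = {
    "name": "add",
    "description": "Add two integers.",
    "parameters": {
        "type": "object",
        "properties": {"a": {"type": "integer"}, "y": {"type": "integer"}},
        "required": ["a", "y"],
        "additionalProperties": False,
    },
}

def schema_drift(fn, spec):
    """Return (params missing from the spec, spec properties the function lacks)."""
    fn_params = set(inspect.signature(fn).parameters)
    spec_props = set(spec["parameters"]["properties"])
    return sorted(fn_params - spec_props), sorted(spec_props - fn_params)

missing, stale = schema_drift(add, STALE_SPEC)
print("missing from spec:", missing)   # ['b']
print("stale in spec:", stale)         # ['y']
```

A check like this only detects drift after the fact; generating the spec from the function removes the second source of truth altogether.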

2. The `@tool` decorator

The decorator reads three things off a Python function: its parameter names from inspect.signature, the type hint of each parameter from typing.get_type_hints, and the first line of the docstring as the human-readable description. Parameters without a default become required; parameters with one don't.

Here's the core — added to agentkit/tools.py:

import inspect
import types
from typing import Union, get_args, get_origin, get_type_hints

_PY_TO_JSON = {
    int: "integer", float: "number", str: "string",
    bool: "boolean", list: "array", dict: "object",
}

def _json_type(annotation):
    origin = get_origin(annotation)
    if origin in (list, tuple): return "array"
    if origin is dict:           return "object"
    if origin is Union or origin is types.UnionType:
        # `X | None` -> use the underlying type. Optionality is encoded by
        # the default value, which is what removes it from `required`.
        non_none = [a for a in get_args(annotation) if a is not type(None)]
        if len(non_none) == 1: return _json_type(non_none[0])
    if annotation in _PY_TO_JSON: return _PY_TO_JSON[annotation]
    raise TypeError(f"@tool: unsupported type annotation {annotation!r}")

def _build_parameters(fn, arg_descriptions):
    sig = inspect.signature(fn)
    hints = get_type_hints(fn)
    properties, required = {}, []
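    for pname, param in sig.parameters.items():
        # A tool is a contract: refuse *args / **kwargs even when they are annotated.
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise TypeError(f"@tool: {fn.__name__!r} must not take *args/**kwargs.")
        if pname not in hints:
            raise TypeError(f"@tool: {pname!r} of {fn.__name__!r} needs a type hint.")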
        prop = {"type": _json_type(hints[pname])}
        if arg_descriptions and pname in arg_descriptions:
            prop["description"] = arg_descriptions[pname]
        properties[pname] = prop
        if param.default is inspect.Parameter.empty:
            required.append(pname)
    return {
        "type": "object", "properties": properties,
        "required": required, "additionalProperties": False,
    }

def tool(fn=None, *, name=None, description=None, args=None):
    """Decorator that turns a typed function into a `Tool`."""
    def wrap(f):
        desc = description or (f.__doc__ or "").strip().split("\n", 1)[0].strip()
        return Tool(
            name=name or f.__name__,
            description=desc,
            parameters=_build_parameters(f, args),
            fn=f,
        )
    return wrap if fn is None else wrap(fn)

A few choices worth calling out:

additionalProperties: false by default. We want the validator to bark on a typo'd field — that's exactly the kind of mistake we built it for.
X | None is collapsed to X. JSON Schema has a richer way to express nullability, but the right knob for "this parameter is optional" is giving it a default value — the schema then omits it from required and the function gets the default. Two ways to express optionality is one too many.
*args / **kwargs are rejected. A tool is a contract; you can't expose a contract that says "I'll take whatever."
No defaults in the schema. The model never sees them — the Python function applies them. That keeps the schema small and prevents "the schema says default=4096, the function uses 1024" drift.
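The choices above can be seen in a condensed, standalone sketch. It is not the agentkit code: sketch_json_type, sketch_parameters, and the search tool are hypothetical names, only scalar types are mapped, and it assumes optionality comes from the default value, as described above.

```python
import inspect
import types
from typing import Optional, Union, get_args, get_origin, get_type_hints

_SCALARS = {int: "integer", float: "number", str: "string", bool: "boolean"}
# types.UnionType (the `X | None` spelling) only exists on Python 3.10+.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))

def sketch_json_type(annotation):
    """Map a scalar annotation to a JSON type, collapsing `X | None` to X."""
    if get_origin(annotation) in _UNION_ORIGINS:
        non_none = [a for a in get_args(annotation) if a is not type(None)]
        if len(non_none) == 1:
            return sketch_json_type(non_none[0])
    if annotation in _SCALARS:
        return _SCALARS[annotation]
    raise TypeError(f"unsupported annotation {annotation!r}")

def sketch_parameters(fn):
    """Build the `parameters` object from a function's signature and hints."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for pname, param in inspect.signature(fn).parameters.items():
        properties[pname] = {"type": sketch_json_type(hints[pname])}
        if param.default is inspect.Parameter.empty:
            required.append(pname)          # no default -> required
    return {"type": "object", "properties": properties,
            "required": required, "additionalProperties": False}

def search(query: str, limit: Optional[int] = None) -> str:
    """Hypothetical tool used only for this illustration."""
    return query

schema = sketch_parameters(search)
print(schema["properties"])  # limit is a plain "integer"; no default is emitted
print(schema["required"])    # ['query']
```

The Optional[int] annotation never reaches the schema as a nullable type. The only trace of optionality is that limit is absent from required.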

3. The dispatcher with validation

The lesson-1 dispatcher executed any tool call whose name matched a registered tool. That left Python's own exceptions as the only safety net. That works for an unexpected keyword (TypeError: add() got an unexpected keyword argument 'c'), but Python does not check type hints at call time, so a model that confidently passes expression=42 to a function expecting a string gets either an opaque error from deep inside the tool or no error at all. We want a structured, model-readable error before we ever invoke the function.

The validator is a deliberately tiny subset of JSON Schema — top-level object with properties, required, additionalProperties, and per-property type. That covers what models get wrong. We'll grow it lesson by lesson if we need to.

def _check_integer(v):
    # bool is a subclass of int in Python; exclude it so True/False
    # isn't silently accepted where an integer is required.
    return isinstance(v, int) and not isinstance(v, bool)

_TYPE_CHECK = {
    "integer": _check_integer,
    "number":  lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string":  lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "array":   lambda v: isinstance(v, list),
    "object":  lambda v: isinstance(v, dict),
}

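def validate_arguments(arguments, schema):
    """Return a list of error strings; empty means the call is valid."""
    errors = []
    if schema.get("type") != "object":
        return errors
    # A malformed tool call can carry a list, a string, or None here.
    if not isinstance(arguments, dict):
        return [f"arguments must be an object, got {type(arguments).__name__}"]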
    properties = schema.get("properties", {})
    required   = schema.get("required", [])
    additional = schema.get("additionalProperties", True)

    for r in required:
        if r not in arguments:
            errors.append(f"missing required argument {r!r}")
    for k, v in arguments.items():
        if k not in properties:
            if additional is False:
                errors.append(f"unexpected argument {k!r}")
            continue
        expected = properties[k].get("type")
        checker = _TYPE_CHECK.get(expected) if expected else None
        if checker is not None and not checker(v):
            errors.append(
                f"argument {k!r}: expected {expected}, got {type(v).__name__}"
            )
    return errors

That bool exclusion is the kind of bug that wastes an afternoon: isinstance(True, int) is True in Python, so without the guard a model could pass {"a": true, "b": false} to an integer-typed tool and the validator would shrug.
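The pitfall is easy to reproduce in isolation. The sketch below uses a hypothetical helper name, is_json_integer, with the same guard as _check_integer.

```python
def is_json_integer(v):
    # Hypothetical helper name; same guard as _check_integer.
    return isinstance(v, int) and not isinstance(v, bool)

print(isinstance(True, int))   # True: bool is a subclass of int
print(True + True)             # 2: and it does arithmetic like one
print(is_json_integer(True))   # False: the guard rejects it
print(is_json_integer(7))      # True
```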

The registry's invoke becomes a four-step pipeline: look up the tool, validate the arguments, call the function, stringify the result. Every exit path returns a string, because that's the only thing the loop knows how to put back into the transcript:

def invoke(self, name, arguments):
    """Dispatch one tool call. Always returns a string the loop can append."""
    if name not in self._tools:
        return f"error: unknown tool {name!r}"
    tool = self._tools[name]

    errors = validate_arguments(arguments, tool.parameters)
    if errors:
        return "validation error: " + "; ".join(errors)

    try:
        result = tool.fn(**arguments)
    except Exception as e:        # tools must never crash the loop
        return f"error: {type(e).__name__}: {e}"
    return str(result)

Why return errors as strings instead of raising? Because the next thing the loop does is feed that string back to the model as a role="tool" message. A real LLM, seeing "validation error: argument 'expression': expected string, got int", will fix its next call. A raised exception kills the run and the model never gets a chance to learn from it. Failures inside the loop should be data, not exceptions.
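As a rough sketch of "failures as data", the snippet below shows a success and a failure taking the same path into the transcript. The dict-shaped messages, fake_invoke, and append_tool_result are hypothetical stand-ins chosen so the example runs on its own; the real loop and Message type come from lesson 1 and are not reproduced here.

```python
def fake_invoke(name, arguments):
    """Stand-in for ToolRegistry.invoke: always returns a string."""
    value = arguments.get("expression")
    if not isinstance(value, str):
        got = type(value).__name__
        return f"validation error: argument 'expression': expected string, got {got}"
    return "20"  # pretend the tool ran and produced this result

def append_tool_result(transcript, call_id, name, arguments):
    # Success and failure look the same to the loop: one more tool message.
    transcript.append({
        "role": "tool",
        "tool_call_id": call_id,
        "content": fake_invoke(name, arguments),
    })

transcript = []
append_tool_result(transcript, "call_1", "calc", {"expression": 42})
append_tool_result(transcript, "call_2", "calc", {"expression": "(2 + 3) * 4"})
for message in transcript:
    print(message["tool_call_id"], "->", message["content"])
```

Because the error is just message content, the loop needs no special branch for it; the model reads it on the next turn like any other tool output.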

4. Two real tools

Lesson code at examples/lesson2_tools.py. A safe calculator and a small file reader. They look like ordinary Python:

import ast, operator as op
from pathlib import Path
from agentkit import tool, ToolRegistry

_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.Mod: op.mod, ast.Pow: op.pow,
        ast.USub: op.neg, ast.UAdd: op.pos}

_MAX_POW = 100  # cap exponent magnitude so `9**9**9` can't pin a core

def _safe_eval(node):
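    # Exclude bool explicitly: True/False parse as constants and are ints in Python.
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):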
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left, right = _safe_eval(node.left), _safe_eval(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_POW:
            raise ValueError(f"exponent too large: {right}")
        return _OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.operand))
    raise ValueError(f"unsupported expression node: {type(node).__name__}")

@tool(args={"expression": "an arithmetic expression like '(2 + 3) * 4'"})
def calc(expression: str) -> float:
    """Evaluate a basic arithmetic expression and return the result."""
    return _safe_eval(ast.parse(expression, mode="eval").body)

@tool(args={"path": "absolute path of the file", "max_bytes": "byte cap"})
def read_file(path: str, max_bytes: int = 4096) -> str:
    """Read up to `max_bytes` bytes from a text file and return its content."""
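    # Read at most max_bytes from disk rather than loading the whole file and slicing.
    with Path(path).open("rb") as f:
        data = f.read(max(max_bytes, 0))
    return data.decode("utf-8", errors="replace")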

tools = ToolRegistry([calc, read_file])

The calc body deliberately doesn't use eval(). eval will run anything, including __import__("os").system("rm -rf /"). The whitelisted AST walker is about a dozen lines and gives the model a real arithmetic tool with a flat refusal for everything else. But whitelisting node types only controls which operations run, not how expensive they get: ** is on the list, so 9**9**9 is "valid" arithmetic that will pin a core and chew through memory before it ever returns. That's why _safe_eval bounds the exponent on Pow: resource exhaustion is a separate failure mode from "unsupported syntax," and a real security boundary has to handle both. Treat every tool as a security boundary, because the model decides what gets called and with what arguments, not you.
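To see what the whitelist is actually matching on, it helps to look at the root node Python's parser produces for a few inputs. This sketch uses only the standard ast module; top_level_node is a hypothetical helper name. A Call or Name root matches none of the branches in _safe_eval, so it falls through to the final ValueError.

```python
import ast

def top_level_node(expression):
    """Return the class name of the root node of a parsed expression."""
    return type(ast.parse(expression, mode="eval").body).__name__

print(top_level_node("(2 + 3) * 4"))                    # BinOp
print(top_level_node("-7"))                             # UnaryOp
print(top_level_node("__import__('os').system('ls')"))  # Call
print(top_level_node("open"))                           # Name

# Exponentiation is right-associative, so the outer Pow's right operand
# is itself a Pow node: 9 ** (9 ** 9).
tree = ast.parse("9**9**9", mode="eval").body
print(type(tree.right).__name__)                        # BinOp
```

Parsing alone never executes anything, which is why the walker can inspect a hostile string safely before deciding whether to evaluate it.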

Note read_file's max_bytes: int = 4096. Default → not in required. The model can omit it; the Python function fills it in.

5. Driving it: bad call, then good call

The scripted MockLLM follows the plan from the introduction: call calc with the wrong type first, then with the right type, then answer.

llm = MockLLM(script=[
    Message(role="assistant", tool_calls=[ToolCall(
        id="call_1", name="calc",
        arguments={"expression": 42},          # wrong type on purpose
    )]),
    Message(role="assistant", tool_calls=[ToolCall(
        id="call_2", name="calc",
        arguments={"expression": "(2 + 3) * 4"},
    )]),
    Message(role="assistant", content="(2 + 3) * 4 = 20."),
])

result = run_agent(
    goal="What is (2 + 3) * 4?",
    llm=llm, tools=tools,
    system="You are a careful calculator. Use the tools provided.",
    on_event=trace,
)

Running python3 examples/lesson2_tools.py produces this — schemas first (what the model sees), then a turn-by-turn transcript:

--- tool specs the model receives ---
[
  {
    "name": "calc",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "parameters": {
      "type": "object",
      "properties": {
        "expression": {
          "type": "string",
          "description": "an arithmetic expression like '(2 + 3) * 4'"
        }
      },
      "required": [
        "expression"
      ],
      "additionalProperties": false
    }
  },
  {
    "name": "read_file",
    "description": "Read up to `max_bytes` bytes from a text file and return its content.",
    "parameters": {
      "type": "object",
      "properties": {
        "path": {
          "type": "string",
          "description": "absolute path of the file"
        },
        "max_bytes": {
          "type": "integer",
          "description": "byte cap"
        }
      },
      "required": [
        "path"
      ],
      "additionalProperties": false
    }
  }
]

--- run ---
[user]      What is (2 + 3) * 4?
[assistant] -> tool_call calc({'expression': 42})  id=call_1
[tool]      calc -> "validation error: argument 'expression': expected string, got int"
[assistant] -> tool_call calc({'expression': '(2 + 3) * 4'})  id=call_2
[tool]      calc -> '20'
[assistant] (2 + 3) * 4 = 20.

final answer: '(2 + 3) * 4 = 20.'
turns:        3
messages:     7 in transcript

Walk it through. The schema for calc lists expression as a string and as required; read_file lists path as required and max_bytes as not required (default in the Python function). The loop passes those schemas to the model as the tools argument of LLM.complete, and the run begins.

Turn 1, the model — playing the part of a model that just got confused — sends {"expression": 42}. The dispatcher checks required (present), checks additionalProperties (no extras), checks type: string for expression (fail — it's an int), and returns the error string. calc never runs. That string lands in the transcript as a tool message tagged with call_1.

Turn 2, the model retries — this time {"expression": "(2 + 3) * 4"}. The validator passes. calc runs. (2 + 3) * 4 = 20. The result "20" lands in the transcript as a tool message tagged with call_2.

Turn 3, the model has the arithmetic it needed and answers in plain text. No tool calls, so the loop returns. Three turns, seven messages (system, user, assistant×3, tool×2), one validation save.
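The message count can be checked by listing the roles in the order they enter the transcript. This is plain bookkeeping for the run above, not agentkit code.

```python
from collections import Counter

# Roles in the order they enter the transcript for this run.
roles = [
    "system", "user",
    "assistant", "tool",   # turn 1: bad call, validation error
    "assistant", "tool",   # turn 2: good call, result "20"
    "assistant",           # turn 3: final answer, no tool call
]

counts = Counter(roles)
print(len(roles))      # 7
print(dict(counts))
```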

6. What we shipped, what's next

@tool — turns a typed Python function into a Tool with a generated JSON-Schema spec.
validate_arguments(arguments, schema) — a small JSON-Schema-subset checker, exported for direct use too.
ToolRegistry.invoke now runs validation before dispatch and returns errors as strings the model can read.
Two real example tools: calc (whitelisted AST), read_file (size-capped path read).

Lesson 1's hand-built Tool(...) constructor still works; the new path is purely additive. The loop itself was not modified.

Next lesson (3/9): Memory & Context. The transcript currently grows without bound — every tool call adds two messages forever. We'll add a context manager that trims, summarizes, and pins messages so long-running agents stop blowing through the model's window. The same fixed agent loop will drive it; we'll just give it a better-curated transcript.

— The Resident
