Lab 10 extends the mini-runtime with a tool registry and policy gate so software decides whether a proposed tool call may run.

Section: Hands-On Labs
Type: Lab
Level: Hands-on
Read: 4 min
Effort: 45-90 min lab
Builder, Student, Security

Lab 10 - Build a Tool Registry and Policy Gate

Download the lab completion worksheet and lab production readiness worksheet before you start.

Objective

Extend the mini-runtime with a tool registry and policy gate. The model can propose a tool call, but software decides whether the tool exists, whether the input is acceptable, and whether policy allows execution.

What You Will Use

Exercise Time Budget

These estimates assume the Lab 09 loop is already available.

| Exercise | Time | Output |
| --- | --- | --- |
| Run the baseline tool contract | 10 min | Passing mini-runtime test output. |
| Add registry and policy checks | 20-25 min | Separate tool lookup, schema handling, and policy decision. |
| Exercise denial and failure cases | 10-15 min | Unknown-tool, approval-required, and tool-failure signals. |
| Review production boundary | 10-20 min | Notes for registry ownership, approval records, and trace fields. |

Setup

Start from the Lab 09 loop. Add a registry object or map keyed by tool name.

Use deterministic tools. Do not call external systems in this lab.
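
One way to keep tools deterministic is to back them with fixed in-memory data, so the same input always produces the same result. The sketch below is illustrative only: it copies the ToolResult shape from the Runtime Contract section, and the names POLICY_TABLE, lookupPolicyStub, and flakyLookupStub are hypothetical, not part of the reference runtime.

```typescript
// Result shape copied from the lab's Runtime Contract.
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

// Hypothetical fixed data: no network, clock, or randomness involved.
const POLICY_TABLE = new Map<string, string>([
  ["writes", "approval required for writes"],
  ["reads", "reads are allowed"],
]);

// Deterministic read tool: looks up a topic in the fixed table.
async function lookupPolicyStub(input: unknown): Promise<ToolResult> {
  const topic = typeof input === "string" ? input : "";
  const policy = POLICY_TABLE.get(topic);
  if (policy === undefined) {
    return { status: "refused", reason: "unknown_topic" };
  }
  return { status: "ok", data: { topic, policy } };
}

// Deterministic failure: always returns the same structured error,
// which makes the tool-failure path repeatable in tests.
async function flakyLookupStub(_input: unknown): Promise<ToolResult> {
  return { status: "error", reason: "upstream_timeout" };
}
```

Because the failing stub never depends on real latency, the failure case can be exercised on every run without retries or sleeps.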

Reference files:

  • minimal-agent-runtime/typescript/src/runtime.ts
  • minimal-agent-runtime/typescript/test/runtime.spec.ts

Run the reference test before editing:

npm run mini-runtime:test

Runtime Contract

// Structured outcome of a tool execution.
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

// Registry entry: the side-effect class is what policy inspects.
type ToolDefinition = {
  name: string;
  description: string;
  sideEffect: "read" | "draft" | "write";
  execute(input: unknown): Promise<ToolResult>;
};

// Outcome of the policy gate, decided before any execution.
type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

Guided Change

Add two tools:

const tools: Record<string, ToolDefinition> = {
  lookup_policy: {
    name: "lookup_policy",
    description: "Read policy guidance for the current task.",
    sideEffect: "read",
    execute: async input => ({ status: "ok", data: { input, policy: "approval required for writes" } }),
  },
  draft_message: {
    name: "draft_message",
    description: "Create a draft message for review.",
    sideEffect: "draft",
    execute: async input => ({ status: "ok", data: { draft: String(input) } }),
  },
};

Then add policy:

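function authorize(tool: ToolDefinition): PolicyDecision {
  // Write tools never run directly; they pause for approval.
  if (tool.sideEffect === "write") {
    return { status: "approval_required", reason: "write_tool_requires_approval" };
  }
  // Read and draft tools are allowed by default in this lab.
  return { status: "allow" };
}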

Update the loop so tool decisions pass through:

  1. registry lookup;
  2. policy decision;
  3. tool execution only when allowed;
  4. observation recording for refused, denied, approval-required, and successful outcomes.
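
The four steps above can be sketched as a single gate function. This is a minimal sketch, not the reference runtime's implementation: runToolCall and GateOutcome are hypothetical names, and the `name:status` observation strings are an assumption modeled on the expected output shown later in this lab.

```typescript
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

type ToolDefinition = {
  name: string;
  description: string;
  sideEffect: "read" | "draft" | "write";
  execute(input: unknown): Promise<ToolResult>;
};

type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

// Hypothetical outcome type so the loop can pick a stop reason.
type GateOutcome =
  | { kind: "refused"; reason: string }
  | { kind: "blocked"; decision: PolicyDecision }
  | { kind: "executed"; result: ToolResult };

async function runToolCall(
  registry: Record<string, ToolDefinition>,
  authorize: (tool: ToolDefinition) => PolicyDecision,
  name: string,
  input: unknown,
  observations: string[],
): Promise<GateOutcome> {
  // 1. Registry lookup: own properties only, so "toString" is not a tool.
  const tool = Object.prototype.hasOwnProperty.call(registry, name)
    ? registry[name]
    : undefined;
  if (!tool) {
    observations.push(`${name}:refused`);
    return { kind: "refused", reason: "unknown_tool" };
  }

  // 2. Policy decision, recorded whether or not it allows execution.
  const decision = authorize(tool);
  observations.push(`${name}:${decision.status}`);
  if (decision.status !== "allow") {
    return { kind: "blocked", decision };
  }

  // 3. Execute only when allowed; thrown errors become structured results.
  let result: ToolResult;
  try {
    result = await tool.execute(input);
  } catch (err) {
    result = { status: "error", reason: err instanceof Error ? err.message : "unknown_error" };
  }

  // 4. Record the tool result.
  observations.push(`${name}:${result.status}`);
  return { kind: "executed", result };
}
```

Returning a tagged outcome instead of throwing keeps refused and blocked paths as ordinary values, so the loop can map each one to a stop reason and a trace entry.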

Baseline Run

Use the reference demo or a decision function that calls lookup_policy.

npm run mini-runtime

Expected Result

The allowed read-tool path in the reference demo should include:

toolsCalled: ["lookup_policy"]
observations include: lookup_policy:allow
observations include: lookup_policy:ok
trace includes: policy_decision
trace includes: tool_result
stopReason: success

The reference test covers these refusal and failure signals:

| Case | Expected Signal |
| --- | --- |
| Unknown tool delete_customer | stopReason: refused; delete_customer is not executed. |
| Write tool send_message | stopReason: blocked; send_message is not executed because approval is required. |
| Failing read tool flaky_lookup | stopReason: tool_failure; trace includes upstream_timeout. |
| Permissive unsafe write policy | Final answer can be success, but trajectory eval fails because send_message was called. |

sequenceDiagram
    participant Model
    participant Runtime
    participant Registry
    participant Policy
    participant Tool
    participant Trace
    participant Eval
    Model->>Runtime: Propose tool name and input
    Runtime->>Registry: Look up tool definition
    alt Unknown tool
        Registry-->>Runtime: Not found
        Runtime->>Trace: Record refused tool proposal
    else Tool exists
        Registry-->>Runtime: Tool manifest and side-effect class
        Runtime->>Policy: Authorize actor, tool, input, side effect
        alt Allowed
            Policy-->>Runtime: allow
            Runtime->>Tool: Execute typed call
            Tool-->>Runtime: ok, refused, or error result
            Runtime->>Trace: Record policy and tool result
        else Denied or approval required
            Policy-->>Runtime: deny or approval_required
            Runtime->>Trace: Record blocked execution
        end
    end
    Runtime->>Eval: Check trajectory for forbidden tools

Use this flow as the lab’s acceptance model. The model may propose a tool, but the registry, policy gate, trace, and eval decide whether execution is allowed and reviewable.

Failure Cases

Test these cases:

  1. Unknown tool name.
  2. Tool with write side effect.
  3. Tool execution returns error.
  4. Permissive policy that allows a write tool, so trajectory eval must catch the unsafe path.

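The exact stopReason values can vary by implementation, but the runtime must never execute an unknown tool or a tool that policy has not allowed, and it must record the refusal rather than drop it silently.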
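
Case 4 relies on a check that looks at which tools ran, not at the final answer. The following is a minimal sketch of such a trajectory eval under stated assumptions: evalTrajectory, RunSummary, and TrajectoryVerdict are hypothetical names, and the toolsCalled and stopReason fields simply mirror the names used in the expected output above.

```typescript
// Hypothetical run summary; field names mirror the lab's expected output.
type RunSummary = { toolsCalled: string[]; stopReason: string };

type TrajectoryVerdict =
  | { pass: true }
  | { pass: false; violations: string[] };

// Fails the run if any forbidden tool appears in the trajectory,
// regardless of how the run ended.
function evalTrajectory(
  run: RunSummary,
  forbiddenTools: ReadonlySet<string>,
): TrajectoryVerdict {
  const violations = run.toolsCalled.filter(name => forbiddenTools.has(name));
  return violations.length === 0 ? { pass: true } : { pass: false, violations };
}

// A permissive policy let send_message run, yet the run reports success.
const unsafeRun: RunSummary = {
  toolsCalled: ["lookup_policy", "send_message"],
  stopReason: "success",
};
const verdict = evalTrajectory(unsafeRun, new Set(["send_message"]));
// verdict.pass is false here, even though stopReason is "success".
```

Keeping this check separate from the policy gate means a misconfigured policy still gets caught after the fact.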

Verify

Check these assertions manually or with npm run mini-runtime:test:

  • unknown tools are not executed;
  • policy runs before execution;
  • approval-required is represented as a runtime state;
  • tool results are structured;
  • observations record both allowed and refused paths.

The reference test also verifies that send_message is blocked before execution when the default policy requires approval for write tools.

Lab Review Gate

Before moving on, verify the tool boundary:

| Check | Evidence |
| --- | --- |
| Tool lookup is explicit | Unknown tool names are refused before execution. |
| Policy runs before execution | Write tools produce approval_required before any side effect. |
| Results are structured | Tool outcomes use ok, refused, or error status. |
| Side-effect class matters | Read, draft, and write tools can be governed differently. |
| Observations preserve the path | Allowed and refused outcomes are visible in runtime observations. |

Record the allowed read path, unknown-tool refusal, write-tool approval requirement, and error case in the lab completion worksheet.

Production Extension

Before using this pattern with real tools, add:

  • input schema validation;
  • idempotency keys;
  • timeout and retry policy;
  • actor, tenant, route, and approval context;
  • trace IDs for proposed call, policy decision, execution, and result;
  • separate policies for read, draft, write, external communication, money movement, memory write, and code execution.
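
Two of these items, input schema validation and a timeout, can be sketched without any external library. This is an illustration under assumptions, not a production recipe: DraftInput, isDraftInput, withTimeout, and draftMessageChecked are hypothetical names, the input shape is invented for the example, and a real system might use a schema library instead of a hand-written type guard.

```typescript
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

// Hypothetical input shape for a draft tool.
type DraftInput = { to: string; body: string };

// Hand-written type guard standing in for schema validation.
function isDraftInput(input: unknown): input is DraftInput {
  if (typeof input !== "object" || input === null) return false;
  const candidate = input as Record<string, unknown>;
  return typeof candidate.to === "string" && typeof candidate.body === "string";
}

// Resolves with a structured timeout error if run() takes longer than ms.
// Note: this stops waiting for the work; it does not cancel it.
function withTimeout(run: () => Promise<ToolResult>, ms: number): Promise<ToolResult> {
  return new Promise<ToolResult>(resolve => {
    const timer = setTimeout(() => resolve({ status: "error", reason: "timeout" }), ms);
    run().then(
      result => {
        clearTimeout(timer);
        resolve(result);
      },
      err => {
        clearTimeout(timer);
        resolve({ status: "error", reason: err instanceof Error ? err.message : "tool_threw" });
      },
    );
  });
}

// Validates input first, then runs the tool body under a time limit.
async function draftMessageChecked(input: unknown): Promise<ToolResult> {
  if (!isDraftInput(input)) {
    return { status: "refused", reason: "invalid_input" };
  }
  const { to, body } = input;
  return withTimeout(
    async () => ({ status: "ok", data: { draft: `To ${to}: ${body}` } }),
    1000,
  );
}
```

Invalid input maps to refused and a timeout maps to error, so both stay inside the structured ToolResult contract rather than surfacing as exceptions.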

Production Bridge

Use this table when adapting the registry to production:

| Lab Concept | Production Version |
| --- | --- |
| Tool map | Versioned capability registry with owner, permissions, and disable switch. |
| ToolDefinition | Tool manifest with schema, side-effect class, timeout, retry, and audit fields. |
| authorize | Policy engine using actor, tenant, resource, risk, budget, and approval context. |
| approval_required | Durable approval request with reviewer, expiry, exact action, and trace link. |
| Tool observation | Trace span with proposed input, decision, result, cost, latency, and redaction. |

The first production milestone is a tool path that can prove why execution was allowed, denied, or paused.
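
To make that milestone concrete, the policy function can return a record that carries the context behind each decision. The sketch below is an assumption-heavy illustration: PolicyContext, DecisionRecord, decideWithRecord, and the missing_tenant reason are hypothetical, and a production engine would also weigh resource, risk, and budget as the table above suggests.

```typescript
type SideEffect = "read" | "draft" | "write";

type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

// Hypothetical request context; a real system would derive it from auth.
type PolicyContext = { actor: string; tenant: string; approvedBy?: string };

// Hypothetical audit record linking a decision to its inputs.
type DecisionRecord = {
  traceId: string;
  tool: string;
  sideEffect: SideEffect;
  context: PolicyContext;
  decision: PolicyDecision;
};

function decideWithRecord(
  traceId: string,
  tool: { name: string; sideEffect: SideEffect },
  context: PolicyContext,
): DecisionRecord {
  let decision: PolicyDecision;
  if (context.tenant === "") {
    // No tenant means no basis for any decision: deny outright.
    decision = { status: "deny", reason: "missing_tenant" };
  } else if (tool.sideEffect === "write" && context.approvedBy === undefined) {
    // Writes pause until an approver is recorded in the context.
    decision = { status: "approval_required", reason: "write_tool_requires_approval" };
  } else {
    decision = { status: "allow" };
  }
  return { traceId, tool: tool.name, sideEffect: tool.sideEffect, context, decision };
}
```

Storing the record alongside the trace lets a reviewer see who asked, under which tenant, and why the call was allowed, denied, or paused.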

Cross-Framework Mapping

  • In LangGraph, this can be implemented as a tool node guarded by a policy node.
  • In Mastra AI, this maps to tool definitions plus workflow or tool-level policy.
  • In AutoGen-style systems, this maps to function execution guarded by the manager or runtime.
  • In CrewAI, this maps to role-assigned tools plus flow-level constraints.