Lab 10 - Build a Tool Registry and Policy Gate
Download the lab completion worksheet and lab production readiness worksheet before you start.
Objective
Extend the mini-runtime with a tool registry and policy gate. The model can propose a tool call, but software decides whether the tool exists, whether the input is acceptable, and whether policy allows execution.
What You Will Use
- Language: TypeScript or Python
- Framework/runtime: from-scratch educational runtime
- Framework-agnostic lesson: tool descriptions are not permissions; registry and policy are separate runtime boundaries.
- Pattern chapters: Tool Use, Tool Capability Design, Policy Enforcement
- Previous lab: Lab 09 - Minimal Agent Loop
Exercise Time Budget
These estimates assume the Lab 09 loop is already available.
| Exercise | Time | Output |
|---|---|---|
| Run the baseline tool contract | 10 min | Passing mini-runtime test output. |
| Add registry and policy checks | 20-25 min | Separate tool lookup, schema handling, and policy decision. |
| Exercise denial and failure cases | 10-15 min | Unknown-tool, approval-required, and tool-failure signals. |
| Review production boundary | 10-20 min | Notes for registry ownership, approval records, and trace fields. |
Setup
Start from the Lab 09 loop. Add a registry object or map keyed by tool name.
Use deterministic tools. Do not call external systems in this lab.
Reference files:
- minimal-agent-runtime/typescript/src/runtime.ts
- minimal-agent-runtime/typescript/test/runtime.spec.ts
Run the reference test before editing:
npm run mini-runtime:test
Runtime Contract
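// Every tool outcome is one of three structured states; tools never return bare values.
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

// A registry entry. sideEffect is what policy inspects; description is only text for the model.
type ToolDefinition = {
  name: string;
  description: string;
  sideEffect: "read" | "draft" | "write";
  execute(input: unknown): Promise<ToolResult>;
};

// The policy gate's answer, produced before any tool executes.
type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };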
Guided Change
Add two tools:
const tools: Record<string, ToolDefinition> = {
  lookup_policy: {
    name: "lookup_policy",
    description: "Read policy guidance for the current task.",
    sideEffect: "read",
    execute: async input => ({ status: "ok", data: { input, policy: "approval required for writes" } }),
  },
  draft_message: {
    name: "draft_message",
    description: "Create a draft message for review.",
    sideEffect: "draft",
    execute: async input => ({ status: "ok", data: { draft: String(input) } }),
  },
};
Then add policy:
function authorize(tool: ToolDefinition): PolicyDecision {
  if (tool.sideEffect === "write") {
    return { status: "approval_required", reason: "write_tool_requires_approval" };
  }
  return { status: "allow" };
}
Update the loop so tool decisions pass through:
- registry lookup;
- policy decision;
- tool execution only when allowed;
- observation recording for refused, denied, approval-required, and successful outcomes.
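The four steps above can be sketched as one gated step function. This is a minimal sketch under stated assumptions, not the reference runtime's implementation: `ProposedCall`, `StepOutcome`, `runGatedStep`, and the `unknown_tool` observation suffix are hypothetical names, and mapping a `deny` decision to the `refused` stop reason is an assumption. The `tool:status` observation format and the stop reasons follow the expected results listed in this lab.

```typescript
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

type ToolDefinition = {
  name: string;
  description: string;
  sideEffect: "read" | "draft" | "write";
  execute(input: unknown): Promise<ToolResult>;
};

type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

// Hypothetical shape of a model-proposed call (assumption, not from the reference runtime).
type ProposedCall = { tool: string; input: unknown };

// Hypothetical step outcome; the stop reasons mirror the ones named in this lab.
type StepOutcome = {
  stopReason: "success" | "refused" | "blocked" | "tool_failure";
  observations: string[];
};

async function runGatedStep(
  registry: Record<string, ToolDefinition>,
  authorize: (tool: ToolDefinition) => PolicyDecision,
  call: ProposedCall,
): Promise<StepOutcome> {
  const observations: string[] = [];

  // 1. Registry lookup. The own-property check keeps inherited names such as
  //    "toString" from resolving to something that is not a registered tool.
  const tool = Object.prototype.hasOwnProperty.call(registry, call.tool)
    ? registry[call.tool]
    : undefined;
  if (!tool) {
    observations.push(`${call.tool}:refused:unknown_tool`);
    return { stopReason: "refused", observations };
  }

  // 2. Policy decision, recorded whether or not it allows execution.
  const decision = authorize(tool);
  observations.push(`${tool.name}:${decision.status}`);
  if (decision.status !== "allow") {
    // Assumption: deny maps to "refused"; approval_required maps to "blocked".
    return { stopReason: decision.status === "deny" ? "refused" : "blocked", observations };
  }

  // 3. Execution happens only on the allow path. A thrown exception is
  //    converted into a structured error result instead of escaping the loop.
  let result: ToolResult;
  try {
    result = await tool.execute(call.input);
  } catch (err) {
    result = { status: "error", reason: err instanceof Error ? err.message : String(err) };
  }

  // 4. Observation recording for the tool result.
  observations.push(`${tool.name}:${result.status}`);
  if (result.status === "ok") {
    return { stopReason: "success", observations };
  }
  return { stopReason: result.status === "error" ? "tool_failure" : "refused", observations };
}
```

Passing `authorize` as a parameter keeps the registry and the policy as separate boundaries, so a test can swap in a permissive policy without touching the tool map.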
Baseline Run
Use the reference demo or a decision function that calls lookup_policy.
npm run mini-runtime
Expected Result
The allowed read-tool path in the reference demo should include:
toolsCalled: ["lookup_policy"]
observations include: lookup_policy:allow
observations include: lookup_policy:ok
trace includes: policy_decision
trace includes: tool_result
stopReason: success
The reference test covers these refusal and failure signals:
| Case | Expected Signal |
|---|---|
| Unknown tool delete_customer | stopReason: refused; delete_customer is not executed. |
| Write tool send_message | stopReason: blocked; send_message is not executed because approval is required. |
| Failing read tool flaky_lookup | stopReason: tool_failure; trace includes upstream_timeout. |
| Permissive unsafe write policy | Final answer can be success, but trajectory eval fails because send_message was called. |
Use this flow as the lab’s acceptance model. The model may propose a tool, but the registry, policy gate, trace, and eval decide whether execution is allowed and reviewable.
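The last table row depends on an eval that inspects the path taken rather than the final answer. A minimal sketch of such a trajectory check follows. The reference eval may differ: `RunRecord`, `approvedTools`, `evaluateTrajectory`, and the violation strings are hypothetical names introduced for illustration.

```typescript
type SideEffect = "read" | "draft" | "write";

// Hypothetical run record. toolsCalled and stopReason follow names used in this
// lab; approvedTools is an assumption about how approvals might be recorded.
type RunRecord = {
  stopReason: string;
  toolsCalled: string[];
  approvedTools: string[];
};

type TrajectoryVerdict = { pass: boolean; violations: string[] };

function evaluateTrajectory(
  run: RunRecord,
  sideEffects: Record<string, SideEffect>,
): TrajectoryVerdict {
  const violations: string[] = [];
  for (const toolName of run.toolsCalled) {
    const known = Object.prototype.hasOwnProperty.call(sideEffects, toolName);
    if (!known) {
      // A tool outside the registry should never appear in toolsCalled.
      violations.push(`${toolName}:unknown_tool_executed`);
    } else if (sideEffects[toolName] === "write" && run.approvedTools.indexOf(toolName) === -1) {
      // A write tool ran without a recorded approval.
      violations.push(`${toolName}:write_without_approval`);
    }
  }
  // run.stopReason is deliberately ignored: a run can end in "success"
  // and still fail the trajectory check.
  return { pass: violations.length === 0, violations };
}
```

With a permissive policy, a run whose `toolsCalled` contains `send_message` and whose `approvedTools` is empty fails this check even when `stopReason` is `success`.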
Failure Cases
Test these cases:
- Unknown tool name.
- Tool with write side effect.
- Tool execution returns error.
- Permissive policy that allows a write tool, so trajectory eval must catch the unsafe path.
The exact stop behavior can vary, but the runtime must not silently execute forbidden or unknown tools.
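Deterministic fixtures make these cases repeatable. The sketch below is one possible setup, not the reference test's: the tool names come from this lab, while `counted`, `executions`, `failureFixtures`, `permissiveAuthorize`, and the tool descriptions are hypothetical. Counting executions lets a test assert that a tool never ran, which is stronger than asserting that no result was reported.

```typescript
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

type ToolDefinition = {
  name: string;
  description: string;
  sideEffect: "read" | "draft" | "write";
  execute(input: unknown): Promise<ToolResult>;
};

type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

// Execution counts per tool name (hypothetical test helper).
const executions: Record<string, number> = {};

// Wraps a tool so every real execution is counted before delegating.
function counted(tool: ToolDefinition): ToolDefinition {
  return {
    ...tool,
    execute: async input => {
      executions[tool.name] = (executions[tool.name] || 0) + 1;
      return tool.execute(input);
    },
  };
}

// Write tool: must stay at zero executions under the default policy.
const sendMessage = counted({
  name: "send_message",
  description: "Send a message (illustrative description).",
  sideEffect: "write",
  execute: async () => ({ status: "ok", data: { sent: true } }),
});

// Read tool that always fails the same way, with no network call involved.
const flakyLookup = counted({
  name: "flaky_lookup",
  description: "Lookup that always times out (illustrative description).",
  sideEffect: "read",
  execute: async () => ({ status: "error", reason: "upstream_timeout" }),
});

// delete_customer is deliberately absent, so it exercises the unknown-tool path.
const failureFixtures: Record<string, ToolDefinition> = {
  send_message: sendMessage,
  flaky_lookup: flakyLookup,
};

// Deliberately unsafe policy for the last case: it allows everything,
// so only a trajectory eval can catch the write.
function permissiveAuthorize(_tool: ToolDefinition): PolicyDecision {
  return { status: "allow" };
}
```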
Verify
Check these assertions manually or with npm run mini-runtime:test:
- unknown tools are not executed;
- policy runs before execution;
- approval-required is represented as a runtime state;
- tool results are structured;
- observations record both allowed and refused paths.
The reference test also proves that send_message is blocked before execution when the default policy requires approval for write tools.
Lab Review Gate
Before moving on, verify the tool boundary:
| Check | Evidence |
|---|---|
| Tool lookup is explicit | Unknown tool names are refused before execution. |
| Policy runs before execution | Write tools produce approval_required before any side effect. |
| Results are structured | Tool outcomes use ok, refused, or error status. |
| Side-effect class matters | Read, draft, and write tools can be governed differently. |
| Observations preserve the path | Allowed and refused outcomes are visible in runtime observations. |
Record the allowed read path, unknown-tool refusal, write-tool approval requirement, and error case in the lab completion worksheet.
Production Extension
Before using this pattern with real tools, add:
- input schema validation;
- idempotency keys;
- timeout and retry policy;
- actor, tenant, route, and approval context;
- trace IDs for proposed call, policy decision, execution, and result;
- separate policies for read, draft, write, external communication, money movement, memory write, and code execution.
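As an illustration of the first item, input validation can sit in front of `execute` so that malformed input yields a structured refusal instead of reaching the tool body. This is a dependency-free sketch under stated assumptions: `InputCheck`, `withValidation`, `validateDraftInput`, the `to` and `body` fields, and the `invalid_input` reason prefix are all hypothetical. A production runtime would more likely use a schema library.

```typescript
type ToolResult =
  | { status: "ok"; data: unknown }
  | { status: "refused"; reason: string }
  | { status: "error"; reason: string };

// Hypothetical validator contract: either a typed value or a reason string.
type InputCheck<T> = (input: unknown) =>
  | { status: "valid"; value: T }
  | { status: "invalid"; reason: string };

// Wraps a typed tool body so it only ever sees validated input.
function withValidation<T>(
  check: InputCheck<T>,
  run: (value: T) => Promise<ToolResult>,
): (input: unknown) => Promise<ToolResult> {
  return async input => {
    const checked = check(input);
    if (checked.status === "invalid") {
      // Invalid input is a refusal, not an error: nothing was attempted.
      return { status: "refused", reason: `invalid_input:${checked.reason}` };
    }
    return run(checked.value);
  };
}

// Example validator for a draft tool that expects { to, body } (hypothetical fields).
const validateDraftInput: InputCheck<{ to: string; body: string }> = input => {
  if (typeof input !== "object" || input === null) {
    return { status: "invalid", reason: "expected_object" };
  }
  const fields = input as Record<string, unknown>;
  const to = fields["to"];
  const body = fields["body"];
  if (typeof to !== "string" || to.length === 0) {
    return { status: "invalid", reason: "missing_to" };
  }
  if (typeof body !== "string") {
    return { status: "invalid", reason: "missing_body" };
  }
  return { status: "valid", value: { to, body } };
};
```

The wrapped function has the same signature as `ToolDefinition.execute`, so it can be assigned to a registry entry without changing the loop.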
Production Bridge
Use this table when adapting the registry to production:
| Lab Concept | Production Version |
|---|---|
| Tool map | Versioned capability registry with owner, permissions, and disable switch. |
| ToolDefinition | Tool manifest with schema, side-effect class, timeout, retry, and audit fields. |
| authorize | Policy engine using actor, tenant, resource, risk, budget, and approval context. |
| approval_required | Durable approval request with reviewer, expiry, exact action, and trace link. |
| Tool observation | Trace span with proposed input, decision, result, cost, latency, and redaction. |
The first production milestone is a tool path that can prove why execution was allowed, denied, or paused.
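To make the approval_required row concrete, here is one hedged sketch of how a stored approval could be turned back into a policy decision. Every name in it (`ApprovalRecord`, `PendingCall`, `decideFromApproval`, `inputDigest`, and the reason strings) is hypothetical, and how the digest of the exact action is computed is left out of scope.

```typescript
type PolicyDecision =
  | { status: "allow" }
  | { status: "deny"; reason: string }
  | { status: "approval_required"; reason: string };

// Hypothetical durable approval record with the fields named in the table:
// reviewer, expiry, exact action (tool plus input digest), and trace link.
type ApprovalRecord = {
  id: string;
  tool: string;
  inputDigest: string;
  reviewer: string | null;
  state: "pending" | "approved" | "rejected";
  expiresAtMs: number;
  traceId: string;
};

// The call the runtime is about to execute, reduced to what approval must match.
type PendingCall = { tool: string; inputDigest: string };

function decideFromApproval(
  record: ApprovalRecord | undefined,
  call: PendingCall,
  nowMs: number,
): PolicyDecision {
  if (record === undefined) {
    return { status: "approval_required", reason: "no_approval_record" };
  }
  // An approval covers one exact action; a changed tool or input needs a new one.
  if (record.tool !== call.tool || record.inputDigest !== call.inputDigest) {
    return { status: "approval_required", reason: "approval_is_for_a_different_action" };
  }
  if (record.state === "rejected") {
    return { status: "deny", reason: "approval_rejected" };
  }
  if (record.state === "pending") {
    return { status: "approval_required", reason: "approval_pending" };
  }
  if (record.reviewer === null) {
    // An approved record without a reviewer cannot explain who allowed it.
    return { status: "deny", reason: "approved_without_reviewer" };
  }
  if (nowMs >= record.expiresAtMs) {
    return { status: "approval_required", reason: "approval_expired" };
  }
  return { status: "allow" };
}
```

Because the decision is derived from the record each time, the `traceId` on the record can link an executed write back to the reviewer and the exact action that was approved.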
Cross-Framework Mapping
- In LangGraph, this can be implemented as a tool node guarded by a policy node.
- In Mastra AI, this maps to tool definitions plus workflow or tool-level policy.
- In AutoGen-style systems, this maps to function execution guarded by the manager or runtime.
- In CrewAI, this maps to role-assigned tools plus flow-level constraints.