Templates and Worksheets provide reusable review artifacts for design, evaluation, production readiness, and release evidence.

Section: Engineering Practice and Frameworks
Type: Reference
Level: Intermediate
Read: 9 min
Effort: 10-20 min reference
Builder, Architect

Templates and Worksheets

These templates turn the book’s guidance into reviewable engineering artifacts. Use them when a lab becomes a product slice, when a team chooses a framework, or when an agent gains more authority.

Copy only the sections that matter. A short, complete decision record beats a long template with empty answers.

For filled examples, compare these templates with the Capstone Projects.

Choose The Right Artifact

Start with the artifact that matches the decision in front of you. Do not fill every template by default.

| Situation | Use This Artifact | Done When |
| --- | --- | --- |
| Choosing LangGraph, AutoGen, Mastra, CrewAI, or a custom runtime | Framework selection ADR template | The team can name what the framework owns and what the application still owns. |
| Studying completed agent ADRs | Completed agent ADR examples | The team can compare its decision record with realistic authority, RAG, and multi-agent examples. |
| Studying completed lab evidence | Completed lab evidence examples | The team can compare its worksheet with concrete commands, traces, failure paths, production gaps, and release decisions. |
| Studying production readiness evidence | Completed production readiness examples | The team can compare its production worksheet with concrete owners, gates, blockers, readiness ratings, and next actions. |
| Reviewing an Agentic RAG answer | Agentic RAG query trace worksheet | The team can reconstruct source selection, omitted evidence, citation checks, and the final answer/refusal decision. |
| Reviewing a debate or consensus workflow | Debate and consensus review checklist | Independence, merge policy, dissent handling, budget, baseline comparison, and judge errors are reviewed. |
| Threat-modeling an agent route | Agent threat model worksheet | Private data, untrusted content, tool authority, STRIDE risks, evals, and trace evidence are reviewed together. |
| Reviewing agent UX and trust | Agent UX review worksheet | Runtime states, visible evidence, user controls, approval UX, correction paths, and UX evals are explicit. |
| Finishing a lab | Lab completion worksheet | The run command, test command, pattern boundary, and lesson are recorded. |
| Completing Lab 02 Agent Loop and Planning | Lab 02 planning loop guided exercise worksheet | Baseline plan trace, changed input, unsupported step, missing input, and stop-condition contract are captured. |
| Completing Lab 03 Agentic RAG | Lab 03 Agentic RAG guided exercise worksheet | Baseline retrieval, grounding change, missing-evidence behavior, source contract, native graph comparison, and eval fixture are captured. |
| Completing Lab 06 Observability and Evals | Lab 06 observability and evals guided exercise worksheet | Trace contract, missing-policy failure, negative case, CI gate, and incident-to-eval note are captured. |
| Completing Lab 07 Runtime Packaging | Lab 07 runtime packaging guided exercise worksheet | Runtime boundaries, tool order, forbidden side effects, rollback, and native Mastra comparison are captured. |
| Completing Lab 12 State Graphs | Lab 12 state graph guided exercise worksheet | Interrupt, checkpoint, resume, replay safety, native LangGraph comparison, and production follow-up are captured. |
| Turning a lab into a product slice | Lab production readiness worksheet | Missing state, policy, eval, trace, rollback, and ownership controls are explicit. |
| Reviewing a production candidate | Production readiness worksheet | Every high-authority path has evidence or is scoped out of release. |
| Operating a production route | Runtime SLO and incident review worksheet | SLOs, dashboard panels, incident triage, trace review, eval conversion, and rollback actions are explicit. |
| Preparing a release | Release evidence record | The release has eval output, known risk, approver, rollback owner, and rollback command. |
| Reviewing agentic system composition | Agentic system architecture review checklist | Experience, control, execution, knowledge, safety, observability, evaluation, contracts, and authority boundaries are explicit. |
| Reviewing a full system architecture | Reference architecture review checklist | Entry, routing, state, knowledge, tools, memory, approval, evals, observability, and release ownership are reviewed together. |
| Reviewing unattended event-triggered work | Event-triggered agent review checklist | Event contract, admission, dedupe, ordering, retries, dead letters, storms, traces, and evals are explicit. |
| Reviewing a self-healing workflow | Self-healing workflow review checklist | Failure classes, recovery policy, idempotency, compensation, breakers, replay packets, and regression evals are explicit. |
| Reviewing a personal agent architecture | Personal agent architecture review checklist | Consent, local/cloud split, credentials, memory, user controls, connectors, and audit evidence are explicit. |
| Checking the final bar | 10/10 production gate scorecard | The system is reviewable, testable, observable, reversible, and owned. |
| Comparing a capstone with your own system | Capstone review scorecard | You have a concrete adaptation plan and a list of blocking gaps. |
| Scoring a pattern before adoption | Pattern evaluation scorecard | The pattern has evidence for goal, boundary, autonomy, tools, state, context, security, evals, observability, and operations. |
| Translating older agent terminology | Historical pattern migration record | A vague legacy label is mapped to a current architecture decision with evidence. |
| Reviewing a reusable skill package | Skill review checklist | Activation, instruction shape, assets, security, versioning, tests, observability, and final decision are explicit. |
| Reviewing one pattern | The matching pattern review checklist below | Use cases, avoid cases, failure modes, evals, and production controls have evidence. |
flowchart LR
    A[Current decision] --> L{Legacy term?}
    L -->|yes| M[Historical migration record]
    L -->|no| B{Framework?}
    B -->|yes| C[ADR]
    B -->|no| D{Lab done?}
    D -->|yes| E[Lab worksheet]
    D -->|no| F{Product slice?}
    F -->|yes| G[Production readiness worksheet]
    F -->|no| H{Release?}
    H -->|yes| I[Release evidence record]
    H -->|no| N{Operating?}
    N -->|yes| O[Runtime SLO worksheet]
    N -->|no| J[Pattern checklist]
    G --> K[10/10 gate]
    I --> K
    O --> K

Use the smallest artifact that proves the decision. Add the production gate only when the system is close enough to release that missing evidence should block progress.
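The flowchart is an ordered series of yes/no questions, so the first matching branch wins. The Python sketch below encodes that order; the `Situation` fields and the `choose_artifact` name are hypothetical, and each boolean stands in for a judgment call a reviewer makes. It also reports whether the chosen artifact feeds the 10/10 gate, mirroring the three arrows into that node.

```python
from typing import NamedTuple


class Situation(NamedTuple):
    """Hypothetical yes/no answers to the questions in the artifact flowchart."""
    legacy_term: bool = False
    framework: bool = False
    lab_done: bool = False
    product_slice: bool = False
    release: bool = False
    operating: bool = False


# Artifacts whose arrows lead into the 10/10 production gate.
GATED = {
    "Production readiness worksheet",
    "Release evidence record",
    "Runtime SLO worksheet",
}


def choose_artifact(s: Situation) -> tuple[str, bool]:
    """Walk the flowchart top to bottom; return (artifact, feeds_10_10_gate)."""
    if s.legacy_term:
        artifact = "Historical migration record"
    elif s.framework:
        artifact = "ADR"
    elif s.lab_done:
        artifact = "Lab worksheet"
    elif s.product_slice:
        artifact = "Production readiness worksheet"
    elif s.release:
        artifact = "Release evidence record"
    elif s.operating:
        artifact = "Runtime SLO worksheet"
    else:
        artifact = "Pattern checklist"
    return artifact, artifact in GATED


print(choose_artifact(Situation(release=True)))
# ('Release evidence record', True)
```

Because the checks run in flowchart order, a legacy-term or framework question always wins over a release question. Reorder the checks only if the team redraws the flowchart.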

Completed Examples

Blank templates help only after the reader understands the evidence standard. Use completed examples to compare the density and specificity of your own artifact.

flowchart LR
    A["Run lab or review"] --> B["Fill worksheet"]
    B --> C["Compare with completed example"]
    C --> D{"Evidence specific enough?"}
    D -->|"yes"| E["Attach to ADR, eval, or release record"]
    D -->|"no"| F["Add command, trace, failure, owner, or gap"]
    F --> C

Use Completed agent ADR examples for architecture decisions. Use Completed lab evidence examples for lab evidence packs and eval reviews. Use Completed production readiness examples when a team needs to calibrate owner, gate, blocker, rollback, and release-decision evidence.
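Part of the "Evidence specific enough?" step in that loop can run as a lint pass before the human comparison. A minimal sketch, assuming a worksheet is stored as a flat dictionary of answers: the field names mirror the loop's fallback step, while the placeholder list and the example command are hypothetical. It only catches empty or placeholder answers; judging whether a real answer is specific enough still needs the completed examples.

```python
# Evidence kinds named in the "Add command, trace, failure, owner, or gap" step.
REQUIRED_EVIDENCE = ("command", "trace", "failure", "owner", "gap")

# Hypothetical placeholder answers that should not count as evidence.
PLACEHOLDERS = {"", "tbd", "todo", "n/a", "see above"}


def missing_evidence(worksheet: dict[str, str]) -> list[str]:
    """Return evidence kinds that are absent or contain only placeholder text."""
    missing = []
    for kind in REQUIRED_EVIDENCE:
        answer = worksheet.get(kind, "").strip().lower()
        if answer in PLACEHOLDERS:
            missing.append(kind)
    return missing


draft = {
    "command": "pytest labs/lab06 -q",  # hypothetical test command
    "trace": "TBD",
    "owner": "agents-platform on-call",
}
print(missing_evidence(draft))  # ['trace', 'failure', 'gap']
```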


Framework Selection ADR

Use this ADR when adopting LangGraph, AutoGen, Mastra, CrewAI, a mini-runtime, or another framework.

# ADR-000: Choose [Framework] for [Agent or Workflow]

## Status

Proposed | Accepted | Superseded

## Context

What product problem are we solving?
What user-visible workflow will this framework host?
What constraints matter: language, deployment, compliance, team skills, latency, cost, or existing infrastructure?

## Decision

We will use [framework] for [scope].
The framework will own [state/control flow/tools/memory/evals/observability/deployment].
The application will still own [policy/domain data/security/approval/rollback].

## Alternatives Considered

| Option | Why It Fit | Why We Did Not Choose It |
| --- | --- | --- |
| Direct code / mini-runtime | | |
| LangGraph | | |
| AutoGen | | |
| Mastra | | |
| CrewAI | | |

## Responsibility Boundary

| Responsibility | Owner | Evidence |
| --- | --- | --- |
| State | framework/application/platform | schema, checkpoint, migration plan |
| Control flow | framework/application/platform | graph, workflow, team, flow, loop |
| Tools | framework/application/platform | manifest, schema, permission model |
| Policy | framework/application/platform | enforcement point before authority |
| Memory | framework/application/platform | retention, deletion, correction rules |
| Observability | framework/application/platform | trace schema and dashboard |
| Evals | framework/application/platform | fixtures, thresholds, CI gate |
| Deployment | framework/application/platform | runbook, rollback, kill switch |

## Vertical Slice Proof

- user request:
- state object:
- read tool:
- side-effect tool:
- policy decision:
- trace:
- eval:
- rollback:

## Acceptance Criteria

- local install and run commands are documented;
- state can be inspected and replayed;
- policy runs before side effects;
- evals fail the build for critical regressions;
- traces explain one failed run;
- rollback can disable the risky capability;
- framework-specific code does not hide product policy.

## Consequences

What gets easier?
What gets harder?
What lock-in or migration risk remains?
Which production incident would make us revisit this decision?

## Review Trigger

Review this ADR after a model upgrade, framework upgrade, new write-capable tool, new memory type, production incident, or repeated human override.
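The Responsibility Boundary table is the part of this ADR most likely to be left half-filled. One way to keep it honest is a small check run in review or CI. This is a sketch under stated assumptions: the table has already been parsed into hypothetical `BoundaryRow` objects (the parser is not shown), and the rule that flags framework-owned policy is one reading of the acceptance criterion "framework-specific code does not hide product policy", not a requirement of any framework.

```python
from dataclasses import dataclass

OWNERS = {"framework", "application", "platform"}
RESPONSIBILITIES = (
    "State", "Control flow", "Tools", "Policy",
    "Memory", "Observability", "Evals", "Deployment",
)


@dataclass
class BoundaryRow:
    owner: str  # expected to be one of OWNERS
    evidence: str  # link or path to the schema, runbook, trace, or test that proves it


def boundary_problems(rows: dict[str, BoundaryRow]) -> list[str]:
    """List Responsibility Boundary rows that are missing, misowned, or unproven."""
    problems = []
    for name in RESPONSIBILITIES:
        row = rows.get(name)
        if row is None:
            problems.append(f"{name}: no owner recorded")
        elif row.owner not in OWNERS:
            problems.append(f"{name}: unknown owner {row.owner!r}")
        elif not row.evidence.strip():
            problems.append(f"{name}: owner {row.owner} has no evidence")
    # Assumption: framework-owned policy is flagged for review, because the
    # acceptance criteria say framework code must not hide product policy.
    policy = rows.get("Policy")
    if policy is not None and policy.owner == "framework":
        problems.append("Policy: owned by framework; confirm product policy stays visible")
    return problems


rows = {name: BoundaryRow("application", "docs/adr/ADR-000.md") for name in RESPONSIBILITIES}
rows["Evals"] = BoundaryRow("platform", "")
print(boundary_problems(rows))  # ['Evals: owner platform has no evidence']
```

An empty result means every row names a known owner and points at some evidence; it does not mean the evidence is good.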

Production Readiness Worksheet

Use this worksheet before exposing the system to real users, real data, or real side effects.

| Gate | Question To Answer | Evidence |
| --- | --- | --- |
| Owner | Who owns the runtime and incidents? | team, on-call, runbook |
| Scope | Which users, tenants, tools, data, and workflows are in scope? | ADR, service config |
| State | What state exists and where is it persisted? | schema, checkpoint store |
| Idempotency | Which actions can be retried safely? | idempotency keys, side-effect records |
| Tools | Which tools can be called and with what authority? | tool manifest, permission map |
| Policy | Where does policy run before authority? | enforcement code, tests |
| Approval | Which actions require approval? | approval schema, UI/API, expiry |
| Memory | What can be read or written? | retention, deletion, correction rules |
| Observability | Can one failed run be reconstructed? | trace dashboard, redaction proof |
| Evals | What blocks release? | eval fixtures, thresholds, CI output |
| Security | How are secrets, egress, and sandboxing handled? | secret manager, network policy |
| Rollback | How do we disable model, prompt, tool, workflow, or agent? | runbook, feature flag |
| Incident Loop | How do incidents become evals? | post-incident process |
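The Idempotency and Policy gates are easier to review against code than against prose. A minimal sketch, assuming a hypothetical `call_side_effect_tool` wrapper that every write-capable tool passes through: it evaluates policy before any authority is used, derives an idempotency key from the run, tool, and arguments, and records each side effect so a retry returns the recorded result instead of acting twice. The in-memory dictionary stands in for a durable side-effect store, and the refund tool and policy are made up for illustration.

```python
import hashlib
import json
from typing import Callable

# Hypothetical in-memory side-effect log keyed by idempotency key.
# A real system would use a durable store with an atomic insert.
_side_effects: dict[str, dict] = {}


class PolicyDenied(Exception):
    """Raised when policy refuses a tool call before any authority is used."""


def idempotency_key(run_id: str, tool: str, args: dict) -> str:
    """Derive a stable key so a retried call maps to the same side-effect record."""
    payload = json.dumps({"run": run_id, "tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def call_side_effect_tool(
    run_id: str,
    tool: str,
    args: dict,
    policy: Callable[[str, dict], bool],
    execute: Callable[[dict], dict],
) -> dict:
    """Check policy first, then execute at most once per idempotency key."""
    if not policy(tool, args):
        raise PolicyDenied(f"{tool} denied for run {run_id}")
    key = idempotency_key(run_id, tool, args)
    if key in _side_effects:
        return _side_effects[key]  # safe retry: reuse the recorded result
    result = execute(args)
    _side_effects[key] = result  # record the side effect before returning
    return result


# Usage with a hypothetical refund tool.
calls: list[dict] = []


def issue_refund(args: dict) -> dict:
    calls.append(args)
    return {"status": "refunded", "amount": args["amount"]}


def allow_small_refunds(tool: str, args: dict) -> bool:
    return tool == "refund" and args["amount"] <= 100


first = call_side_effect_tool("run-1", "refund", {"amount": 40}, allow_small_refunds, issue_refund)
second = call_side_effect_tool("run-1", "refund", {"amount": 40}, allow_small_refunds, issue_refund)
print(len(calls))  # 1: the retry reused the recorded side effect
```

Deriving the key from the arguments is a design choice: two intentionally identical actions in the same run would collapse into one, so some teams pass an explicit key from the caller instead. This in-process version is also not safe against concurrent retries; that is what the atomic insert in a durable store is for.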

Readiness rating:

green: every high-authority path has evidence
yellow: limited internal or read-only release only
red: demo only; no real users, data, or side effects
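The rating can be derived from the worksheet instead of argued about. A sketch, assuming each path through the route has been classified as high-authority or not and marked with whether its evidence is attached; what counts as high-authority is the team's call, and `can_scope_out` is a hypothetical field for paths that can be disabled so the release stays internal or read-only.

```python
from dataclasses import dataclass


@dataclass
class AuthorityPath:
    name: str
    high_authority: bool  # assumption: writes, sends, spends, or deletes on someone's behalf
    has_evidence: bool  # policy, eval, trace, and rollback evidence attached
    can_scope_out: bool  # hypothetical: can be disabled for an internal or read-only release


def readiness(paths: list[AuthorityPath]) -> str:
    """Map worksheet evidence onto the green/yellow/red readiness scale."""
    gaps = [p for p in paths if p.high_authority and not p.has_evidence]
    if not gaps:
        return "green"  # every high-authority path has evidence
    if all(p.can_scope_out for p in gaps):
        return "yellow"  # internal or read-only release, with the gaps disabled
    return "red"  # demo only: no real users, data, or side effects


paths = [
    AuthorityPath("search orders", high_authority=False, has_evidence=True, can_scope_out=False),
    AuthorityPath("issue refund", high_authority=True, has_evidence=False, can_scope_out=True),
]
print(readiness(paths))  # yellow
```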

Lab-To-Production Checklist

Use this checklist after completing any hands-on lab.

lab:
target product slice:
framework/language:
owner:

Architecture
[ ] Pattern selected and linked
[ ] Framework decision recorded
[ ] State owner named
[ ] Tool boundary defined
[ ] Policy boundary defined
[ ] Human approval boundary defined when needed

Implementation
[ ] Install command documented
[ ] Local run command documented
[ ] Test command documented
[ ] Eval command documented
[ ] .env.example committed
[ ] Secrets excluded from source
[ ] Tool schemas validated
[ ] Side effects use idempotency keys
[ ] Errors have typed outcomes

Production
[ ] Checkpoint or explicit stateless decision recorded
[ ] Trace schema implemented
[ ] Trace redaction implemented
[ ] Eval threshold defined
[ ] CI gate configured
[ ] Rollback path documented
[ ] Kill switch tested
[ ] Runbook created

Evidence
[ ] Test output attached
[ ] Eval output attached
[ ] Example trace attached
[ ] ADR linked
[ ] Owner accepted residual risk

If a checked item has no evidence, leave it unchecked.
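That rule can be made structural by deriving the checkbox from the evidence instead of storing it separately. A minimal sketch; `ChecklistItem` and the example evidence path are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistItem:
    label: str
    evidence: list[str] = field(default_factory=list)  # links, paths, or output excerpts

    @property
    def checked(self) -> bool:
        # An item counts as done only when at least one non-blank piece of evidence is attached.
        return any(e.strip() for e in self.evidence)

    def render(self) -> str:
        return f"[{'x' if self.checked else ' '}] {self.label}"


items = [
    ChecklistItem("Test command documented", ["README.md#testing"]),
    ChecklistItem("Kill switch tested"),
]
print("\n".join(item.render() for item in items))
# [x] Test command documented
# [ ] Kill switch tested
```

Because `checked` has no setter, nobody can tick a box without attaching something to review.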

Release Gate Checklist

Use this gate for each production release.

Download the release evidence record when preparing a public book release, GitHub Pages deployment, or release PR.

| Check | Required Before Release |
| --- | --- |
| Prompt/model changed | run task, refusal, policy, tool, and cost evals |
| Tool changed | run authorization, schema, idempotency, and error evals |
| Policy changed | run false-allow, false-deny, approval, and escalation evals |
| Memory changed | run read-scope, write-policy, deletion, and correction evals |
| Retrieval changed | run access, freshness, citation, and missing-evidence evals |
| Runtime changed | run retry, cancellation, checkpoint, and trace completeness evals |
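The release gate table is effectively a lookup from change type to required eval suites. A sketch that encodes it directly, assuming suites are identified by the short names in the table (`trace-completeness` is a hyphenated stand-in for "trace completeness") and that one release can include several change types. Unknown change types raise instead of passing silently, so a new kind of change forces someone to extend the table.

```python
# Required eval suites per change type, transcribed from the release gate table.
REQUIRED_EVALS: dict[str, set[str]] = {
    "prompt/model": {"task", "refusal", "policy", "tool", "cost"},
    "tool": {"authorization", "schema", "idempotency", "error"},
    "policy": {"false-allow", "false-deny", "approval", "escalation"},
    "memory": {"read-scope", "write-policy", "deletion", "correction"},
    "retrieval": {"access", "freshness", "citation", "missing-evidence"},
    "runtime": {"retry", "cancellation", "checkpoint", "trace-completeness"},
}


def missing_suites(changes: set[str], passed: set[str]) -> set[str]:
    """Return the eval suites a release still needs for the given change types."""
    unknown = changes - set(REQUIRED_EVALS)
    if unknown:
        # Fail closed: an unclassified change must extend the table, not skip the gate.
        raise ValueError(f"unclassified change types: {sorted(unknown)}")
    required = set().union(*(REQUIRED_EVALS[c] for c in changes))
    return required - passed


print(sorted(missing_suites({"tool"}, passed={"authorization", "schema"})))
# ['error', 'idempotency']
```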

Release decision:

release version:
change type:
eval dataset version:
passing threshold:
actual result:
known failures:
approved by:
rollback owner:
rollback command:
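The release decision fields can double as a pre-merge check. A minimal sketch, assuming the eval result and threshold are pass rates between 0 and 1 and that any empty dataset, approver, or rollback field blocks the release; `ReleaseDecision` and the example values are hypothetical. Known failures are recorded but not evaluated here, since whether a known failure is acceptable is the approver's call.

```python
from dataclasses import dataclass


@dataclass
class ReleaseDecision:
    release_version: str
    change_type: str
    eval_dataset_version: str
    passing_threshold: float  # assumption: minimum pass rate on blocking cases, 0..1
    actual_result: float  # observed pass rate on the same dataset version
    known_failures: list[str]
    approved_by: str
    rollback_owner: str
    rollback_command: str

    def blockers(self) -> list[str]:
        """Reasons this record cannot approve a release yet."""
        reasons = []
        if self.actual_result < self.passing_threshold:
            reasons.append(
                f"eval result {self.actual_result:.2f} below threshold {self.passing_threshold:.2f}"
            )
        for name in ("eval_dataset_version", "approved_by", "rollback_owner", "rollback_command"):
            if not getattr(self, name).strip():
                reasons.append(f"{name} is empty")
        return reasons


decision = ReleaseDecision(
    release_version="1.4.0",
    change_type="tool",
    eval_dataset_version="evals-v12",
    passing_threshold=0.95,
    actual_result=0.97,
    known_failures=[],
    approved_by="release approver",
    rollback_owner="platform on-call",
    rollback_command="",
)
print(decision.blockers())  # ['rollback_command is empty']
```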

Incident-To-Eval Worksheet

Use this after production incidents, near misses, or serious human overrides.

incident ID:
date:
service:
trace ID:
owner:

What happened?

Which boundary failed?
[ ] state
[ ] tool
[ ] policy
[ ] approval
[ ] memory
[ ] retrieval
[ ] model/prompt
[ ] workflow
[ ] observability
[ ] eval gate

What should have happened?

New eval case:
input:
expected trajectory:
expected tool behavior:
expected policy behavior:
expected output:
blocking threshold:

Release rule:
[ ] blocks future release
[ ] warning only
[ ] monitor only

Follow-up:
code change:
policy change:
runbook change:
ADR update:
owner:
due date:

An incident that does not produce an eval, a policy change, or a runbook update is likely to repeat.
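The New eval case section maps naturally onto a fixture file that the CI gate can load. A sketch, assuming one JSON file per incident and a hypothetical `EvalCase` schema built from the worksheet fields; the boundary names and release rules are the checkbox options above, and the incident, trace, and tool names in the example are made up.

```python
import json
from dataclasses import asdict, dataclass

# Checkbox options from the "Which boundary failed?" and "Release rule" sections.
BOUNDARIES = {
    "state", "tool", "policy", "approval", "memory", "retrieval",
    "model/prompt", "workflow", "observability", "eval gate",
}
RELEASE_RULES = {"blocks future release", "warning only", "monitor only"}


@dataclass
class EvalCase:
    incident_id: str
    trace_id: str
    failed_boundaries: list[str]
    input: str
    expected_trajectory: list[str]
    expected_tool_behavior: str
    expected_policy_behavior: str
    expected_output: str
    blocking_threshold: float
    release_rule: str = "blocks future release"

    def __post_init__(self) -> None:
        # Reject cases that do not name a known failed boundary or release rule.
        unknown = set(self.failed_boundaries) - BOUNDARIES
        if unknown or not self.failed_boundaries:
            raise ValueError(f"expected known boundaries, got {self.failed_boundaries}")
        if self.release_rule not in RELEASE_RULES:
            raise ValueError(f"unknown release rule: {self.release_rule!r}")

    def to_fixture(self) -> str:
        """Serialize the case as JSON so an eval runner can load it as a fixture."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical incident: a retried refund reached the tool twice.
case = EvalCase(
    incident_id="INC-EXAMPLE",
    trace_id="trace-example",
    failed_boundaries=["tool", "policy"],
    input="Refund order 1234",
    expected_trajectory=["lookup_order", "check_refund_policy", "issue_refund"],
    expected_tool_behavior="issue_refund runs at most once per order",
    expected_policy_behavior="a second refund for the same order is denied",
    expected_output="confirms one refund and cites the order ID",
    blocking_threshold=1.0,
)
print(case.to_fixture())
```

Defaulting `release_rule` to blocking is a deliberate choice in this sketch: downgrading a case to a warning should be an explicit, reviewable edit.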