Templates and Worksheets
These templates turn the book’s guidance into reviewable engineering artifacts. Use them when a lab becomes a product slice, when a team chooses a framework, or when an agent gains more authority.
Copy only the sections that matter. A short, complete decision record beats a long template with empty answers.
For filled examples, compare these templates with the Capstone Projects.
Choose The Right Artifact
Start with the artifact that matches the decision in front of you. Do not fill every template by default.
| Situation | Use This Artifact | Done When |
|---|---|---|
| Choosing LangGraph, AutoGen, Mastra, CrewAI, or a custom runtime | Framework selection ADR template | The team can name what the framework owns and what the application still owns. |
| Studying completed agent ADRs | Completed agent ADR examples | The team can compare its decision record with realistic authority, RAG, and multi-agent examples. |
| Studying completed lab evidence | Completed lab evidence examples | The team can compare its worksheet with concrete commands, traces, failure paths, production gaps, and release decisions. |
| Studying production readiness evidence | Completed production readiness examples | The team can compare its production worksheet with concrete owners, gates, blockers, readiness ratings, and next actions. |
| Reviewing an Agentic RAG answer | Agentic RAG query trace worksheet | The team can reconstruct source selection, omitted evidence, citation checks, and the final answer/refusal decision. |
| Reviewing a debate or consensus workflow | Debate and consensus review checklist | Independence, merge policy, dissent handling, budget, baseline comparison, and judge errors are reviewed. |
| Threat-modeling an agent route | Agent threat model worksheet | Private data, untrusted content, tool authority, STRIDE risks, evals, and trace evidence are reviewed together. |
| Reviewing agent UX and trust | Agent UX review worksheet | Runtime states, visible evidence, user controls, approval UX, correction paths, and UX evals are explicit. |
| Finishing a lab | Lab completion worksheet | The run command, test command, pattern boundary, and lesson are recorded. |
| Completing Lab 02 Agent Loop and Planning | Lab 02 planning loop guided exercise worksheet | Baseline plan trace, changed input, unsupported step, missing input, and stop-condition contract are captured. |
| Completing Lab 03 Agentic RAG | Lab 03 Agentic RAG guided exercise worksheet | Baseline retrieval, grounding change, missing-evidence behavior, source contract, native graph comparison, and eval fixture are captured. |
| Completing Lab 06 Observability and Evals | Lab 06 observability and evals guided exercise worksheet | Trace contract, missing-policy failure, negative case, CI gate, and incident-to-eval note are captured. |
| Completing Lab 07 Runtime Packaging | Lab 07 runtime packaging guided exercise worksheet | Runtime boundaries, tool order, forbidden side effects, rollback, and native Mastra comparison are captured. |
| Completing Lab 12 State Graphs | Lab 12 state graph guided exercise worksheet | Interrupt, checkpoint, resume, replay safety, native LangGraph comparison, and production follow-up are captured. |
| Turning a lab into a product slice | Lab production readiness worksheet | Missing state, policy, eval, trace, rollback, and ownership controls are explicit. |
| Reviewing a production candidate | Production readiness worksheet | Every high-authority path has evidence or is scoped out of release. |
| Operating a production route | Runtime SLO and incident review worksheet | SLOs, dashboard panels, incident triage, trace review, eval conversion, and rollback actions are explicit. |
| Preparing a release | Release evidence record | The release has eval output, known risk, approver, rollback owner, and rollback command. |
| Reviewing agentic system composition | Agentic system architecture review checklist | Experience, control, execution, knowledge, safety, observability, evaluation, contracts, and authority boundaries are explicit. |
| Reviewing a full system architecture | Reference architecture review checklist | Entry, routing, state, knowledge, tools, memory, approval, evals, observability, and release ownership are reviewed together. |
| Reviewing unattended event-triggered work | Event-triggered agent review checklist | Event contract, admission, dedupe, ordering, retries, dead letters, storms, traces, and evals are explicit. |
| Reviewing a self-healing workflow | Self-healing workflow review checklist | Failure classes, recovery policy, idempotency, compensation, breakers, replay packets, and regression evals are explicit. |
| Reviewing a personal agent architecture | Personal agent architecture review checklist | Consent, local/cloud split, credentials, memory, user controls, connectors, and audit evidence are explicit. |
| Checking the final bar | 10/10 production gate scorecard | The system is reviewable, testable, observable, reversible, and owned. |
| Comparing a capstone with your own system | Capstone review scorecard | You have a concrete adaptation plan and a list of blocking gaps. |
| Scoring a pattern before adoption | Pattern evaluation scorecard | The pattern has evidence for goal, boundary, autonomy, tools, state, context, security, evals, observability, and operations. |
| Translating older agent terminology | Historical pattern migration record | A vague legacy label is mapped to a current architecture decision with evidence. |
| Reviewing a reusable skill package | Skill review checklist | Activation, instruction shape, assets, security, versioning, tests, observability, and final decision are explicit. |
| Reviewing one pattern | The matching pattern review checklist below | Use cases, avoid cases, failure modes, evals, and production controls have evidence. |
Use the smallest artifact that proves the decision. Add the production gate only when the system is close enough to release that missing evidence should block progress.
Completed Examples
Blank templates help only after the reader understands the evidence standard. Use completed examples to compare the density and specificity of your own artifact.
Use Completed agent ADR examples for architecture decisions. Use Completed lab evidence examples for lab evidence packs and eval reviews. Use Completed production readiness examples when a team needs to calibrate owner, gate, blocker, rollback, and release-decision evidence.
Downloadable versions:
- A2A agent interoperability review checklist
- Agentic RAG query trace worksheet
- Agentic system architecture review checklist
- Agent threat model worksheet
- Agent UX review worksheet
- Capstone review scorecard
- Completed agent ADR examples
- Completed lab evidence examples
- Completed production readiness examples
- Computer-use agent review checklist
- CrewAI flows and crews review checklist
- Debate and consensus review checklist
- Cost controls and runtime budgets review checklist
- Deployment walkthrough review checklist
- Domain agent architecture review checklist
- Durable workflows review checklist
- Evaluator-optimizer review checklist
- Event-triggered agent review checklist
- Framework selection ADR template
- Historical pattern migration record
- Lab 02 planning loop guided exercise worksheet
- Lab 03 Agentic RAG guided exercise worksheet
- Lab 06 observability and evals guided exercise worksheet
- Lab 07 runtime packaging guided exercise worksheet
- Lab 12 state graph guided exercise worksheet
- Lab completion worksheet
- Lab production readiness worksheet
- Knowledge-bound agents review checklist
- Long-term episodic memory review checklist
- Mastra runtime review checklist
- Memory-augmented agent review checklist
- MCP-first tool use review checklist
- Observability and evals review checklist
- Personal agent architecture review checklist
- Parallel agents review checklist
- Pattern evaluation scorecard
- Planning and execution review checklist
- Policy enforcement review checklist
- Production evaluation feedback loop review checklist
- Production readiness worksheet
- Release evidence record
- Reference architecture review checklist
- Reflection pattern review checklist
- ReAct review checklist
- Runtime SLO and incident review worksheet
- Self-improvement review checklist
- Self-healing workflow review checklist
- Secure agent communication review checklist
- Semantic recall and RAG review checklist
- Skill review checklist
- Task delegation review checklist
- Tool capability design review checklist
- Working memory review checklist
- 10/10 production gate scorecard
Framework Selection ADR
Use this ADR when adopting LangGraph, AutoGen, Mastra, CrewAI, a mini-runtime, or another framework.
# ADR-000: Choose [Framework] for [Agent or Workflow]
## Status
Proposed | Accepted | Superseded
## Context
What product problem are we solving?
What user-visible workflow will this framework host?
What constraints matter: language, deployment, compliance, team skills, latency, cost, or existing infrastructure?
## Decision
We will use [framework] for [scope].
The framework will own [state/control flow/tools/memory/evals/observability/deployment].
The application will still own [policy/domain data/security/approval/rollback].
## Alternatives Considered
| Option | Why It Fit | Why We Did Not Choose It |
| --- | --- | --- |
| Direct code / mini-runtime | | |
| LangGraph | | |
| AutoGen | | |
| Mastra | | |
| CrewAI | | |
## Responsibility Boundary
| Responsibility | Owner | Evidence |
| --- | --- | --- |
| State | framework/application/platform | schema, checkpoint, migration plan |
| Control flow | framework/application/platform | graph, workflow, team, flow, loop |
| Tools | framework/application/platform | manifest, schema, permission model |
| Policy | framework/application/platform | enforcement point before authority |
| Memory | framework/application/platform | retention, deletion, correction rules |
| Observability | framework/application/platform | trace schema and dashboard |
| Evals | framework/application/platform | fixtures, thresholds, CI gate |
| Deployment | framework/application/platform | runbook, rollback, kill switch |
## Vertical Slice Proof
- user request:
- state object:
- read tool:
- side-effect tool:
- policy decision:
- trace:
- eval:
- rollback:
## Acceptance Criteria
- local install and run commands are documented;
- state can be inspected and replayed;
- policy runs before side effects;
- evals fail the build for critical regressions;
- traces explain one failed run;
- rollback can disable the risky capability;
- framework-specific code does not hide product policy.
## Consequences
What gets easier?
What gets harder?
What lock-in or migration risk remains?
Which production incident would make us revisit this decision?
## Review Trigger
Review this ADR after a model upgrade, a framework upgrade, a new write-capable tool, a new memory type, a production incident, or repeated human overrides.
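One way to make the acceptance criterion "policy runs before side effects" testable is to wrap each side-effect tool so that the policy decision always happens first. The sketch below is a minimal illustration under stated assumptions, not the API of LangGraph, AutoGen, Mastra, or CrewAI. `PolicyDecision`, `PolicyDenied`, `guard`, `send_refund`, and `refund_policy` are hypothetical names, and the refund limit of 100 is an invented example rule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


class PolicyDenied(Exception):
    """Raised when policy blocks a tool call before it runs."""


Tool = Callable[[Dict[str, Any]], Dict[str, Any]]
Policy = Callable[[str, Dict[str, Any]], PolicyDecision]


def guard(tool_name: str, tool: Tool, policy: Policy) -> Tool:
    """Return a tool that consults policy before performing the side effect."""

    def guarded(args: Dict[str, Any]) -> Dict[str, Any]:
        decision = policy(tool_name, args)
        if not decision.allowed:
            # The underlying tool is never called on a denial.
            raise PolicyDenied(f"{tool_name} denied: {decision.reason}")
        return tool(args)

    return guarded


# Hypothetical side-effect tool; the list stands in for a real payment system.
sent_refunds: List[Dict[str, Any]] = []


def send_refund(args: Dict[str, Any]) -> Dict[str, Any]:
    sent_refunds.append(args)
    return {"status": "sent"}


# Hypothetical policy; the limit of 100 is an invented example rule.
def refund_policy(tool_name: str, args: Dict[str, Any]) -> PolicyDecision:
    if args.get("amount", 0) > 100:
        return PolicyDecision(False, "amount exceeds limit")
    return PolicyDecision(True, "within limit")


safe_refund = guard("send_refund", send_refund, refund_policy)
```

A test can call `safe_refund` with a denied request and assert that `sent_refunds` is unchanged. That assertion is the kind of evidence the Policy row of the responsibility boundary table asks for.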
Production Readiness Worksheet
Use this worksheet before exposing the system to real users, real data, or real side effects.
| Gate | Question to Answer | Evidence |
|---|---|---|
| Owner | Who owns the runtime and incidents? | team, on-call, runbook |
| Scope | Which users, tenants, tools, data, and workflows are in scope? | ADR, service config |
| State | What state exists and where is it persisted? | schema, checkpoint store |
| Idempotency | Which actions can be retried safely? | idempotency keys, side-effect records |
| Tools | Which tools can be called and with what authority? | tool manifest, permission map |
| Policy | Where does policy run before authority? | enforcement code, tests |
| Approval | Which actions require approval? | approval schema, UI/API, expiry |
| Memory | What can be read or written? | retention, deletion, correction rules |
| Observability | Can one failed run be reconstructed? | trace dashboard, redaction proof |
| Evals | What blocks release? | eval fixtures, thresholds, CI output |
| Security | How are secrets, egress, and sandboxing handled? | secret manager, network policy |
| Rollback | How do we disable model, prompt, tool, workflow, or agent? | runbook, feature flag |
| Incident Loop | How do incidents become evals? | post-incident process |
Readiness rating:
green: every high-authority path has evidence
yellow: limited internal or read-only release
red: demo only; no real users, data, or side effects
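The rating can be derived from the worksheet instead of being chosen by feel. The following sketch is one possible rule, not the worksheet's definition. It simplifies "every high-authority path has evidence" to "every gate has at least one evidence item". The `YELLOW_MINIMUM` set is our assumption about what a limited or read-only release needs. The function name `readiness_rating` is hypothetical.

```python
from typing import Dict, List

# Gate names copied from the worksheet rows above.
GATES = [
    "Owner", "Scope", "State", "Idempotency", "Tools", "Policy", "Approval",
    "Memory", "Observability", "Evals", "Security", "Rollback", "Incident Loop",
]

# ASSUMPTION: the gates a limited internal or read-only release must satisfy.
# The worksheet does not define this set; adjust it to your own risk model.
YELLOW_MINIMUM = {"Owner", "Scope", "Observability", "Security", "Rollback"}


def readiness_rating(evidence: Dict[str, List[str]]) -> str:
    """Map gate -> evidence items to 'green', 'yellow', or 'red'."""
    evidenced = {gate for gate in GATES if evidence.get(gate)}
    if evidenced == set(GATES):
        return "green"
    if YELLOW_MINIMUM <= evidenced:
        return "yellow"
    return "red"
```

An empty evidence list counts as no evidence, so a gate with an answer but nothing attached does not raise the rating.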
Lab-To-Production Checklist
Use this checklist after completing any hands-on lab.
lab:
target product slice:
framework/language:
owner:
Architecture
[ ] Pattern selected and linked
[ ] Framework decision recorded
[ ] State owner named
[ ] Tool boundary defined
[ ] Policy boundary defined
[ ] Human approval boundary defined when needed
Implementation
[ ] Install command documented
[ ] Local run command documented
[ ] Test command documented
[ ] Eval command documented
[ ] .env.example committed
[ ] Secrets excluded from source
[ ] Tool schemas validated
[ ] Side effects use idempotency keys
[ ] Errors have typed outcomes
Production
[ ] Checkpoint or explicit stateless decision recorded
[ ] Trace schema implemented
[ ] Trace redaction implemented
[ ] Eval threshold defined
[ ] CI gate configured
[ ] Rollback path documented
[ ] Kill switch tested
[ ] Runbook created
Evidence
[ ] Test output attached
[ ] Eval output attached
[ ] Example trace attached
[ ] ADR linked
[ ] Owner accepted residual risk
Check an item only when you can attach evidence for it. If the evidence is missing, leave the item unchecked.
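The evidence rule can be enforced mechanically during review. This is a minimal sketch that assumes the checklist is kept as plain text with `[ ]` and `[x]` markers and that evidence is tracked in a separate mapping from item text to a link or file. The function name `unsupported_checks` and the sample evidence path are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as "[x] Kill switch tested" or "[ ] Runbook created".
CHECKBOX = re.compile(r"^\[( |x|X)\]\s+(.+)$")


def unsupported_checks(checklist: str, evidence: Dict[str, str]) -> List[str]:
    """Return checked items that have no evidence attached."""
    flagged = []
    for line in checklist.splitlines():
        match = CHECKBOX.match(line.strip())
        if not match:
            continue  # section labels and blank lines are ignored
        checked = match.group(1).lower() == "x"
        item = match.group(2).strip()
        if checked and not evidence.get(item):
            flagged.append(item)
    return flagged


sample = """Production
[x] Kill switch tested
[ ] Runbook created
Evidence
[x] Test output attached
"""

# Hypothetical evidence map; the path is illustrative only.
sample_evidence = {"Test output attached": "artifacts/test-output.txt"}
flagged_items = unsupported_checks(sample, sample_evidence)
```

Here `flagged_items` contains only "Kill switch tested", the one item that is checked without evidence. A reviewer would either attach the evidence or uncheck the item.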
Release Gate Checklist
Use this gate for each production release.
Download the release evidence record when preparing a public book release, GitHub Pages deployment, or release PR.
| Check | Required Before Release |
|---|---|
| Prompt/model changed | run task, refusal, policy, tool, and cost evals |
| Tool changed | run authorization, schema, idempotency, and error evals |
| Policy changed | run false-allow, false-deny, approval, and escalation evals |
| Memory changed | run read-scope, write-policy, deletion, and correction evals |
| Retrieval changed | run access, freshness, citation, and missing-evidence evals |
| Runtime changed | run retry, cancellation, checkpoint, and trace completeness evals |
Release decision:
release version:
change type:
eval dataset version:
passing threshold:
actual result:
known failures:
approved by:
rollback owner:
rollback command:
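The release gate table can be encoded as data so that a release script can report which eval suites are still missing for a given change. The sketch below copies the change types and eval names from the table. The keys, the function name `missing_evals`, and the idea of passing a list of completed suites are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set

# Change type -> eval suites required before release, taken from the table above.
REQUIRED_EVALS: Dict[str, Set[str]] = {
    "prompt/model": {"task", "refusal", "policy", "tool", "cost"},
    "tool": {"authorization", "schema", "idempotency", "error"},
    "policy": {"false-allow", "false-deny", "approval", "escalation"},
    "memory": {"read-scope", "write-policy", "deletion", "correction"},
    "retrieval": {"access", "freshness", "citation", "missing-evidence"},
    "runtime": {"retry", "cancellation", "checkpoint", "trace completeness"},
}


def missing_evals(change_types: Iterable[str], evals_run: Iterable[str]) -> List[str]:
    """Return required eval suites that have not been run, sorted by name.

    Raises KeyError for a change type that is not in the table, so an
    unclassified change cannot pass the gate silently.
    """
    required: Set[str] = set()
    for change_type in change_types:
        required |= REQUIRED_EVALS[change_type]
    return sorted(required - set(evals_run))
```

For example, `missing_evals(["tool"], ["authorization", "schema"])` returns `["error", "idempotency"]`. An empty result means the "actual result" field of the release decision can be filled in for every required suite.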
Incident-To-Eval Worksheet
Use this worksheet after production incidents, near misses, or serious human overrides.
incident ID:
date:
service:
trace ID:
owner:
What happened?
Which boundary failed?
[ ] state
[ ] tool
[ ] policy
[ ] approval
[ ] memory
[ ] retrieval
[ ] model/prompt
[ ] workflow
[ ] observability
[ ] eval gate
What should have happened?
New eval case:
input:
expected trajectory:
expected tool behavior:
expected policy behavior:
expected output:
blocking threshold:
Release rule:
[ ] blocks future release
[ ] warning only
[ ] monitor only
Follow-up:
code change:
policy change:
runbook change:
ADR update:
owner:
due date:
An incident that does not produce an eval, a policy change, or a runbook update is likely to repeat.
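To keep the worksheet from ending as a document nobody runs, the "New eval case" fields can be captured as a structured record that a test harness loads. The sketch below is one possible shape, assuming a numeric blocking threshold and a list of step names for the trajectory. `IncidentEvalCase`, `ReleaseRule`, and `validate` are hypothetical names. The boundary and release rule values are taken from the worksheet.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List

# Boundary names copied from the "Which boundary failed?" checklist.
BOUNDARIES = {
    "state", "tool", "policy", "approval", "memory", "retrieval",
    "model/prompt", "workflow", "observability", "eval gate",
}


class ReleaseRule(str, Enum):
    BLOCKS = "blocks future release"
    WARNING = "warning only"
    MONITOR = "monitor only"


@dataclass(frozen=True)
class IncidentEvalCase:
    incident_id: str
    trace_id: str
    failed_boundary: str
    input: str
    expected_trajectory: List[str]  # ASSUMPTION: ordered step names
    expected_tool_behavior: str
    expected_policy_behavior: str
    expected_output: str
    blocking_threshold: float       # ASSUMPTION: pass rate between 0 and 1
    release_rule: ReleaseRule


def validate(case: IncidentEvalCase) -> List[str]:
    """Return a list of problems; an empty list means the case is usable."""
    problems = []
    if case.failed_boundary not in BOUNDARIES:
        problems.append(f"unknown boundary: {case.failed_boundary}")
    for name, value in asdict(case).items():
        if value in ("", [], None):
            problems.append(f"{name} is empty")
    if not 0.0 <= case.blocking_threshold <= 1.0:
        problems.append("blocking_threshold must be between 0 and 1")
    return problems
```

Running `validate` in CI on every stored case is one way to make sure an incident record with a blank input or expected output never counts as an eval.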