Templates and Worksheets

These templates turn the book’s guidance into reviewable engineering artifacts. Use them when a lab becomes a product slice, when a team chooses a framework, or when an agent gains more authority.

Copy only the sections that matter. A short, complete decision record beats a long template with empty answers.

For filled examples, compare these templates with the Capstone Projects.

Choose The Right Artifact

Start with the artifact that matches the decision in front of you. Do not fill every template by default.

Situation	Use This Artifact	Done When
Choosing LangGraph, AutoGen, Mastra, CrewAI, or a custom runtime	Framework selection ADR template	The team can name what the framework owns and what the application still owns.
Studying completed agent ADRs	Completed agent ADR examples	The team can compare its decision record with realistic authority, RAG, and multi-agent examples.
Studying completed lab evidence	Completed lab evidence examples	The team can compare its worksheet with concrete commands, traces, failure paths, production gaps, and release decisions.
Studying production readiness evidence	Completed production readiness examples	The team can compare its production worksheet with concrete owners, gates, blockers, readiness ratings, and next actions.
Reviewing an Agentic RAG answer	Agentic RAG query trace worksheet	The team can reconstruct source selection, omitted evidence, citation checks, and the final answer/refusal decision.
Reviewing a debate or consensus workflow	Debate and consensus review checklist	Independence, merge policy, dissent handling, budget, baseline comparison, and judge errors are reviewed.
Threat-modeling an agent route	Agent threat model worksheet	Private data, untrusted content, tool authority, STRIDE risks, evals, and trace evidence are reviewed together.
Reviewing agent UX and trust	Agent UX review worksheet	Runtime states, visible evidence, user controls, approval UX, correction paths, and UX evals are explicit.
Finishing a lab	Lab completion worksheet	The run command, test command, pattern boundary, and lesson are recorded.
Completing Lab 02 Agent Loop and Planning	Lab 02 planning loop guided exercise worksheet	Baseline plan trace, changed input, unsupported step, missing input, and stop-condition contract are captured.
Completing Lab 03 Agentic RAG	Lab 03 Agentic RAG guided exercise worksheet	Baseline retrieval, grounding change, missing-evidence behavior, source contract, native graph comparison, and eval fixture are captured.
Completing Lab 06 Observability and Evals	Lab 06 observability and evals guided exercise worksheet	Trace contract, missing-policy failure, negative case, CI gate, and incident-to-eval note are captured.
Completing Lab 07 Runtime Packaging	Lab 07 runtime packaging guided exercise worksheet	Runtime boundaries, tool order, forbidden side effects, rollback, and native Mastra comparison are captured.
Completing Lab 12 State Graphs	Lab 12 state graph guided exercise worksheet	Interrupt, checkpoint, resume, replay safety, native LangGraph comparison, and production follow-up are captured.
Turning a lab into a product slice	Lab production readiness worksheet	Missing state, policy, eval, trace, rollback, and ownership controls are explicit.
Reviewing a production candidate	Production readiness worksheet	Every high-authority path has evidence or is scoped out of release.
Operating a production route	Runtime SLO and incident review worksheet	SLOs, dashboard panels, incident triage, trace review, eval conversion, and rollback actions are explicit.
Preparing a release	Release evidence record	The release has eval output, known risk, approver, rollback owner, and rollback command.
Reviewing agentic system composition	Agentic system architecture review checklist	Experience, control, execution, knowledge, safety, observability, evaluation, contracts, and authority boundaries are explicit.
Reviewing a full system architecture	Reference architecture review checklist	Entry, routing, state, knowledge, tools, memory, approval, evals, observability, and release ownership are reviewed together.
Reviewing unattended event-triggered work	Event-triggered agent review checklist	Event contract, admission, dedupe, ordering, retries, dead letters, storms, traces, and evals are explicit.
Reviewing a self-healing workflow	Self-healing workflow review checklist	Failure classes, recovery policy, idempotency, compensation, breakers, replay packets, and regression evals are explicit.
Reviewing a personal agent architecture	Personal agent architecture review checklist	Consent, local/cloud split, credentials, memory, user controls, connectors, and audit evidence are explicit.
Checking the final bar	10/10 production gate scorecard	The system is reviewable, testable, observable, reversible, and owned.
Comparing a capstone with your own system	Capstone review scorecard	You have a concrete adaptation plan and a list of blocking gaps.
Scoring a pattern before adoption	Pattern evaluation scorecard	The pattern has evidence for goal, boundary, autonomy, tools, state, context, security, evals, observability, and operations.
Translating older agent terminology	Historical pattern migration record	A vague legacy label is mapped to a current architecture decision with evidence.
Reviewing a reusable skill package	Skill review checklist	Activation, instruction shape, assets, security, versioning, tests, observability, and final decision are explicit.
Reviewing one pattern	The matching pattern review checklist below	Use cases, avoid cases, failure modes, evals, and production controls have evidence.

flowchart LR
  A[Current decision] --> L{Legacy term?}
  L -->|yes| M[Historical migration record]
  L -->|no| B{Framework?}
  B -->|yes| C[ADR]
  B -->|no| D{Lab done?}
  D -->|yes| E[Lab worksheet]
  D -->|no| F{Product slice?}
  F -->|yes| G[Production readiness worksheet]
  F -->|no| H{Release?}
  H -->|yes| I[Release evidence record]
  H -->|no| N{Operating?}
  N -->|yes| O[Runtime SLO worksheet]
  N -->|no| J[Pattern checklist]
  G --> K[10/10 gate]
  I --> K
  O --> K

Use the smallest artifact that proves the decision. Add the production gate only when the system is close enough to release that missing evidence should block progress.
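
The selection flowchart above can also be read as an ordered series of checks. The following is a minimal sketch of that ordering, not part of the book's tooling: the `Decision` fields and the `choose_artifact` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical flags, one per question in the flowchart."""
    legacy_term: bool = False
    framework_choice: bool = False
    lab_done: bool = False
    product_slice: bool = False
    release: bool = False
    operating: bool = False


def choose_artifact(d: Decision) -> list[str]:
    """Return the artifacts to fill, checking questions in flowchart order."""
    if d.legacy_term:
        return ["Historical migration record"]
    if d.framework_choice:
        return ["ADR"]
    if d.lab_done:
        return ["Lab worksheet"]
    # The last three branches also lead to the 10/10 gate in the flowchart.
    if d.product_slice:
        return ["Production readiness worksheet", "10/10 gate"]
    if d.release:
        return ["Release evidence record", "10/10 gate"]
    if d.operating:
        return ["Runtime SLO worksheet", "10/10 gate"]
    return ["Pattern checklist"]


print(choose_artifact(Decision(release=True)))
```

Because the checks run in order, the first matching question decides the artifact, which mirrors the "smallest artifact" advice: only the product slice, release, and operating branches pull in the production gate.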

Completed Examples

Blank templates help only after the reader understands the evidence standard. Use completed examples to compare the density and specificity of your own artifact.

flowchart LR
  A["Run lab or review"] --> B["Fill worksheet"]
  B --> C["Compare with completed example"]
  C --> D{"Evidence specific enough?"}
  D -->|"yes"| E["Attach to ADR, eval, or release record"]
  D -->|"no"| F["Add command, trace, failure, owner, or gap"]
  F --> C

Use Completed agent ADR examples for architecture decisions. Use Completed lab evidence examples for lab evidence packs and eval reviews. Use Completed production readiness examples when a team needs to calibrate owner, gate, blocker, rollback, and release-decision evidence.

Framework Selection ADR

Use this ADR when adopting LangGraph, AutoGen, Mastra, CrewAI, a mini-runtime, or another framework.

# ADR-000: Choose [Framework] for [Agent or Workflow]

## Status

Proposed | Accepted | Superseded

## Context

What product problem are we solving?
What user-visible workflow will this framework host?
What constraints matter: language, deployment, compliance, team skills, latency, cost, or existing infrastructure?

## Decision

We will use [framework] for [scope].
The framework will own [state/control flow/tools/memory/evals/observability/deployment].
The application will still own [policy/domain data/security/approval/rollback].

## Alternatives Considered

| Option | Why It Fit | Why We Did Not Choose It |
| --- | --- | --- |
| Direct code / mini-runtime | | |
| LangGraph | | |
| AutoGen | | |
| Mastra | | |
| CrewAI | | |

## Responsibility Boundary

| Responsibility | Owner | Evidence |
| --- | --- | --- |
| State | framework/application/platform | schema, checkpoint, migration plan |
| Control flow | framework/application/platform | graph, workflow, team, flow, loop |
| Tools | framework/application/platform | manifest, schema, permission model |
| Policy | framework/application/platform | enforcement point before authority |
| Memory | framework/application/platform | retention, deletion, correction rules |
| Observability | framework/application/platform | trace schema and dashboard |
| Evals | framework/application/platform | fixtures, thresholds, CI gate |
| Deployment | framework/application/platform | runbook, rollback, kill switch |

## Vertical Slice Proof

- user request:
- state object:
- read tool:
- side-effect tool:
- policy decision:
- trace:
- eval:
- rollback:

## Acceptance Criteria

- local install and run commands are documented;
- state can be inspected and replayed;
- policy runs before side effects;
- evals fail the build for critical regressions;
- traces explain one failed run;
- rollback can disable the risky capability;
- framework-specific code does not hide product policy.

## Consequences

What gets easier?
What gets harder?
What lock-in or migration risk remains?
Which production incident would make us revisit this decision?

## Review Trigger

Review this ADR after model upgrade, framework upgrade, new write-capable tool, new memory type, production incident, or repeated human override.
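
Three of the acceptance criteria in the ADR template above ("policy runs before side effects", "rollback can disable the risky capability", and "traces explain one failed run") describe an ordering that is easy to get wrong. The sketch below shows one possible ordering for a tool call. Everything in it is an illustrative assumption: `Runtime`, `policy_allows`, `call_tool`, the `refund` tool, and the 100-unit limit are hypothetical and do not come from any of the frameworks named in the template.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Runtime:
    kill_switch: set[str] = field(default_factory=set)       # tools disabled by rollback
    completed: dict[str, str] = field(default_factory=dict)  # idempotency key -> result
    trace: list[str] = field(default_factory=list)           # one entry per decision


def policy_allows(tool: str, args: dict) -> bool:
    # Hypothetical application-owned rule: large refunds are not allowed here.
    return not (tool == "refund" and args.get("amount", 0) > 100)


def call_tool(rt: Runtime, tool: str, args: dict, idempotency_key: str,
              side_effect: Callable[[dict], str]) -> str:
    """Check kill switch, then policy, then idempotency, then act."""
    if tool in rt.kill_switch:
        rt.trace.append(f"{tool}: disabled")
        return "disabled"
    if not policy_allows(tool, args):
        rt.trace.append(f"{tool}: denied")
        return "denied"
    if idempotency_key in rt.completed:
        # A retry with the same key returns the recorded result
        # instead of repeating the side effect.
        rt.trace.append(f"{tool}: replayed")
        return rt.completed[idempotency_key]
    result = side_effect(args)
    rt.completed[idempotency_key] = result
    rt.trace.append(f"{tool}: executed")
    return result
```

Keeping `policy_allows` in application code rather than inside a framework callback is one way to satisfy the last criterion, that framework-specific code does not hide product policy.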

Production Readiness Worksheet

Use this worksheet before exposing the system to real users, real data, or real side effects.

Gate	Answer	Evidence
Owner	Who owns the runtime and incidents?	team, on-call, runbook
Scope	Which users, tenants, tools, data, and workflows are in scope?	ADR, service config
State	What state exists and where is it persisted?	schema, checkpoint store
Idempotency	Which actions can be retried safely?	idempotency keys, side-effect records
Tools	Which tools can be called and with what authority?	tool manifest, permission map
Policy	Where does policy run before authority?	enforcement code, tests
Approval	Which actions require approval?	approval schema, UI/API, expiry
Memory	What can be read or written?	retention, deletion, correction rules
Observability	Can one failed run be reconstructed?	trace dashboard, redaction proof
Evals	What blocks release?	eval fixtures, thresholds, CI output
Security	How are secrets, egress, and sandboxing handled?	secret manager, network policy
Rollback	How do we disable model, prompt, tool, workflow, or agent?	runbook, feature flag
Incident Loop	How do incidents become evals?	post-incident process

Readiness rating:

green: every high-authority path has evidence
yellow: limited internal or read-only release only
red: demo only; no real users, data, or side effects
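
The rating can be derived from the worksheet rather than chosen by feel. The sketch below is one possible reading of the three levels, under stated assumptions: the green rule follows the text above, while the rules that separate yellow from red are an interpretation added here. `WorkflowPath` and `readiness_rating` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowPath:
    name: str
    high_authority: bool  # assumed meaning: the path can cause real side effects
    evidence: list[str] = field(default_factory=list)


def readiness_rating(paths: list[WorkflowPath]) -> str:
    """Rate a worksheet as green, yellow, or red."""
    if not paths:
        return "red"  # assumption: an empty worksheet proves nothing
    missing_high = [p.name for p in paths if p.high_authority and not p.evidence]
    missing_other = [p.name for p in paths if not p.high_authority and not p.evidence]
    if not missing_high:
        # Follows the stated rule literally; a stricter team might also
        # require evidence for every read-only path.
        return "green"
    if not missing_other:
        # Assumption: with evidence only for the lower-authority paths,
        # the release stays internal or read-only.
        return "yellow"
    return "red"


paths = [
    WorkflowPath("search docs", high_authority=False, evidence=["trace attached"]),
    WorkflowPath("issue refund", high_authority=True),
]
print(readiness_rating(paths))
```

In this example the refund path has no evidence, so the sketch rates the system yellow until that path gains evidence or is scoped out of the release.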

Lab-To-Production Checklist

Use this checklist after completing any hands-on lab.

lab:
target product slice:
framework/language:
owner:

Architecture
[ ] Pattern selected and linked
[ ] Framework decision recorded
[ ] State owner named
[ ] Tool boundary defined
[ ] Policy boundary defined
[ ] Human approval boundary defined when needed

Implementation
[ ] Install command documented
[ ] Local run command documented
[ ] Test command documented
[ ] Eval command documented
[ ] .env.example committed
[ ] Secrets excluded from source
[ ] Tool schemas validated
[ ] Side effects use idempotency keys
[ ] Errors have typed outcomes

Production
[ ] Checkpoint or explicit stateless decision recorded
[ ] Trace schema implemented
[ ] Trace redaction implemented
[ ] Eval threshold defined
[ ] CI gate configured
[ ] Rollback path documented
[ ] Kill switch tested
[ ] Runbook created

Evidence
[ ] Test output attached
[ ] Eval output attached
[ ] Example trace attached
[ ] ADR linked
[ ] Owner accepted residual risk

If an item has no evidence, leave it unchecked.
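
The evidence rule can be checked mechanically before a review. The following is a minimal sketch, assuming the checklist is kept as plain text with `[x]` and `[ ]` markers and that evidence is tracked in a separate mapping from item label to a link or file path. The function name and the example evidence path are hypothetical.

```python
import re

# Matches lines such as "[x] Test output attached" or "[ ] ADR linked".
ITEM = re.compile(r"^\[(?P<mark>[ xX])\]\s+(?P<label>.+)$")


def checked_without_evidence(worksheet: str, evidence: dict[str, str]) -> list[str]:
    """Return labels of checked items that have no evidence recorded."""
    missing = []
    for line in worksheet.splitlines():
        match = ITEM.match(line.strip())
        if match is None or match.group("mark") == " ":
            continue  # skip headings, blank lines, and unchecked items
        label = match.group("label")
        if not evidence.get(label, "").strip():
            missing.append(label)
    return missing


worksheet = """Evidence
[x] Test output attached
[x] Eval output attached
[ ] Example trace attached"""

# Hypothetical evidence location, for illustration only.
evidence = {"Test output attached": "artifacts/test-output.txt"}
print(checked_without_evidence(worksheet, evidence))
```

Any label the function returns should be unchecked again or given evidence before the worksheet is attached to a release record.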

Release Gate Checklist

Use this gate for each production release.

Download the release evidence record when preparing a public book release, GitHub Pages deployment, or release PR.

Check	Required Before Release
Prompt/model changed	run task, refusal, policy, tool, and cost evals
Tool changed	run authorization, schema, idempotency, and error evals
Policy changed	run false-allow, false-deny, approval, and escalation evals
Memory changed	run read-scope, write-policy, deletion, and correction evals
Retrieval changed	run access, freshness, citation, and missing-evidence evals
Runtime changed	run retry, cancellation, checkpoint, and trace completeness evals

Release decision:

release version:
change type:
eval dataset version:
passing threshold:
actual result:
known failures:
approved by:
rollback owner:
rollback command:
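
The gate table maps each change type to the evals that must run before release, which makes it straightforward to encode as data. The sketch below copies the table into a dictionary and reports which required evals are still missing. The key names and the `missing_evals` function are hypothetical choices made for this example.

```python
# Change type -> evals required before release, copied from the table above.
REQUIRED_EVALS = {
    "prompt/model": ["task", "refusal", "policy", "tool", "cost"],
    "tool": ["authorization", "schema", "idempotency", "error"],
    "policy": ["false-allow", "false-deny", "approval", "escalation"],
    "memory": ["read-scope", "write-policy", "deletion", "correction"],
    "retrieval": ["access", "freshness", "citation", "missing-evidence"],
    "runtime": ["retry", "cancellation", "checkpoint", "trace completeness"],
}


def missing_evals(change_types: list[str], evals_run: set[str]) -> dict[str, list[str]]:
    """Return, per change type, the required evals that were not run."""
    gaps = {}
    for change in change_types:
        missing = [name for name in REQUIRED_EVALS[change] if name not in evals_run]
        if missing:
            gaps[change] = missing
    return gaps


# A release that changed a tool but ran only two of the four required evals.
print(missing_evals(["tool"], {"authorization", "schema"}))
```

An empty result means every required eval ran; it says nothing about whether the results met the passing threshold, which the release decision record still has to state.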

Incident-To-Eval Worksheet

Use this worksheet after production incidents, near misses, or serious human overrides.

incident ID:
date:
service:
trace ID:
owner:

What happened?

Which boundary failed?
[ ] state
[ ] tool
[ ] policy
[ ] approval
[ ] memory
[ ] retrieval
[ ] model/prompt
[ ] workflow
[ ] observability
[ ] eval gate

What should have happened?

New eval case:
input:
expected trajectory:
expected tool behavior:
expected policy behavior:
expected output:
blocking threshold:

Release rule:
[ ] blocks future release
[ ] warning only
[ ] monitor only

Follow-up:
code change:
policy change:
runbook change:
ADR update:
owner:
due date:

An incident that does not produce an eval, a policy change, or a runbook update is likely to repeat.
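
That closing rule can be enforced when an incident review is filed. The following is a minimal sketch, assuming the worksheet is stored as structured data. `EvalCase`, `IncidentReview`, and `closes_the_loop` are hypothetical names, and only a subset of the worksheet fields is modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvalCase:
    input: str
    expected_output: str
    blocks_release: bool  # corresponds to the "blocks future release" rule


@dataclass
class IncidentReview:
    incident_id: str
    trace_id: str
    failed_boundaries: list[str] = field(default_factory=list)
    new_eval: Optional[EvalCase] = None
    policy_change: str = ""
    runbook_change: str = ""


def closes_the_loop(review: IncidentReview) -> bool:
    """True if the review produced an eval, a policy change, or a runbook update."""
    return (
        review.new_eval is not None
        or bool(review.policy_change.strip())
        or bool(review.runbook_change.strip())
    )


# Illustrative identifiers only.
review = IncidentReview("INC-1", "trace-1", failed_boundaries=["policy"])
print(closes_the_loop(review))
```

A review that returns `False` here has recorded what happened but has not changed anything that would prevent a repeat.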