CAPSTONE REVIEW SCORECARD

Capstone:
Reviewer:
Date:
Target release:

Primary risk boundary:
[ ] Side-effect authority
[ ] Private or approved data
[ ] Multi-agent coordination
[ ] Long-running workflow state
[ ] Tool or protocol integration
[ ] User trust and human control

Capstone being adapted:
[ ] Support Refund Agent
[ ] Research RAG Agent
[ ] Multi-Agent Delivery Workflow
[ ] Custom adaptation

Scoring:
0 = missing
1 = described but not implemented or not reviewable
2 = implemented, tested, documented, and owned

1. Problem and Scope
Score: 0 1 2
Evidence:
Review questions:
- Is the user workflow concrete?
- Are non-goals explicit?
- Is the authority level clear?

2. Pattern Composition
Score: 0 1 2
Evidence:
Review questions:
- Does every pattern earn its place?
- Is there a reason for each loop, tool, memory, or agent?
- Are rejected alternatives named?

3. Architecture Boundary
Score: 0 1 2
Evidence:
Review questions:
- Does the design separate model judgment from deterministic control?
- Are state, policy, tools, memory, approval, and observability owned?
- Can another engineer review the boundary without reading all code?

4. Tool and Policy Control
Score: 0 1 2
Evidence:
Review questions:
- Are tools typed, scoped, authorized, timed out, and audited?
- Does policy run before authority?
- Are high-risk actions denied, approved, or escalated outside the prompt?

5. State, Memory, and Context
Score: 0 1 2
Evidence:
Review questions:
- Is run state inspectable and replayable?
- Are memory rules explicit?
- Does context carry source, trust, freshness, and budget?

6. Evaluation Evidence
Score: 0 1 2
Evidence:
Review questions:
- Do evals cover happy paths, edge cases, regressions, and unsafe paths?
- Is there a blocking threshold?
- Does the eval inspect trajectory, not only final text?

7. Observability and Traceability
Score: 0 1 2
Evidence:
Review questions:
- Can one successful and one failed run be reconstructed?
- Are trace fields sufficient for incident review?
- Are sensitive fields redacted?

8. Production Operation
Score: 0 1 2
Evidence:
Review questions:
- Is there a runbook?
- Are rollback and kill-switch paths documented?
- Are owners and incident triggers named?

9. Framework Portability
Score: 0 1 2
Evidence:
Review questions:
- Is product policy outside framework-only code?
- Can the design be mapped to at least one alternate runtime?
- Are framework-owned and application-owned responsibilities clear?

10. Reader Reuse
Score: 0 1 2
Evidence:
Review questions:
- Can a reader reuse the ADR, trace, eval, runbook, or checklist shape?
- Does the capstone teach a transferable decision rule?
- Is the example specific enough to adapt without guessing?

11. Adaptation Plan
What workflow replaces the example workflow?
Which patterns will be kept?
Which patterns will be removed because they do not earn their place?
What is the highest-risk failure?
Which blocking eval catches it?
What artifact is missing next?
[ ] ADR
[ ] Trace example
[ ] Eval fixture
[ ] Runbook
[ ] Rollback plan
[ ] Approval flow
[ ] Tool manifest
[ ] State schema

Total score:

Interpretation:
0-9 = example sketch
10-14 = useful design note
15-17 = strong capstone
18-20 = production-grade teaching example

Hard fails:
[ ] High-risk tool can run without policy or approval.
[ ] No replayable trace exists.
[ ] No blocking eval exists for unsafe behavior.
[ ] State or memory ownership is unclear.
[ ] Rollback is not documented.

Final decision:
[ ] Accept as A++ capstone
[ ] Accept with minor edits
[ ] Needs revision
[ ] Not ready

Reviewer notes:
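
Scoring sketch (optional):

The total and interpretation can be tallied by hand, but a reviewer who keeps scorecards in a script might use something like the minimal sketch below. It assumes the total is the plain sum of the ten scored sections (1 through 10, so a maximum of 20; section 11 is not scored), and it uses the interpretation bands exactly as listed above. The scorecard does not say how a checked hard fail interacts with the total, so the "blocked" flag here is an assumption: it treats any checked hard fail as blocking regardless of score. The names total_score, interpret, and summarize are hypothetical.

```python
SECTION_COUNT = 10           # sections 1-10 are scored; section 11 is not
VALID_SCORES = (0, 1, 2)     # 0 = missing, 1 = described only, 2 = fully done

# Interpretation bands copied from the scorecard (inclusive ranges).
BANDS = [
    (0, 9, "example sketch"),
    (10, 14, "useful design note"),
    (15, 17, "strong capstone"),
    (18, 20, "production-grade teaching example"),
]


def total_score(scores):
    """Sum the ten section scores after validating count and range."""
    if len(scores) != SECTION_COUNT:
        raise ValueError(f"expected {SECTION_COUNT} scores, got {len(scores)}")
    for s in scores:
        if s not in VALID_SCORES:
            raise ValueError(f"score must be 0, 1, or 2, got {s!r}")
    return sum(scores)


def interpret(total):
    """Map a total score to its interpretation band."""
    for low, high, label in BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total out of range: {total}")


def summarize(scores, hard_fails=()):
    """Return total, band, and a blocking flag.

    ASSUMPTION: any checked hard fail blocks acceptance regardless of
    the total. The scorecard lists hard fails but does not define this rule.
    """
    total = total_score(scores)
    return {
        "total": total,
        "interpretation": interpret(total),
        "blocked": len(hard_fails) > 0,
        "hard_fails": list(hard_fails),
    }


if __name__ == "__main__":
    # Scores for sections 1 through 10, in order.
    example = [2, 2, 1, 2, 1, 2, 2, 1, 2, 2]
    print(summarize(example, hard_fails=["Rollback is not documented."]))
```

Keeping the hard-fail check separate from the numeric total means a high score cannot mask a missing safety artifact; whether that is the intended rule is for the scorecard owner to confirm.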