REFERENCE ARCHITECTURE REVIEW CHECKLIST

System:
Reviewer:
Date:
Release or design version:

1. Entry and Scope
[ ] Entry source is named: chat, API, webhook, IDE, ticket, scheduled job, or event.
[ ] User, tenant, or event identity is authenticated.
[ ] Goal, non-goals, and authority level are explicit.
[ ] Inputs are validated before reaching the model.
Evidence:

2. Control Plane
[ ] Router can choose direct answer, workflow, agent loop, RAG, or multi-agent path.
[ ] State owner is named.
[ ] Stop reasons are explicit.
[ ] Budget, timeout, cancellation, and retry policy are defined.
[ ] Model output remains a proposal until validated.
Evidence:

3. Knowledge Plane
[ ] Retrieval sources are access-controlled.
[ ] Source freshness and trust are recorded.
[ ] Context packet includes source, reason, trust, freshness, and budget.
[ ] Missing or conflicting evidence has a defined outcome.
Evidence:

4. Tool and Execution Plane
[ ] Tool manifest lists name, schema, permission, timeout, side effect, and audit fields.
[ ] Policy runs before tool execution.
[ ] High-risk tools require approval or are forbidden.
[ ] Side effects use idempotency keys or equivalent protection.
Evidence:

5. Memory Plane
[ ] Working memory and durable memory are separate.
[ ] Memory writes have policy, provenance, retention, correction, and deletion rules.
[ ] Sensitive data is redacted or excluded.
Evidence:

6. Approval and Human Control
[ ] Approval requests bind exact action, risk, policy version, expiry, and trace ID.
[ ] Users or operators can stop, correct, escalate, or roll back risky behavior.
[ ] Stale approvals cannot authorize changed actions.
Evidence:

7. Evaluation Plane
[ ] Evals cover happy paths, edge cases, unsafe paths, and regressions.
[ ] Blocking thresholds are defined.
[ ] Trajectory evals inspect tools, policy, evidence, and stop reason.
[ ] Incidents become regression cases.
Evidence:

8. Observability Plane
[ ] Trace records goal, context, model decision, policy, tool calls, outputs, and stop reason.
[ ] Trace redaction is tested.
[ ] Operators can reconstruct one successful run and one failed run.
[ ] Cost, latency, tool errors, eval drift, and approvals are measurable.
Evidence:

9. Runtime and Release
[ ] Deployment owner and on-call owner are named.
[ ] Rollback can disable model, prompt, tool, workflow, or agent behavior.
[ ] Kill switch is documented and tested.
[ ] Runbook explains incident triggers and first actions.
Evidence:

10. Final Decision
[ ] Ready for internal prototype.
[ ] Ready for controlled pilot.
[ ] Ready for production candidate.
[ ] Not ready.
Blocking gaps:
Next actions:
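The section 2 items (explicit stop reasons; budget, timeout, cancellation, and retry policy; model output treated as a proposal until validated) can be illustrated with a bounded control loop. This is a minimal Python sketch under stated assumptions, not a reference implementation: `propose`, `validate`, and `is_done` are hypothetical callables standing in for the model call, the validator, and the goal check, and cost is measured in abstract budget units.

```python
import threading
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable


class StopReason(Enum):
    GOAL_MET = "goal_met"
    BUDGET_EXHAUSTED = "budget_exhausted"
    TIMEOUT = "timeout"
    CANCELLED = "cancelled"
    VALIDATION_FAILED = "validation_failed"
    MAX_STEPS = "max_steps"


@dataclass
class RunLimits:
    max_steps: int = 10
    budget: float = 1.0  # abstract cost units (assumed)
    timeout_s: float = 30.0
    max_retries: int = 2  # extra attempts per step after a failed validation


@dataclass
class RunResult:
    stop_reason: StopReason
    steps: int  # number of steps whose proposal was accepted
    spent: float
    accepted: list[Any] = field(default_factory=list)


def run_loop(propose: Callable[[list[Any]], tuple[Any, float]],
             validate: Callable[[Any], bool],
             is_done: Callable[[list[Any]], bool],
             limits: RunLimits,
             cancel: threading.Event,
             clock: Callable[[], float] = time.monotonic) -> RunResult:
    deadline = clock() + limits.timeout_s
    spent = 0.0
    accepted: list[Any] = []  # state changes only through validated proposals
    for step in range(limits.max_steps):
        retries = 0
        while True:
            if cancel.is_set():
                return RunResult(StopReason.CANCELLED, step, spent, accepted)
            if clock() >= deadline:
                return RunResult(StopReason.TIMEOUT, step, spent, accepted)
            # Hypothetical model call; it gets a copy so it cannot mutate state.
            proposal, cost = propose(list(accepted))
            spent += cost
            if spent > limits.budget:
                # The call that crosses the budget is charged, its proposal discarded.
                return RunResult(StopReason.BUDGET_EXHAUSTED, step, spent, accepted)
            if validate(proposal):
                accepted.append(proposal)
                break
            retries += 1
            if retries > limits.max_retries:
                return RunResult(StopReason.VALIDATION_FAILED, step, spent, accepted)
        if is_done(accepted):
            return RunResult(StopReason.GOAL_MET, step + 1, spent, accepted)
    return RunResult(StopReason.MAX_STEPS, limits.max_steps, spent, accepted)
```

Every exit path returns exactly one named stop reason, which gives the section 8 trace a concrete value to record. Injecting `clock` lets tests exercise the timeout path without sleeping.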
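For section 4, the sketch below shows one way a manifest entry, a pre-execution policy check, and idempotency-key deduplication might fit together. `ToolManifest`, `PolicyError`, `check_policy`, `ToolExecutor`, and the `high_risk` flag are hypothetical names, not part of any particular framework. The in-memory result cache only protects against retries within one process; a production system would normally need the downstream service, or a shared store, to honor the idempotency key.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolManifest:
    # Hypothetical manifest entry covering the fields named in the checklist.
    name: str
    schema: dict[str, type]  # argument name -> expected Python type
    permission: str  # permission the caller must hold
    timeout_s: float  # recorded only; not enforced in this sketch
    side_effect: bool  # True if the tool changes external state
    audit_fields: tuple[str, ...]  # argument names copied into the audit log
    high_risk: bool = False  # True if the tool needs an explicit approval


class PolicyError(Exception):
    """Raised when a call is refused before the tool runs."""


def check_policy(manifest: ToolManifest, args: dict[str, Any],
                 caller_permissions: set[str], approved: bool) -> None:
    # Every check happens before execution; any failure raises.
    if manifest.permission not in caller_permissions:
        raise PolicyError(f"missing permission {manifest.permission!r}")
    if set(args) != set(manifest.schema):
        raise PolicyError("arguments do not match the manifest schema")
    for key, expected in manifest.schema.items():
        if not isinstance(args[key], expected):
            raise PolicyError(f"{key!r} must be {expected.__name__}")
    if manifest.high_risk and not approved:
        raise PolicyError("high-risk tool requires approval")


class ToolExecutor:
    def __init__(self) -> None:
        # (tool name, idempotency key) -> result of the first successful call
        self._results: dict[tuple[str, str], Any] = {}
        self.audit_log: list[dict[str, Any]] = []

    def run(self, manifest: ToolManifest, fn: Callable[..., Any],
            args: dict[str, Any], *, idempotency_key: str,
            caller_permissions: set[str], approved: bool = False) -> Any:
        check_policy(manifest, args, caller_permissions, approved)
        key = (manifest.name, idempotency_key)
        replayed = manifest.side_effect and key in self._results
        if replayed:
            # Same key seen before: return the stored result, skip the side effect.
            result = self._results[key]
        else:
            result = fn(**args)
            if manifest.side_effect:
                self._results[key] = result
        entry = {name: args.get(name) for name in manifest.audit_fields}
        entry.update(tool=manifest.name, idempotency_key=idempotency_key,
                     replayed=replayed)
        self.audit_log.append(entry)
        return result
```

Calling `run` twice with the same key returns the stored result and invokes the tool only once, with both calls appearing in the audit log; a high-risk call without `approved=True` is refused before the tool runs.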
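For section 6, one way to bind an approval to the exact action is to hash a canonical serialization of the tool name and arguments, then store that digest next to the risk level, policy version, expiry, and trace ID. The check below rejects the approval if any of those has changed, which keeps a stale approval from authorizing a changed action. `Approval`, `grant_approval`, and `check_approval` are hypothetical names, and SHA-256 over sorted-key JSON is an assumed canonicalization choice.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Optional


def action_digest(tool: str, args: dict[str, Any]) -> str:
    # Sorted keys and fixed separators give one serialization per action.
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True,
                         separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    action_sha256: str
    risk: str
    policy_version: str
    expires_at: float  # Unix time in seconds
    trace_id: str
    approver: str


def grant_approval(tool: str, args: dict[str, Any], *, risk: str,
                   policy_version: str, trace_id: str, approver: str,
                   ttl_s: float, now: Optional[float] = None) -> Approval:
    now = time.time() if now is None else now
    return Approval(action_digest(tool, args), risk, policy_version,
                    now + ttl_s, trace_id, approver)


def check_approval(approval: Approval, tool: str, args: dict[str, Any], *,
                   current_risk: str, current_policy_version: str,
                   trace_id: str, now: Optional[float] = None) -> tuple[bool, str]:
    # Any difference from what was approved makes the approval stale.
    now = time.time() if now is None else now
    if approval.action_sha256 != action_digest(tool, args):
        return False, "action changed since approval"
    if approval.risk != current_risk:
        return False, "risk level changed"
    if approval.policy_version != current_policy_version:
        return False, "policy version changed"
    if now >= approval.expires_at:
        return False, "approval expired"
    if approval.trace_id != trace_id:
        return False, "trace ID mismatch"
    return True, "ok"
```

Passing `now` explicitly keeps the expiry check deterministic in tests; in normal use it defaults to the current time.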