Lab Production Readiness Checklist

The labs are teaching implementations. This checklist defines what must be added before a lab pattern becomes production work.

Use this after completing a lab. The goal is to identify the next engineering boundary: persistence, authorization, retries, idempotency, observability, eval gates, deployment, and rollback. For the full production path, continue with Deployment Walkthrough, Templates and Worksheets, and the 10/10 Production Gate.

Download the reusable production review artifact: 10/10 production gate scorecard.

Download the lab-specific worksheet: lab production readiness worksheet.

Universal Production Gate

Every lab needs these controls before real users, real data, or real side effects:

Gate	Required Evidence
State ownership	State schema, owner, persistence strategy, migration plan, and replay story.
Auth and permissions	Actor identity, tenant/resource scope, tool permissions, approval rules, and audit records.
Tool safety	Input schemas, side-effect class, idempotency key, timeout, retry policy, and error contract.
Memory safety	Retention, deletion, redaction, consent, correction path, and write policy.
Observability	Trace IDs, model/tool events, policy decisions, costs, latency, stop reasons, and redaction.
Eval gates	Golden tasks, negative cases, trajectory checks, safety checks, and release-blocking thresholds.
Deployment	Runtime owner, environment config, secrets, scaling limits, rollback, kill switch, and incident path.
Human control	Approval UI/API, escalation rules, cancellation, reviewer identity, and expiry.
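
As an illustration of the Tool safety row, the following is a minimal sketch of a tool runner that combines an idempotency key, a bounded retry policy, and a single error contract. All names here (`ToolError`, `idempotency_key`, `SafeToolRunner`) are hypothetical and not part of any lab. The sketch assumes the key can be derived from the tool name and arguments, keeps results in memory, and leaves timeouts out; a real system would likely need a caller-supplied key, a durable store, and a per-call deadline.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical error contract: every tool failure surfaces as this type."""

    def __init__(self, message: str, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


def idempotency_key(tool_name: str, args: dict[str, Any]) -> str:
    """Derive a stable key from the tool name and canonicalized arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class SafeToolRunner:
    max_attempts: int = 3
    backoff_seconds: float = 0.0
    # In-memory result store for illustration only; not durable.
    _results: dict[str, Any] = field(default_factory=dict)

    def run(self, tool_name: str, fn: Callable[..., Any], args: dict[str, Any]) -> Any:
        key = idempotency_key(tool_name, args)
        if key in self._results:
            # Same request seen before: return the recorded result, no new side effect.
            return self._results[key]
        last_error = None
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = fn(**args)
            except ToolError as exc:
                if not exc.retryable:
                    raise
                last_error = exc
                time.sleep(self.backoff_seconds * attempt)  # linear backoff
                continue
            self._results[key] = result
            return result
        raise ToolError(
            f"{tool_name} failed after {self.max_attempts} attempts", retryable=False
        ) from last_error
```

Deriving the key from the arguments means two intentionally identical requests are collapsed into one, which may or may not be the behavior a given tool needs.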

sequenceDiagram
    participant Lab
    participant Reviewer
    participant Gate as Production gate
    participant Runtime
    participant Operators
    Lab->>Reviewer: Baseline command, output, source files
    Reviewer->>Gate: Check state, auth, tools, memory, traces, evals
    alt Gate incomplete
        Gate-->>Reviewer: Keep as lab evidence
    else Gate complete
        Reviewer->>Runtime: Name added authority and rollout stage
        alt No real users, data, or side effects
            Runtime-->>Reviewer: Prototype with synthetic data
        else Real authority involved
            Runtime->>Gate: ADR, policy boundary, trace contract, eval fixtures
            Gate->>Operators: Dashboard, incident path, rollback, kill switch
            Operators-->>Reviewer: Can disable, replay, and explain
        end
    end

Use this flow before changing access, data, or authority. A lab can move forward only when the next stage has reviewable evidence, not because the example command ran once.
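
The branching in this flow can be sketched as a small decision function. This is a simplified reading of the flow, not a prescribed implementation: the `Review` type, the `decide` function, and the returned strings are hypothetical, and the check and artifact names are taken from the flow's own labels.

```python
from dataclasses import dataclass

# Checks the reviewer runs against the production gate.
GATE_CHECKS = ("state", "auth", "tools", "memory", "traces", "evals")
# Artifacts required once real authority is involved.
AUTHORITY_ARTIFACTS = ("adr", "policy_boundary", "trace_contract", "eval_fixtures")


@dataclass(frozen=True)
class Review:
    passed_checks: frozenset
    real_authority: bool  # real users, real data, or real side effects
    artifacts: frozenset = frozenset()


def decide(review: Review) -> str:
    """Return the next step for a lab, following the gate flow."""
    missing = [c for c in GATE_CHECKS if c not in review.passed_checks]
    if missing:
        return "keep as lab evidence (missing: " + ", ".join(missing) + ")"
    if not review.real_authority:
        return "prototype with synthetic data"
    missing_artifacts = [a for a in AUTHORITY_ARTIFACTS if a not in review.artifacts]
    if missing_artifacts:
        return "blocked (missing: " + ", ".join(missing_artifacts) + ")"
    return "hand off to operators"
```

For example, `decide(Review(frozenset(GATE_CHECKS), real_authority=False))` returns `"prototype with synthetic data"`, while a review with any failed gate check stays at lab evidence regardless of the artifacts attached.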

Lab Review Questions

Use these questions before a lab-derived system becomes a product slice:

Question	Required Evidence
What did the lab prove?	Baseline command, expected output, source files, and success signal.
What did the lab not prove?	Missing persistence, policy, observability, evals, deployment, rollback, or human control.
What authority will production add?	Read data, write tools, memory, messages, approvals, money movement, or workflow execution.
What must fail closed?	Missing config, missing policy context, unsafe tool input, stale evidence, and failed evals.
What is the next production artifact?	ADR, trace contract, eval fixture, runbook, checklist, dashboard, or rollback plan.

The lab is a learning artifact until these answers exist in reviewable engineering documents.
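
To make "fail closed" concrete, here is a minimal sketch in which every missing or unknown input results in a denial. The `Decision` type, the `authorize` function, and the one-hour freshness limit are assumptions chosen for illustration; validation of unsafe tool input is not covered here and would typically be a separate schema check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def authorize(
    config: Optional[dict],
    policy_context: Optional[dict],
    evidence_age_s: Optional[float],
    evals_passed: Optional[bool],
    max_evidence_age_s: float = 3600.0,  # assumed freshness limit
) -> Decision:
    """Allow an action only when every precondition is positively confirmed."""
    if not config:
        return Decision(False, "missing config")
    if not policy_context:
        return Decision(False, "missing policy context")
    if evidence_age_s is None or evidence_age_s > max_evidence_age_s:
        return Decision(False, "stale or missing evidence")
    if evals_passed is not True:
        # None (unknown) is treated the same as False.
        return Decision(False, "evals failed or unknown")
    return Decision(True, "all checks passed")
```

The design choice is that the only path to `allowed=True` is the final line; an absent value never defaults to permission.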

Promotion Ladder

Use this ladder to decide what the lab output is ready for.

Stage	Allowed Use	Required Evidence	Stop Condition
Lab evidence	Learning, comparison, and design discussion.	Baseline command, success signal, inspected source files, and one known failure path.	Do not connect real users, private data, credentials, or side effects.
Prototype	Internal demo with synthetic data.	Lab evidence plus basic schemas, deterministic test, and named production gaps.	Stop if the demo requires real credentials or broad tool access.
Product slice	Limited internal workflow with controlled data.	ADR, policy boundary, trace contract, eval fixtures, owner, and rollback note.	Stop if policy, trace, or rollback is missing.
Pilot	Small real-user or real-data rollout.	Production gate scorecard, approval rules, incident path, dashboard, and release gate.	Stop or roll back on failed blocking evals, missing trace spans, or unsafe side effects.
Production candidate	Controlled production release.	Deployment walkthrough evidence, runbook, kill switch, rollback path, and current eval report.	Stop if operators cannot disable, replay, or explain the run.

Do not skip stages because a framework example runs successfully. A passing command proves execution; it does not prove authority, observability, recovery, or release safety.
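
The ladder can be encoded as data so that a review script reports the highest stage a lab has earned and what is missing for the next one. This sketch assumes evidence is cumulative, meaning each stage also requires everything below it; the table states this explicitly only for Prototype. The `LADDER` structure and `highest_stage` function are hypothetical, and the evidence labels are shortened from the table.

```python
from typing import Optional

# Ordered from lowest to highest stage; evidence labels follow the ladder table.
LADDER = [
    ("Lab evidence", {"baseline command", "success signal",
                      "inspected source files", "known failure path"}),
    ("Prototype", {"basic schemas", "deterministic test", "named production gaps"}),
    ("Product slice", {"ADR", "policy boundary", "trace contract",
                       "eval fixtures", "owner", "rollback note"}),
    ("Pilot", {"production gate scorecard", "approval rules",
               "incident path", "dashboard", "release gate"}),
    ("Production candidate", {"deployment walkthrough evidence", "runbook",
                              "kill switch", "rollback path", "current eval report"}),
]


def highest_stage(evidence: set) -> tuple:
    """Return (highest stage reached or None, evidence missing for the next stage)."""
    reached: Optional[str] = None
    for name, required in LADDER:
        missing = required - evidence
        if missing:
            # Stop at the first incomplete stage; later stages cannot be skipped to.
            return reached, missing
        reached = name
    return reached, set()
```

Because the loop stops at the first incomplete stage, holding Pilot-level artifacts without the lab-level evidence still yields no stage at all.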

Per-Lab Readiness Matrix

Lab	Production Additions
Lab 01 - Tool-Using Agent	Tool schemas, side-effect labels, permission checks, idempotency, timeout/retry, and audit records.
Lab 02 - Agent Loop and Planning	Durable state, plan versioning, step retry policy, cancellation, partial failure handling, and plan evals.
Lab 03 - Agentic RAG	Source ACLs, freshness, citation checks, retrieval evals, prompt-injection filtering, and evidence retention rules.
Lab 04 - A2A Communication	Agent identity, signed envelopes, correlation IDs, cancellation semantics, schema versioning, and replay logs.
Lab 05 - Multi-Agent Supervisor	Worker contracts, merge policy, per-worker traces, disagreement handling, final acceptance owner, and cost caps.
Lab 06 - Observability and Evals	Trace storage, redaction, eval datasets, release thresholds, incident-to-eval workflow, and dashboard ownership.
Lab 07 - Mastra Runtime Packaging	Deployment config, tool/memory policy, workflow retries, eval integration, trace export, and framework upgrade plan.
Lab 08 - CrewAI Flows and Crews	Flow checkpoints, role permissions, task schemas, crew output validators, human escalation, and flow acceptance evals.
Lab 09 - Minimal Agent Loop	Decision validation, tool registry, stop policy, state persistence, timeout/cancellation, and trace events.
Lab 10 - Tool Registry and Policy Gate	Policy context, approval records, schema validation, side-effect isolation, idempotency, and denial analytics.
Lab 11 - Context, Memory, Trace, and Evals	Memory governance, trace redaction, eval fixture versioning, context audit, and incident replay.
Lab 12 - LangGraph State Graph	Durable checkpointer, thread IDs, interrupt payloads, node idempotency, state migrations, and resume tests.
Lab 13 - AutoGen Transcript Evals	Message schemas, team termination, transcript redaction, role permissions, transcript replay, and team-level eval gates.

Framework-Specific Deployment Questions

Framework Shape	Production Questions
LangGraph-style	Where is the checkpointer stored? How are thread IDs assigned? Which nodes can cause side effects? What state migrations are supported?
AutoGen-style	Who owns the transcript? Which messages are durable? How does termination work? How are agent tools permissioned?
Mastra-style	Which runtime features are framework-owned? How are workflows deployed? How are tools, memory, traces, and evals exported?
CrewAI-style	What does the flow own? What does the crew own? How are role outputs validated? How are crew failures escalated?
Mini-runtime	Which production controls are you willing to build and operate yourself? Which should move into an existing workflow platform?

Release Gate Template

Before shipping a lab-derived system, require:

state: persisted or explicitly stateless
policy: enforced before side effects
tools: typed, scoped, idempotent, time-bounded
memory: governed by retention/deletion rules
trace: redacted and replayable
evals: include happy path, negative path, and trajectory checks
deployment: rollback and kill switch documented
owner: team and escalation path assigned

If any line is unknown, the system is still a demo.
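
The rule above can be checked mechanically. The following is a minimal sketch, assuming the template is filled in as a mapping from each key to a short status string; the `release_verdict` function and the `"gate satisfied"` label are hypothetical.

```python
# Keys from the release gate template, in template order.
REQUIRED_KEYS = (
    "state", "policy", "tools", "memory",
    "trace", "evals", "deployment", "owner",
)


def release_verdict(gate: dict) -> tuple:
    """Return ("demo" or "gate satisfied", list of unknown keys)."""
    unknown = []
    for key in REQUIRED_KEYS:
        status = str(gate.get(key) or "").strip().lower()
        # A missing, empty, or literally "unknown" entry counts as unknown.
        if status in ("", "unknown"):
            unknown.append(key)
    return ("demo" if unknown else "gate satisfied"), unknown
```

A check like this only confirms that each line has an answer; whether the answer is adequate still needs human review.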