Capstone Projects

The capstones show how the patterns combine into product-shaped systems. Each capstone starts from a concrete workflow, chooses the right agentic boundaries, maps the design across frameworks, and defines the production evidence required before release.

Use these chapters after the labs. The labs isolate one pattern at a time. The capstones combine patterns into systems with state, tools, policy, memory, observability, evals, deployment, rollback, ownership, and selected native framework slices.

Do not read the capstones as large tutorials. Read them as design review packets. Each one shows what evidence a team should gather before turning an agent pattern into product behavior.

Capstone Set

Capstone	Primary Goal	Main Patterns	Framework Lens
Support Refund Agent	Draft policy-safe refund recommendations.	Tool use, policy enforcement, approval gates, observability, evals.	Mastra runtime, LangGraph workflow, mini-runtime, native Mastra and LangGraph slices.
Research RAG Agent	Answer from approved sources with citations and memory rules.	Context engineering, semantic recall, knowledge-bound agents, memory, evals.	LangGraph graph, direct Python/TypeScript, Mastra runtime, native LangGraph slice.
Multi-Agent Delivery Workflow	Coordinate specialist agents while preserving one accountable owner.	Supervisor/worker, CrewAI flows, AutoGen transcripts, durable workflows.	CrewAI, AutoGen, LangGraph, Mastra, native CrewAI and AutoGen slices.

Choose A Capstone

Use the capstone that matches your highest-risk boundary.

If Your System Needs…	Start With	Main Risk To Inspect
A model to draft an action involving money, policy, or customer data.	Support Refund Agent	The model must never own final authority for the side effect.
Answers grounded in private or approved knowledge sources.	Research RAG Agent	The system must refuse or escalate when evidence is missing, stale, or unauthorized.
Several specialist agents or roles contributing to one deliverable.	Multi-Agent Delivery Workflow	Coordination must not erase blockers, minority concerns, failed tasks, or final ownership.

If more than one applies, read the capstones in this order: side-effect authority, evidence grounding, then multi-agent coordination. Side effects and private data usually deserve review before topology choices.
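The reading order above works like a small priority sort. This is a minimal sketch, assuming hypothetical `Boundary` labels and a hypothetical `readingOrder` helper. Neither is part of the capstone runtime.

```typescript
// Hypothetical labels for the three boundaries in the "Choose A Capstone" table.
type Boundary =
  | "side_effect_authority"
  | "evidence_grounding"
  | "multi_agent_coordination";

// Lower number = review earlier: side effects and private data before topology.
const PRIORITY: Record<Boundary, number> = {
  side_effect_authority: 0,
  evidence_grounding: 1,
  multi_agent_coordination: 2,
};

const CAPSTONE_FOR: Record<Boundary, string> = {
  side_effect_authority: "Support Refund Agent",
  evidence_grounding: "Research RAG Agent",
  multi_agent_coordination: "Multi-Agent Delivery Workflow",
};

// Deduplicate the needed boundaries, then return capstones in review order.
function readingOrder(needs: Boundary[]): string[] {
  return Array.from(new Set(needs))
    .sort((a, b) => PRIORITY[a] - PRIORITY[b])
    .map((b) => CAPSTONE_FOR[b]);
}

console.log(readingOrder(["multi_agent_coordination", "side_effect_authority"]).join(" -> "));
// Support Refund Agent -> Multi-Agent Delivery Workflow
```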

How To Reuse A Capstone

Turn a capstone into your own design review in five steps:

Replace the example workflow with your real workflow.
Keep the same evidence headings: state, tools, policy, memory, trace, eval, ADR, runbook, rollback.
Delete any pattern that does not earn its place in your workflow.
Add one blocking eval for the highest-risk failure.
Record gaps in the capstone review scorecard and production readiness worksheet.

The reusable value is the review shape, not the domain. A refund system, research assistant, and delivery workflow all need the same proof: bounded authority, replayable state, traceable decisions, eval gates, and rollback.

flowchart TB
    A[Pick matching capstone] --> B[Replace example workflow]
    B --> C[Keep evidence headings]
    C --> D[Remove unearned patterns]
    D --> E[Add highest-risk blocking eval]
    E --> F[Fill scorecard and readiness worksheet]
    F --> G[Compare gaps before release]

Use this flow as the capstone reuse contract. The output is not a copied implementation; it is a reviewed system packet with gaps, evidence, and release blockers made explicit.
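The reuse contract can be checked mechanically. This is a minimal sketch, assuming a hypothetical `ReviewPacket` shape where each evidence heading holds a list of notes. An explicit note such as "no memory: stateless by design" still counts as evidence, so deleting a pattern does not delete its heading. The helper `releaseBlockers` is illustrative only.

```typescript
// Evidence headings kept from the capstone template (step 2 of the reuse flow).
const EVIDENCE_HEADINGS = [
  "state", "tools", "policy", "memory", "trace", "eval", "adr", "runbook", "rollback",
] as const;
type Heading = (typeof EVIDENCE_HEADINGS)[number];

// Hypothetical eval case; a blocking eval must pass before release.
interface EvalCase {
  name: string;
  blocking: boolean;
}

// Hypothetical packet: each heading maps to evidence notes (missing or empty = gap).
interface ReviewPacket {
  workflow: string;
  evidence: Partial<Record<Heading, string[]>>;
  evals: EvalCase[];
}

// Release blockers = headings with no evidence, plus a missing blocking eval.
function releaseBlockers(packet: ReviewPacket): string[] {
  const blockers: string[] = [];
  for (const heading of EVIDENCE_HEADINGS) {
    const notes = packet.evidence[heading] ?? [];
    if (notes.length === 0) blockers.push(`missing evidence: ${heading}`);
  }
  if (!packet.evals.some((e) => e.blocking)) {
    blockers.push("no blocking eval for the highest-risk failure");
  }
  return blockers;
}

// Placeholder notes for a partially reviewed refund workflow.
const draft: ReviewPacket = {
  workflow: "Refund recommendation for support tickets",
  evidence: {
    state: ["run state schema"],
    tools: ["refunds.create_draft manifest"],
    policy: ["money movement denied for the agent"],
  },
  evals: [{ name: "no_money_movement", blocking: true }],
};
console.log(releaseBlockers(draft));
```

The gaps it returns are exactly what the scorecard and readiness worksheet should record before release.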

Run The Capstones

The capstones include deterministic TypeScript assets so readers can inspect state, traces, evals, and rollback behavior without model provider keys.

npm run capstones:demo
npm run capstones:test

Expected demo output:

support-refund-agent: pass
  stop: draft_ready
  trace events: 7
research-rag-agent: pass
  stop: answered_with_citation
  trace events: 6
multi-agent-delivery-workflow: pass
  stop: accepted_after_review
  trace events: 4

Source:

capstone-projects-runtime/typescript/src/capstones.ts
capstone-projects-runtime/typescript/test/capstones.spec.ts

After running the commands, compare the output with each capstone’s trace and eval sections. The goal is to connect runtime behavior to the written design evidence.
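One way to make that comparison repeatable is to parse the demo text and check each safe stop against the Runtime Evidence Map. This is a minimal sketch, assuming the demo prints exactly the layout shown above. `parseDemoOutput` and `mismatches` are hypothetical helpers, not functions from `capstones.ts`.

```typescript
interface CapstoneRun {
  name: string;
  status: string;
  stop: string;
  traceEvents: number;
}

// Parse the demo's text output, assuming the exact layout shown above.
function parseDemoOutput(text: string): CapstoneRun[] {
  const runs: CapstoneRun[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trimEnd(); // also drops a trailing "\r"
    const header = /^([a-z0-9-]+): (\w+)$/.exec(line);
    const stop = /^\s+stop: (\w+)$/.exec(line);
    const trace = /^\s+trace events: (\d+)$/.exec(line);
    if (header) runs.push({ name: header[1], status: header[2], stop: "", traceEvents: 0 });
    else if (stop && runs.length > 0) runs[runs.length - 1].stop = stop[1];
    else if (trace && runs.length > 0) runs[runs.length - 1].traceEvents = Number(trace[1]);
  }
  return runs;
}

// Safe stops taken from the Runtime Evidence Map.
const EXPECTED_STOP: Record<string, string> = {
  "support-refund-agent": "draft_ready",
  "research-rag-agent": "answered_with_citation",
  "multi-agent-delivery-workflow": "accepted_after_review",
};

// Report any capstone that is missing, failed, or stopped somewhere unexpected.
function mismatches(runs: CapstoneRun[]): string[] {
  const problems: string[] = [];
  for (const [name, expected] of Object.entries(EXPECTED_STOP)) {
    const run = runs.find((r) => r.name === name);
    if (!run) problems.push(`${name}: missing from output`);
    else if (run.status !== "pass") problems.push(`${name}: status ${run.status}`);
    else if (run.stop !== expected) problems.push(`${name}: stop ${run.stop}, expected ${expected}`);
  }
  return problems;
}

const sample = `support-refund-agent: pass
  stop: draft_ready
  trace events: 7
research-rag-agent: pass
  stop: answered_with_citation
  trace events: 6
multi-agent-delivery-workflow: pass
  stop: accepted_after_review
  trace events: 4`;

console.log(mismatches(parseDemoOutput(sample)));
```

Trace event counts are parsed but not asserted here. Compare them by hand against each capstone's trace section.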

Runtime Evidence Map

Use this map to inspect the code path before reading the detailed capstone chapters.

Capstone	Safe Stop	Release Evals	Rollback Path
Support Refund Agent	`draft_ready`	`draft_contains_policy_citation`, `no_money_movement`, `safe_stop_reason`	Disable `refunds.create_draft`; route the ticket to a human support queue.
Research RAG Agent	`answered_with_citation`	`current_source_used`, `stale_source_rejected`, `forbidden_source_omitted`, `citation_faithfulness`	Disable answer synthesis; return the ranked source list only.
Multi-Agent Delivery Workflow	`accepted_after_review`	`planner_present`, `risk_review_present`, `test_plan_present`, `turns_sequential`, `final_owner_accepts_last`	Disable delegation; route the request to a single-owner delivery checklist.

These are not toy assertions. Each eval protects a production boundary:

The refund capstone proves the agent can draft a recommendation while policy blocks money movement.
The RAG capstone proves the context packet uses the current approved source and omits stale or forbidden sources.
The delivery capstone proves specialist agents can contribute without removing final workflow ownership.
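Several of these evals reduce to simple predicates over a run's trace. This is a minimal sketch, assuming hypothetical trace shapes (`ToolCall`, `ContextSource`, `Turn`) that may differ from the real fields in `capstones.ts`. `citation_faithfulness` and the `*_present` checks are omitted. In this sketch, "stale source rejected" means no stale source reached the context packet.

```typescript
// Hypothetical trace shapes for illustration; real capstone fields may differ.
interface ToolCall { tool: string; executed: boolean }
interface ContextSource { id: string; status: "current" | "stale" | "forbidden" }
interface Turn { seq: number; role: string }

// Refund: only allowlisted tools may execute. An allowlist fails closed on unknown tools.
function noMoneyMovement(calls: ToolCall[], allowed: Set<string>): boolean {
  return calls.every((c) => !c.executed || allowed.has(c.tool));
}

// RAG: the packet contains a current source and no stale or forbidden ones.
function contextPacketChecks(packet: ContextSource[]) {
  return {
    current_source_used: packet.some((s) => s.status === "current"),
    stale_source_rejected: packet.every((s) => s.status !== "stale"),
    forbidden_source_omitted: packet.every((s) => s.status !== "forbidden"),
  };
}

// Delivery: turns run one at a time, and the accountable owner acts last.
function deliveryChecks(turns: Turn[], owner: string) {
  return {
    turns_sequential: turns.every((t, i) => t.seq === i + 1),
    final_owner_accepts_last: turns.length > 0 && turns[turns.length - 1].role === owner,
  };
}

// "payments.issue_refund" is a hypothetical money-moving tool that policy denied.
const refundCalls: ToolCall[] = [
  { tool: "refunds.create_draft", executed: true },
  { tool: "payments.issue_refund", executed: false },
];
console.log(noMoneyMovement(refundCalls, new Set(["refunds.create_draft"]))); // true
```

The refund check uses an allowlist rather than a denylist so that a newly added tool fails the eval until someone reviews it.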

What Each Capstone Proves

Each capstone includes:

problem and non-goals;
pattern composition;
system architecture;
data and state model;
tool, policy, memory, and approval boundaries;
native framework mapping;
native framework example path where one exists;
trace example;
eval report example;
ADR example;
runbook example;
release and rollback checklist.

The repeated structure matters. It gives readers a reusable design review shape: if a future project cannot fill these sections, it is not ready for production.

Capstone Completion Standard

A capstone is complete only when it can answer these questions:

Question	Required Evidence
What owns state?	State schema, checkpoint plan, migration note.
What owns authority?	Tool manifest, policy decision, approval rule.
What proves quality?	Eval cases, thresholds, failure examples.
What proves observability?	Trace event sequence and required fields.
What proves production readiness?	Deployment notes, runbook, rollback path.
What proves portability?	Framework mapping and assets kept outside framework-only code.

Download the reusable review artifact: capstone review scorecard.

Download the production follow-up worksheet: production readiness worksheet.

Do not treat a capstone as a larger lab. Treat it as a small production design review.
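For the state question, the schema, checkpoint plan, and migration note can all live in code. This is a minimal sketch, assuming a hypothetical versioned refund run state. The field names, the v1-to-v2 change, and the "migrate on every load" plan are illustrative and are not taken from the capstone runtime.

```typescript
// Hypothetical run-state schema for illustration; real field names may differ.
interface RunStateV1 {
  version: 1;
  ticketId: string;
  step: string;
}

interface RunStateV2 {
  version: 2;
  ticketId: string;
  step: string;
  policyDecision: "allow" | "deny" | null; // added in v2
}

type StoredRunState = RunStateV1 | RunStateV2;

// Migration note as code: v1 checkpoints predate policy decisions, so none is recorded.
function migrate(state: StoredRunState): RunStateV2 {
  if (state.version === 1) {
    return { version: 2, ticketId: state.ticketId, step: state.step, policyDecision: null };
  }
  return state;
}

// Assumed checkpoint plan: serialize after every step and always migrate on load.
function saveCheckpoint(state: RunStateV2): string {
  return JSON.stringify(state);
}

function loadCheckpoint(json: string): RunStateV2 {
  return migrate(JSON.parse(json) as StoredRunState);
}

const restored = loadCheckpoint('{"version":1,"ticketId":"T-100","step":"draft"}');
console.log(restored.version, restored.policyDecision); // 2 null
```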

Capstone Review Gate

Use this gate before treating any capstone as A++ material:

Check	Evidence
The workflow is concrete	One user-visible or operator-visible workflow starts the system.
The authority boundary is explicit	Tools, data, memory, approvals, and side effects have named owners.
The unsafe path is blocked	At least one blocking eval catches the highest-risk failure.
The run is replayable	State, trace, versions, and eval result can reconstruct success and failure.
The reader can reuse the shape	ADR, runbook, trace, eval, checklist, or rollback artifacts are provided.

Record the score, blocking gaps, and next production artifact in the capstone review scorecard.
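The "run is replayable" check can be made concrete. This is a minimal sketch, assuming a hypothetical `RecordedRun` shape where every non-stop event carries state updates. Replaying the ordered events must reproduce the recorded final state and stop reason, and versions must be recorded. Eval results are left out of this sketch.

```typescript
// Hypothetical recorded-run shape; real trace fields and version keys may differ.
interface RecordedEvent {
  seq: number;
  type: string; // "stop" ends the run; any other type carries state updates
  data: Record<string, string>;
}

interface RecordedRun {
  versions: Record<string, string>; // e.g. prompt and tool-manifest versions
  events: RecordedEvent[];
  finalState: Record<string, string>;
  stopReason: string;
}

// Rebuild state by applying each event's data in sequence order.
function replay(run: RecordedRun): { state: Record<string, string>; stop: string | null } {
  const state: Record<string, string> = {};
  let stop: string | null = null;
  const ordered = [...run.events].sort((a, b) => a.seq - b.seq);
  for (const event of ordered) {
    if (event.type === "stop") stop = "reason" in event.data ? event.data.reason : null;
    else Object.assign(state, event.data);
  }
  return { state, stop };
}

// Replayable = versions recorded, and replay reproduces final state and stop reason.
function isReplayable(run: RecordedRun): boolean {
  if (Object.keys(run.versions).length === 0) return false;
  const { state, stop } = replay(run);
  const keys = Array.from(new Set([...Object.keys(state), ...Object.keys(run.finalState)]));
  return keys.every((k) => state[k] === run.finalState[k]) && stop === run.stopReason;
}

// Placeholder run data for illustration.
const run: RecordedRun = {
  versions: { prompt: "v1", toolManifest: "v1" },
  events: [
    { seq: 1, type: "tool_result", data: { order: "found" } },
    { seq: 2, type: "policy_decision", data: { policy: "deny_money_movement" } },
    { seq: 3, type: "stop", data: { reason: "draft_ready" } },
  ],
  finalState: { order: "found", policy: "deny_money_movement" },
  stopReason: "draft_ready",
};
console.log(isReplayable(run)); // true
```

Run the same check on a failed run. A trace that can only reconstruct successes does not meet the gate.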

A++ Capstone Rubric

Score each of the ten areas below from 0 to 2, for a maximum total of 20.

Area	A++ Evidence
Problem and scope	Concrete workflow, explicit non-goals, clear authority level.
Pattern composition	Every loop, tool, memory, agent, and approval boundary has a reason to exist.
Architecture boundary	Model judgment is separated from deterministic control, policy, state, tools, and approval.
Tool and policy control	Tool contracts, permissions, timeouts, audit fields, and high-risk denial or approval paths are documented.
State, memory, and context	Run state, memory rules, context sources, trust, freshness, and budget are inspectable.
Evaluation evidence	Happy paths, edge cases, unsafe paths, and regressions have thresholds that can block release.
Observability and traceability	A successful run and failed run can be reconstructed from trace fields.
Production operation	Runbook, incident triggers, rollback, kill switch, and owners are named.
Framework portability	Framework-owned and application-owned responsibilities are clear.
Reader reuse	The chapter leaves the reader with reusable ADR, trace, eval, runbook, or checklist shapes.

Interpret the score this way:

0-9: example sketch
10-14: useful design note
15-17: strong capstone
18-20: production-grade teaching example

A capstone cannot score A++ if any of these is true:

a high-risk tool can run without policy or approval;
no replayable trace exists;
no blocking eval exists for unsafe behavior;
state or memory ownership is unclear;
rollback is not documented.
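The rubric and its hard blockers can be applied together. This is a minimal sketch, assuming hypothetical area keys and a hypothetical `HardGates` record filled in by the reviewer. The rubric does not state a numeric A++ threshold, so the sketch reports the score band and the A++ blockers separately instead of inventing one.

```typescript
// Hypothetical keys for the ten rubric areas.
const AREAS = [
  "problem_scope", "pattern_composition", "architecture_boundary", "tool_policy_control",
  "state_memory_context", "evaluation_evidence", "observability_traceability",
  "production_operation", "framework_portability", "reader_reuse",
] as const;
type Area = (typeof AREAS)[number];
type Score = 0 | 1 | 2;

// One flag per hard blocker; any false flag rules out A++.
interface HardGates {
  highRiskToolGated: boolean;
  replayableTraceExists: boolean;
  blockingUnsafeEvalExists: boolean;
  stateMemoryOwnershipClear: boolean;
  rollbackDocumented: boolean;
}

// Score bands from the rubric: 0-9, 10-14, 15-17, 18-20.
function band(total: number): string {
  if (total <= 9) return "example sketch";
  if (total <= 14) return "useful design note";
  if (total <= 17) return "strong capstone";
  return "production-grade teaching example";
}

function scoreCapstone(scores: Record<Area, Score>, gates: HardGates) {
  const total = AREAS.reduce((sum, area) => sum + scores[area], 0);
  const failedGates = (Object.keys(gates) as (keyof HardGates)[]).filter((g) => !gates[g]);
  return { total, band: band(total), aPlusPlusBlocked: failedGates.length > 0, failedGates };
}

// A perfect score still cannot reach A++ while rollback is undocumented.
const allTwos = {} as Record<Area, Score>;
for (const area of AREAS) allTwos[area] = 2;

const result = scoreCapstone(allTwos, {
  highRiskToolGated: true,
  replayableTraceExists: true,
  blockingUnsafeEvalExists: true,
  stateMemoryOwnershipClear: true,
  rollbackDocumented: false,
});
console.log(result.total, result.band, result.failedGates);
```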