Hands-On Labs
The labs turn the reference chapters into a build path. Each lab uses code that already lives in this repository, so you can read the pattern, run the example, change one thing, and connect the result back to production design.
The labs are intentionally framework-agnostic. They move between TypeScript and Python, and across minimal custom runtimes, LangChain/LangGraph-style retrieval, AutoGen-style manager/worker examples, A2A protocol code, MCP-style tool boundaries, and framework-neutral tests. The point is not to teach one API. The point is to show the architecture that survives when the framework changes.
Use Lab Framework and Language Matrix before starting if you want to see which language, framework, and architectural boundary each lab emphasizes. Use Lab Production Readiness Checklist and the lab production readiness worksheet after each lab to identify what the demo still needs before production. Use From-Scratch Mini-Framework Track when you want to understand what agent frameworks package under the hood. Use Vertical Slice Examples after the labs, or whenever you want to see several patterns composed into one realistic task. Use Capstone Projects when you want product-shaped examples with ADRs, traces, evals, runbooks, rollback plans, and native framework slices.
Run these commands from the repository root before starting:
npm install
npm test
npm run typecheck
Some examples can run with deterministic fallbacks. Examples that call live models require a .env file with MISTRAL_API_KEY.
Lab Progression Map
Use this map to understand why the labs are ordered this way. The sequence starts with deterministic primitives, exposes the mini-framework underneath agent runtimes, compares native framework slices, and then moves into capstone-level release evidence.
Lab Standard
Each lab should leave you with three things: a runnable example, a specific design boundary you can explain, and one production hardening step you know how to make.
Every lab follows the same learning contract:
- State the objective.
- Name the language, framework, and source files.
- Run a baseline command.
- Inspect the code boundary.
- Change one thing.
- Verify the result.
- Identify what production would need next.
The examples stay small on purpose. A small example is useful only when the lab also says what is intentionally missing: durable state, policy enforcement, stronger schemas, approval, tracing, evals, deployment, or framework integration. When a native framework slice exists, treat it as the next comparison point, not as a replacement for the deterministic lab.
Planning Table
Use this table to choose a lab by effort and outcome. Time estimates assume you can already run the repository tests. Each lab page also includes optional per-exercise time blocks so you can split the work across shorter sessions.
Download the reusable worksheet: lab completion worksheet.
Download the production follow-up worksheet: lab production readiness worksheet.
For the high-leverage labs, use the focused worksheets for Lab 02 planning loops, Lab 03 Agentic RAG, Lab 06 observability and evals, Lab 07 runtime packaging, and Lab 12 state graphs.
Compare your finished worksheet with the completed lab evidence examples before treating the lab as review-ready.
Use the captured lab and capstone command output examples when you need a concrete model for saved command output, trace snapshots, eval snapshots, and release evidence.
| Lab | Time | Level | Prerequisite | Reusable Artifact |
|---|---|---|---|---|
| Lab 01 - Tool-Using Agent | 20-30 min | Beginner | TypeScript basics | Tool boundary and error behavior. |
| Lab 02 - Agent Loop and Planning | 35-55 min | Beginner | Lab 01 or equivalent tool boundary | Plan/execute split with structured stop-condition evidence. |
| Lab 03 - Agentic RAG | 45-75 min | Intermediate | Retrieval and Python basics | Evidence-grounded answer path plus missing-evidence eval fixture. |
| Lab 04 - A2A Communication | 45-60 min | Intermediate | JSON schema and HTTP/message concepts | Typed agent message envelope. |
| Lab 05 - Multi-Agent Supervisor | 45-60 min | Intermediate | Delegation and aggregation concepts | Supervisor/worker contract. |
| Lab 06 - Observability and Evals | 50-85 min | Intermediate | Any earlier lab | Trace contract, negative eval, and CI gate sketch. |
| Lab 07 - Mastra Runtime Packaging | 70-105 min | Advanced | TypeScript runtime packaging | Agent, tool, workflow, memory, eval, and rollback slice. |
| Lab 08 - CrewAI Flows and Crews | 60-90 min | Advanced | Python and role/task orchestration | Flow, crew, role, and acceptance contract. |
| Mini-Framework Track | 2-4 hr | Advanced | Labs 01, 02, and 06 | Runtime primitives you can compare to frameworks. |
| Lab 09 - Minimal Agent Loop | 45-75 min | Intermediate | Mini-framework setup | Loop state, observations, budgets, and stop reasons. |
| Lab 10 - Tool Registry and Policy Gate | 45-75 min | Intermediate | Lab 09 | Tool registry with policy decisions. |
| Lab 11 - Context, Memory, Trace, and Evals | 60-90 min | Advanced | Labs 09 and 10 | Reviewable runtime trace and trajectory eval. |
| Lab 12 - LangGraph State Graph | 70-105 min | Advanced | Graph/state concepts | Checkpointed state graph with interrupt, resume, and replay review. |
| Lab 13 - AutoGen Transcript Evals | 60-90 min | Advanced | Multi-agent basics | Transcript rubric and regression check. |
Completion Standard
A lab is complete when you can show four things:
- The baseline command runs.
- The expected output matches the lab’s success signal.
- One intentional failure path is visible and controlled.
- You can name the production gap before using the pattern with real users, data, credentials, or side effects.
Do not count a lab as finished just because the happy path works. The value comes from seeing the boundary: what the model can decide, what software must enforce, what gets traced, and what would block production.
Lab Evidence Pack
Save a small evidence pack after each lab. It turns the lab from a one-time exercise into material you can reuse in a design review, ADR, eval suite, or capstone.
| Evidence | What To Capture | Why It Matters |
|---|---|---|
| Baseline command | Command, exit status, and expected output signal. | Proves the example ran before you changed it. |
| Source boundary | Files inspected and the contract each file owns. | Shows where the pattern becomes code. |
| Small change | One input, rule, prompt, tool, schema, or policy change. | Proves you can modify behavior intentionally. |
| Failure path | Error, refusal, denial, timeout, budget stop, or invalid input. | Shows the boundary fails visibly instead of silently. |
| Trace or log | Minimal trace, transcript, or structured output. | Gives future evals something concrete to assert. |
| Production gap | Missing control and the next artifact needed. | Connects the lab to production architecture. |
Keep the pack short. One screen of evidence is better than a folder of unreviewed screenshots.
Use the completed lab evidence examples as calibration. A good evidence pack names the command, output, failure path, protected boundary, production gap, and next owner.
Use the captured command output examples to compare the shape of your saved terminal output. The important signal is not a screenshot. It is a short, reviewable record that shows command, success signal, trace or eval link, and production question.
Framework-Agnostic Rule
Frameworks change the API, not the architecture questions. For every lab, ask:
- What owns state?
- What can the model decide?
- What can software validate?
- What tools are exposed?
- What policy is enforced outside the prompt?
- What is traced?
- Why does the run stop?
Those questions apply whether the code uses LangGraph, LangChain, Mastra AI, AutoGen-style agents, CrewAI, MCP, A2A, or a small custom runtime.
End-To-End Reader Path
Use this path when you want to move from learning to implementation:
- Start with Lab Framework and Language Matrix and choose the highest-risk boundary.
- Run the matching deterministic lab.
- Read the production extension and readiness checklist.
- Compare the matching native example under
native-framework-examples/. - Map the same behavior to a capstone.
- Fill out the framework selection ADR and rollback worksheet.
- Add evals that fail the build before adding real side-effect tools.
For the support refund path, use Lab 07, native-framework-examples/mastra-refund/, the Support Refund Agent capstone, and the production readiness worksheet.
Lab Sequence
- Lab Framework and Language Matrix
- Lab Production Readiness Checklist
- Build a Tool-Using Agent
- Build an Agent Loop with Planning
- Build Agentic RAG
- Build A2A Agent Communication
- Build a Multi-Agent Supervisor
- Add Observability and Evals
- Package Agents, Tools, Workflows, Memory, and Evals
- Model Flows, Crews, Roles, and Task Contracts
- Study the From-Scratch Mini-Framework Track
- Build a Minimal Agent Loop
- Build a Tool Registry and Policy Gate
- Add Context, Memory, Trace, and Evals
- Model State Graphs, Checkpoints, and Interrupts
- Evaluate Multi-Agent Transcripts
- Study Vertical Slice Examples
How To Use These Labs
Read the objective first, then run the command exactly as shown. After that, inspect the named source files and make the small change in the lab. The goal is to see where the pattern becomes code: input contracts, state, tool boundaries, stop conditions, evaluation, and failure handling.
Each lab ends with a production extension. Treat that section as the bridge between a working demo and an architecture decision.
Expected Output Map
Use this table as the quick success check before you move to the production extension.
| Lab | Expected Output Signal |
|---|---|
| Lab 01 | Structured read_order and search_refund_policy results with trust labels and evidence references. |
| Lab 02 | Planning test OK plus a deterministic plan and result for the CLI path. |
| Lab 03 | Answer reflects retrieved evidence from the local document set. |
| Lab 04 | A2A test shows success, refusal, invalid-input error, and cancellation. |
| Lab 05 | Manager delegates bounded work and final aggregation uses worker outputs. |
| Lab 06 | Eval records expose success and negative cases, not only final text. |
| Lab 07 | Mastra-style runtime packaging tests OK; native slice exposes agent, tools, workflow, and eval gate. |
| Lab 08 | CrewAI-style flow and crew tests OK; Flow accepts only validated role outputs. |
| Lab 09 | Immediate answer stops with success; repeated tool proposals stop with budget_exhausted. |
| Lab 10 | Unknown tools are refused, write tools require approval, and allowed tools record observations. |
| Lab 11 | Trace contains context, decision, tool/policy, and stop events; unsafe trajectory eval fails. |
| Lab 12 | LangGraph-style state graph tests OK; resume preserves checkpointed state. |
| Lab 13 | AutoGen-style transcript tests OK; transcript proves role order, stop reason, and final owner. |
Recommended Order
Do the labs in order if you are new to agent systems. The sequence starts with one agent and one tool, then adds planning, retrieval, remote agent communication, multi-agent coordination, and production-quality evaluation.
If you already know the basics, start with the lab closest to your current system and use the related chapters as reference material.
After the labs, read the vertical slices to see how the same patterns compose into support, coding, and research workflows. Then read the Capstone Projects to see production-shaped systems with framework mappings, native slices, and release evidence.
If you are evaluating frameworks, do the mini-framework track before choosing a production runtime. Building the primitives once makes it easier to see which responsibilities a framework owns and which responsibilities remain in your application.