Lab 07 - Package Agents, Tools, Workflows, Memory, and Evals

Download the Lab 07 runtime packaging guided exercise worksheet, lab completion worksheet, and lab production readiness worksheet before you start.

Objective

Use a Mastra-style TypeScript runtime shape to package the same responsibilities you built from scratch: agent decision, tool execution, workflow control, memory, trace events, and evals.

What You Will Use

Language: TypeScript
Framework/runtime: Mastra-style runtime packaging
Framework-agnostic lesson: a framework can package primitives, but your application still owns state, policy, tool contracts, traces, and acceptance criteria.
Pattern chapters: Mastra Runtime, Building a Minimal Agent Runtime, Observability and Evals
Source files:
- mastra-runtime-pattern/typescript/src/runtime_packaging.ts
- mastra-runtime-pattern/typescript/src/run_demo.ts
- mastra-runtime-pattern/typescript/test/runtime_packaging.spec.ts

Exercise Time Budget

These estimates assume dependencies are already installed.

Exercise	Time	Output
Setup and baseline runtime run	10 min	Demo and test output.
Map runtime package boundaries	15 min	Owner notes for agent, tool, workflow, memory, trace, and eval.
Inspect tool order and forbidden tools	20 min	Tool-order trace plus forbidden-tool eval result.
Run rollback and native comparison	20-30 min	Disable path and mapping to the native Mastra-style slice.
Complete production mapping	10-20 min	Release notes for policy, trace export, eval gate, and upgrade risk.

Setup

From the repository root:

npm install

This lab is deterministic and does not require a model key. The point is runtime packaging, not model quality.

Run It

npm run mastra-runtime:demo
npm run mastra-runtime:test

Expected Result

The test command should print:

Mastra-style runtime packaging tests OK

The run should also prove these behavior signals:

policy is read before drafting;
the draft is created for review, not sent;
workflow steps and tool results are traced;
the eval fails if a direct send or refund-issue tool is used.

The demo command should include this evidence:

{
  "toolCalls": [
    { "name": "read_policy", "input": { "policyId": "refund-v1" } },
    { "name": "draft_response", "input": { "customerId": "cust_123" } }
  ],
  "result": "Policy checked and draft created for human review.",
  "evaluation": {
    "status": "pass"
  }
}

The exact JSON also includes memory and trace fields. Use the toolCalls, result, and evaluation.status fields as the quick check.

sequenceDiagram participant Workflow participant Agent participant PolicyTool as read_policy tool participant DraftTool as draft_response tool participant Trace participant Eval Workflow->>Trace: workflow:start Workflow->>Agent: Request refund draft decision Agent-->>Workflow: Use policy before drafting Workflow->>PolicyTool: read refund policy PolicyTool-->>Workflow: policy evidence Workflow->>DraftTool: create draft for review DraftTool-->>Workflow: draft result Workflow->>Trace: record tool order and result Workflow->>Eval: Check no send or refund issue tool was used Eval-->>Workflow: pass or fail

Use this flow as the lab’s acceptance model. A framework can package the runtime, but the application still owns tool order, side-effect policy, trace evidence, and release evals.

Native Mastra comparison point:

native-framework-examples/mastra-refund/
download: /downloads/native-mastra-refund.zip
agent: refundDraftAgent
workflow: refundDraftWorkflow
tools: refund_policy.retrieve, refunds.create_draft
eval gate: refund_draft_no_money_movement

Guided Exercises

Use these exercises to inspect what the runtime package owns and what product policy still owns.

Exercise	Time	What To Do	Evidence To Save
Runtime package map	15 min	Inspect `Agent`, `Tool`, `WorkflowStep`, `RuntimeState`, and `evaluateRuntime`.	One sentence for what each boundary owns.
Tool-order trace	10 min	Run `npm run mastra-runtime:demo` and inspect `toolCalls`.	`read_policy` before `draft_response`.
Forbidden-tool eval	10 min	Read the `refunds.issue_refund` negative case in `runtime_packaging.spec.ts`.	Eval status and failure reason.
Rollback exercise	10 min	Name the fastest way to disable the risky capability.	Feature flag, tool removal, workflow route disable, or deployment rollback.
Native comparison	20 min	Compare the deterministic lab with `native-framework-examples/mastra-refund/`.	Which parts map to agent, workflow, tools, trace, and eval.

flowchart TD A["User refund request"] --> B["Runtime package"] B --> C["Agent decision"] C --> D{"Tool requested?"} D -->|"read policy"| E["Read-only policy tool"] D -->|"draft response"| F["Draft-only tool"] D -->|"send or issue refund"| G["Forbidden side effect"] E --> H["Memory and trace updated"] F --> H G --> I["Eval fails release"] H --> J{"Release criteria met?"} J -->|"yes"| K["Draft ready for human review"] J -->|"no"| I

Deployment And Rollback Exercise

Treat the lab as a production candidate. Before a real runtime ships, write the rollback action that disables the highest-risk capability without deleting the whole service.

Capability	Rollback Question	Acceptable Evidence
`read_policy`	Can the system fall back to a static policy snapshot?	Config flag, versioned policy ID, or cached source.
`draft_response`	Can draft creation be disabled while policy lookup stays available?	Tool manifest change or workflow route flag.
`send_message`	Can direct sending remain impossible unless explicitly approved?	Forbidden-tool eval and approval gate.
`refunds.issue_refund`	Can money movement stay outside this runtime?	Missing tool registration plus failing eval if it appears.

Inspect The Code

Open mastra-runtime-pattern/typescript/src/runtime_packaging.ts and find these boundaries:

Agent: owns instructions and proposes the next decision.
Tool: owns typed capability execution.
WorkflowStep: owns deterministic runtime sequence.
RuntimeState: owns memory, traces, tool calls, and result.
evaluateRuntime: checks trajectory, not only final text.

The local code is intentionally small, but the mapping follows the production shape: agents, workflows, tools, memory, traces, and evals belong in one runtime package.

Change One Thing

Add a new tool named send_message with a direct side effect, then update evaluateRuntime so the eval fails if that tool appears in toolCalls. The repository test also checks the refund side-effect case:

{ name: "refunds.issue_refund", input: { amount: 42 } }

Expected lesson: framework tool registration is not permission. Application policy still decides what is safe.

Verify

Run:

npm run mastra-runtime:test

Compare the output with the expected result above before moving to the production extension.

Lab Review Gate

Before moving on, verify the runtime package:

Check	Evidence
Runtime responsibilities are visible	Agent, tool, workflow, memory, trace, and eval types are separate in code.
Tool order is constrained	Policy lookup happens before draft creation.
Side effects are blocked	The eval fails if a direct send or refund issue tool appears.
Trace explains behavior	The run records workflow, agent decision, and tool events.
Framework ownership is limited	Product policy and release criteria remain outside framework defaults.

Record the command output, trace events, tool order, and eval result in the lab completion worksheet.

Production Extension

Before a real Mastra implementation ships, add:

real tool schemas and input validation;
workflow retry and compensation policy;
memory retention, redaction, and deletion controls;
scorer/eval fixtures tied to release gates;
trace export to the observability backend;
approval gates for write tools and external communication.

Production Bridge

Use this table when adapting the lab to a real Mastra runtime:

Lab Concept	Production Version
`Agent` type	Agent route with versioned instructions, allowed tools, and stop reasons.
`Tool` type	Typed capability with permission, side-effect class, timeout, and idempotency.
`WorkflowStep`	Durable workflow step with retry, compensation, checkpoint, and approval support.
`RuntimeState`	Persisted run state with tenant, actor, trace ID, budget, and memory policy.
`evaluateRuntime`	Release gate tied to known incidents, policy rules, and forbidden trajectories.

The first production milestone is a runtime slice that can be disabled, replayed, and evaluated without reading framework internals.

Native Framework Extension

After the deterministic lab passes, port one vertical slice into a real Mastra project. Use Real Framework Setup Notes for current setup commands and compare your work with the repository example at native-framework-examples/mastra-refund/.

Native porting steps:

scaffold a Mastra project;
create one agent for the refund draft decision;
create typed tools for policy lookup and draft creation;
create a workflow that calls policy lookup before draft creation;
add an eval that fails if a send or refund-issue tool is called;
export a trace with workflow, agent, tool, policy, and eval events;
document rollback for disabling the draft tool.

Keep these assets outside framework-only code:

Asset	Why
tool manifest	product authority should be reviewable without reading framework internals
policy rules	policy must run before tool execution
eval fixtures	release gates should survive framework changes
trace schema	operations need stable fields across runtimes
ADR	the framework choice needs a recorded owner and rollback path

Completion standard: the native project proves the same behavior as this lab and links to the Support Refund Agent capstone. A native Mastra run is not complete just because an agent responds; the workflow must preserve tool order, side-effect policy, trace evidence, and eval gates.

Troubleshooting

Symptom	Likely Cause	Fix
`mastra dev` cannot find provider credentials	`.env` is missing or provider variables are unset	Copy `.env.example`, set provider keys, and restart the dev server.
workflow drafts before policy lookup	workflow step order is wrong	Keep policy retrieval as the first workflow step and make the draft step consume the policy output.
eval passes after adding `refunds.issue_refund`	forbidden tool list is incomplete	Add the tool name to the release eval and rerun the validation.
trace explains final text but not tool order	trace is only model-level	Emit workflow, tool, policy, and eval events separately.

Cross-Framework Mapping

In LangGraph, this becomes graph state, nodes, tools, checkpoints, and evals around graph runs.
In Mastra AI, these concerns are packaged as agents, workflows, tools, memory, observability, and evals.
In AutoGen-style systems, the runtime package is split across manager, workers, functions, messages, and transcript evals.
In CrewAI, the equivalent packaging splits between flow state, crew tasks, agent roles, tools, and flow-level evaluation.