Vertical Slice Examples
Patterns become useful when they are composed into a working system. A vertical slice is a small end-to-end design that shows the goal, agent loop, tools, state, policy, observability, evals, and runtime behavior together.
Download the lab completion worksheet and lab production readiness worksheet when turning a slice into an implementation plan.
The examples in this chapter are not full products. They are slices. Each one should be small enough to review in one sitting and concrete enough to expose architecture decisions.
Read this after Pattern Composition Playbook, Production Runtime Overview, and Observability and Evals. Those chapters explain the boundaries; this chapter shows what those boundaries look like when several patterns work together for one task.
Run the deterministic capstone runtime when you want executable evidence for the same shapes:
npm run capstones:demo
npm run capstones:test
Expected output:
support-refund-agent: pass
stop: draft_ready
trace events: 7
research-rag-agent: pass
stop: answered_with_citation
trace events: 6
multi-agent-delivery-workflow: pass
stop: accepted_after_review
trace events: 4
Capstone project tests OK
The code lives in capstone-projects-runtime/typescript/src/capstones.ts. Treat it as the runnable evidence layer for these slices: each run returns state, trace events, eval results, and rollback actions.
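The four parts of a run result can be pictured as one record. The sketch below is an illustration only: the type and field names (CapstoneRun, evals, rollback, and so on) are assumptions and may not match the actual exports of capstones.ts.

```typescript
// Hypothetical shapes; the real types in capstones.ts may differ.
type TraceEvent = { step: number; kind: string; detail?: string };
type EvalResult = { name: string; passed: boolean };

type CapstoneRun = {
  name: string;
  stopReason: string;
  state: Record<string, unknown>;
  trace: TraceEvent[];
  evals: EvalResult[];
  rollback: string[]; // human-readable rollback actions
};

// A run passes only when every eval passes; an empty eval list is not evidence.
function runPassed(run: CapstoneRun): boolean {
  return run.evals.length > 0 && run.evals.every(e => e.passed);
}

// Renders three summary lines in the style of the expected output above.
function summarize(run: CapstoneRun): string[] {
  return [
    `${run.name}: ${runPassed(run) ? "pass" : "fail"}`,
    `stop: ${run.stopReason}`,
    `trace events: ${run.trace.length}`,
  ];
}
```

Keeping the summary derived from the run record, rather than printed ad hoc, means the terminal output and the saved trace cannot disagree.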
Download the captured lab and capstone command output examples when you need a compact model for saving capstone terminal output, trace snapshots, eval snapshots, and production questions.
How To Read A Slice
Use the same checklist for every example:
- What user or system goal starts the run?
- What patterns are composed?
- What state must survive between steps?
- What tools can the agent call, and under which scopes?
- What requires approval?
- What trace events prove what happened?
- What evals catch regressions?
- What failure mode would make this unsafe in production?
If a slice cannot answer those questions, it is still a demo.
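One way to make the checklist enforceable is to store the answers as data and report the questions that are still blank. This is a minimal sketch; SliceSpec and unansweredQuestions are hypothetical names introduced here, not part of the repository.

```typescript
// Hypothetical record with one field per checklist question.
type SliceSpec = {
  goal: string;
  patterns: string[];
  persistedState: string[];
  toolScopes: Record<string, string[]>; // tool name -> scopes it runs under
  approvalPoints: string[]; // write "none" explicitly if nothing needs approval
  traceEvents: string[];
  evals: string[];
  unsafeFailureMode: string;
};

// Returns the names of the checklist questions that have no answer yet.
function unansweredQuestions(spec: SliceSpec): string[] {
  const checks: Array<[string, boolean]> = [
    ["goal", spec.goal.trim() !== ""],
    ["patterns", spec.patterns.length > 0],
    ["persistedState", spec.persistedState.length > 0],
    ["toolScopes", Object.keys(spec.toolScopes).length > 0],
    ["approvalPoints", spec.approvalPoints.length > 0],
    ["traceEvents", spec.traceEvents.length > 0],
    ["evals", spec.evals.length > 0],
    ["unsafeFailureMode", spec.unsafeFailureMode.trim() !== ""],
  ];
  return checks.filter(([, answered]) => !answered).map(([name]) => name);
}

// Example: a draft that is still a demo because three questions are open.
const draftSlice: SliceSpec = {
  goal: "Help a support operator handle one refund request",
  patterns: ["Agent Loop", "Human Approval Gates"],
  persistedState: [],
  toolScopes: { "refunds.create_draft": ["refunds:draft"] },
  approvalPoints: ["refunds.issue"],
  traceEvents: ["policy retrieval", "approval state"],
  evals: [],
  unsafeFailureMode: "",
};
const openQuestions = unansweredQuestions(draftSlice);
```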
Slice Review Gate
Use this gate before turning any slice into a capstone or product backlog:
| Check | Evidence |
|---|---|
| Goal is bounded | One user or system goal starts the run. |
| Pattern composition is explicit | Each major concern maps to a named pattern chapter. |
| Authority is constrained | Tools, data, memory, approvals, and side effects have owners and scopes. |
| State is recoverable | The slice names what must persist, replay, resume, or be deleted. |
| Evals protect the risky path | Regression cases cover the failure mode that would make the slice unsafe. |
Record the goal, composed patterns, state, tools, approval points, trace events, and evals in the lab production readiness worksheet.
Slice 1: Support Refund Assistant
Goal
An AI support assistant helps a human support operator handle refund requests. It reads the order, retrieves the active refund policy, drafts a recommendation, and prepares a refund action. It does not issue the refund without approval.
Pattern Composition
| Concern | Pattern |
|---|---|
| Agent loop | Agent Loop |
| Context | Context Engineering |
| Evidence | Semantic Recall and RAG |
| Tools | Tool Capability Design |
| Approval | Human Approval Gates |
| Runtime | Production Runtime Overview |
| Security | Agent Security and Sandboxing |
| Evals | Observability and Evals |
Runtime Flow
Security Controls
- The agent receives orders:read, refunds:draft, and policies:read.
- The refunds.issue tool requires human approval and an idempotency key.
- The refund policy is retrieved from an approved source with a policy version.
- Customer payment tokens never enter the prompt.
- External email is a separate tool with its own approval rule.
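The scope and approval rules above can be checked by the runtime before any tool call executes. The following is a minimal sketch under stated assumptions: the tool names orders.read and policies.read and the refunds:issue scope are hypothetical, and only refunds.issue, refunds.create_draft, and the three agent scopes come from this chapter.

```typescript
type ToolRule = { requiredScope: string; needsApproval: boolean; needsIdempotencyKey: boolean };
type ToolRequest = { tool: string; approvalId?: string; idempotencyKey?: string };
type AuthResult = { allowed: true } | { allowed: false; reason: string };

// Hypothetical rule table; a real runtime would load this from policy config.
const TOOL_RULES = new Map<string, ToolRule>([
  ["orders.read", { requiredScope: "orders:read", needsApproval: false, needsIdempotencyKey: false }],
  ["policies.read", { requiredScope: "policies:read", needsApproval: false, needsIdempotencyKey: false }],
  ["refunds.create_draft", { requiredScope: "refunds:draft", needsApproval: false, needsIdempotencyKey: false }],
  ["refunds.issue", { requiredScope: "refunds:issue", needsApproval: true, needsIdempotencyKey: true }],
]);

// The agent never holds the (assumed) refunds:issue scope.
const AGENT_SCOPES: ReadonlySet<string> = new Set(["orders:read", "refunds:draft", "policies:read"]);

// Deny by default: unknown tools, missing scopes, and missing approvals all fail.
function authorize(request: ToolRequest, scopes: ReadonlySet<string>): AuthResult {
  const rule = TOOL_RULES.get(request.tool);
  if (rule === undefined) return { allowed: false, reason: "unknown_tool" };
  if (!scopes.has(rule.requiredScope)) return { allowed: false, reason: "missing_scope" };
  if (rule.needsApproval && !request.approvalId) return { allowed: false, reason: "approval_required" };
  if (rule.needsIdempotencyKey && !request.idempotencyKey) {
    return { allowed: false, reason: "idempotency_key_required" };
  }
  return { allowed: true };
}
```

With this table, an agent call to refunds.issue fails on scope alone, and an operator-side call still fails until it carries both an approval ID and an idempotency key.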
Trace And Eval
Every run should record order lookup, policy retrieval, recommendation draft, tool authorization, approval state, refund side-effect ID, and stop reason.
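A trace requirement like this one is easy to check mechanically. The sketch below assumes a simple event shape and hypothetical event and field names (order_lookup, policyVersion, and so on); the capstone runtime may name them differently.

```typescript
type TraceEvent = { kind: string; data: Record<string, string> };
type Requirement = { kind: string; fields: string[] };

// Hypothetical names for the seven items listed above.
const REFUND_TRACE_REQUIREMENTS: Requirement[] = [
  { kind: "order_lookup", fields: ["orderId"] },
  { kind: "policy_retrieval", fields: ["policyVersion"] },
  { kind: "recommendation_draft", fields: [] },
  { kind: "tool_authorization", fields: ["tool"] },
  { kind: "approval_state", fields: ["state"] },
  // Assumption: runs that stop at a draft still record refundId as "none".
  { kind: "refund_side_effect", fields: ["refundId"] },
  { kind: "stop", fields: ["reason"] },
];

// Lists every missing event kind and every missing field on a present event.
function traceGaps(trace: TraceEvent[], requirements: Requirement[]): string[] {
  const gaps: string[] = [];
  for (const req of requirements) {
    const event = trace.find(e => e.kind === req.kind);
    if (event === undefined) {
      gaps.push(`missing event: ${req.kind}`);
      continue;
    }
    for (const field of req.fields) {
      if (!(field in event.data)) gaps.push(`${req.kind} missing field: ${field}`);
    }
  }
  return gaps;
}
```

A check of this kind catches a trace that records the final answer but drops the policy version.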
Good eval cases:
- refund allowed by policy and approved;
- refund denied by policy;
- missing order;
- stale policy retrieved;
- model attempts refunds.issue without approval;
- duplicate approval message replayed.
Runnable evidence:
| Signal | Repository Evidence |
|---|---|
| Safe stop | stopReason: draft_ready |
| Policy citation | draft_contains_policy_citation eval passes. |
| Money does not move | no_money_movement eval passes and trace records agent_cannot_issue_refund. |
| Rollback | Disable refunds.create_draft or route to the human support queue. |
Minimal Code
// A model proposal. Only draft_refund can lead to money moving.
type RefundDecision =
  | { action: "draft_refund"; orderId: string; amountCents: number; policyVersion: string }
  | { action: "deny_refund"; orderId: string; reason: string; policyVersion: string }
  | { action: "needs_human_review"; orderId: string; reason: string };

// Any draft that would move money must wait for operator approval.
function requiresApproval(decision: RefundDecision): boolean {
  return decision.action === "draft_refund" && decision.amountCents > 0;
}
The code is intentionally small. The important part is the boundary: a model can propose a refund decision, but the runtime still checks policy, approval, and idempotency before money moves.
Failure Modes
- The model treats an old refund policy as current.
- The tool call issues a refund before approval.
- The trace records the final answer but not the policy version.
- A retry issues the same refund twice.
- The agent sends the customer message before the operator reviews it.
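Two of these failures, the stale policy and the duplicate refund on retry, can be blocked at the point where money would move. This is a minimal in-memory sketch; createRefundIssuer and its request fields are hypothetical, and a real system would keep the idempotency record in durable storage.

```typescript
type IssueRequest = {
  orderId: string;
  amountCents: number;
  policyVersion: string;
  approvalId: string;
  idempotencyKey: string;
};

type IssueResult =
  | { status: "issued" | "replayed"; refundId: string }
  | { status: "rejected"; reason: string };

// Returns an issue function bound to the currently active policy version.
function createRefundIssuer(activePolicyVersion: string) {
  const issuedByKey = new Map<string, string>(); // idempotency key -> refund ID
  let counter = 0;

  return function issue(req: IssueRequest): IssueResult {
    // A retry with a known key returns the original refund instead of a new one.
    const previous = issuedByKey.get(req.idempotencyKey);
    if (previous !== undefined) return { status: "replayed", refundId: previous };

    if (req.approvalId.trim() === "") return { status: "rejected", reason: "approval_missing" };
    if (req.policyVersion !== activePolicyVersion) return { status: "rejected", reason: "stale_policy" };

    counter += 1;
    const refundId = `refund-${counter}`;
    issuedByKey.set(req.idempotencyKey, refundId);
    return { status: "issued", refundId };
  };
}
```

The sketch does not compare the replayed request body with the original one; a stricter version would reject a reused key that arrives with a different order or amount.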
Slice 2: Safe Coding Agent
Goal
A coding agent makes a small repository change, runs tests, shows the diff, and asks for approval before committing or opening a pull request.
Pattern Composition
| Concern | Pattern |
|---|---|
| Loop and planning | Planning and Execution |
| Harness | Agent Harnesses |
| Workspace | Coding Agents |
| Sandbox | Agent Security and Sandboxing |
| Evaluation | Evaluation-Driven Agent Development |
| Recovery | Circuit Breakers, Fallbacks, and Replay |
Runtime Flow
Security Controls
- The agent works in a scoped workspace or branch.
- Shell commands run with timeouts and no ambient production secrets.
- File edits stay within the repository root.
- Network access is disabled unless the task needs dependency or documentation lookup.
- Commit, push, deploy, and destructive commands require explicit approval.
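The rule that file edits stay within the repository root is worth checking on path segments rather than raw strings. The sketch below is an assumption-heavy illustration: it handles POSIX-style paths only, does not resolve symlinks, and uses hypothetical helper names. A real harness would more likely rely on the platform's path library plus a real-path lookup.

```typescript
// Collapses "." and ".." segments of a POSIX-style absolute path.
function normalizePosix(absPath: string): string[] {
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      out.pop(); // popping an empty list is a no-op, so "/.." stays at "/"
      continue;
    }
    out.push(seg);
  }
  return out;
}

// True when target, resolved against root if relative, stays under root.
function isInsideWorkspace(root: string, target: string): boolean {
  const absolute = target.startsWith("/") ? target : `${root}/${target}`;
  const rootSegs = normalizePosix(root);
  const targetSegs = normalizePosix(absolute);
  // Segment comparison avoids treating /work/repo-other as inside /work/repo.
  return (
    rootSegs.length <= targetSegs.length &&
    rootSegs.every((seg, i) => targetSegs[i] === seg)
  );
}
```

Comparing whole segments matters because a plain string prefix test would accept a sibling directory whose name merely starts with the workspace name.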
Trace And Eval
Every run should record files inspected, commands run, tests passed or failed, diff summary, approval request, and final state.
Good eval cases:
- correct single-file change with passing tests;
- failing test stops the run;
- command timeout is handled;
- attempted edit outside workspace is denied;
- generated change touches unrelated files;
- commit requested before diff review.
Minimal Code
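type CommandPolicy = {
  allowedPrefixes: string[];
  timeoutMs: number;
  network: "blocked" | "allowlisted";
};

// Raw prefix match only: "npm test" also matches "npm test; rm -rf .",
// so treat this as a first filter, not the whole command policy.
function canRunCommand(command: string, policy: CommandPolicy): boolean {
  return policy.allowedPrefixes.some(prefix => command.startsWith(prefix));
}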
The model should not decide that a command is safe because it looks familiar. The harness should check the command against the current task, workspace, and approval policy.
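A harness-side check can be stricter than a raw string prefix. The sketch below is one possible approach, not the repository's implementation: canRunCommandStrict is a hypothetical name, the metacharacter list is an assumption, and quoted arguments are not parsed.

```typescript
// Characters that let one shell string carry a second command or expansion.
const SHELL_METACHARACTERS = /[;&|`$<>(){}\\\n]/;

// Allows a command only if its leading tokens equal an allowed prefix exactly.
function canRunCommandStrict(command: string, allowedPrefixes: string[]): boolean {
  if (SHELL_METACHARACTERS.test(command)) return false;
  const argv = command.trim().split(/\s+/);
  return allowedPrefixes.some(prefix => {
    const want = prefix.trim().split(/\s+/);
    return want.length <= argv.length && want.every((tok, i) => argv[i] === tok);
  });
}
```

Token matching rejects lookalikes such as a different subcommand that merely starts with the same letters, and the metacharacter check rejects chained commands before matching begins. Running the command as an argument vector rather than through a shell would remove the need for the metacharacter check.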
Failure Modes
- The agent edits generated files instead of source files.
- A test failure is summarized as success.
- The sandbox exposes secrets through environment variables.
- The agent commits unrelated user changes.
- The final answer hides a failed command or skipped check.
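Several of these failures can be turned into explicit blockers that the harness computes from recorded facts instead of from the model's summary. The following sketch uses hypothetical names (RunSummary, completionBlockers) and assumes the task declares up front which files it expects to change.

```typescript
type CheckResult = { command: string; exitCode: number | null; timedOut: boolean };

type RunSummary = {
  expectedFiles: string[]; // files the task was scoped to change
  changedFiles: string[];  // files that appear in the actual diff
  checks: CheckResult[];
  diffApproved: boolean;
};

// Returns every reason the run may not be reported as complete.
function completionBlockers(run: RunSummary): string[] {
  const blockers: string[] = [];
  const expected = new Set(run.expectedFiles);
  for (const file of run.changedFiles) {
    if (!expected.has(file)) blockers.push(`unrelated_file:${file}`);
  }
  // A run with no checks has no evidence, so it is blocked as well.
  if (run.checks.length === 0) blockers.push("no_checks_ran");
  for (const check of run.checks) {
    if (check.timedOut) blockers.push(`check_timed_out:${check.command}`);
    else if (check.exitCode !== 0) blockers.push(`check_failed:${check.command}`);
  }
  if (!run.diffApproved) blockers.push("diff_not_approved");
  return blockers;
}
```

Because the result comes from exit codes and the diff, a model that summarizes a failed test as success cannot change the outcome.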
Slice 3: Research To Brief Agent
Goal
A research agent gathers evidence, produces a short technical brief, cites sources, and stores only durable facts that pass a memory policy.
Pattern Composition
| Concern | Pattern |
|---|---|
| Retrieval | Semantic Recall and RAG |
| Context control | Context Budgets and Working Sets |
| Memory | Working Memory |
| Output shape | Structured Output |
| Evals | Production Evaluation Feedback Loops |
| UX | Agent UX and Human Trust |
Runtime Flow
Security Controls
- Retrieved documents are data, not instructions.
- The agent separates source evidence from system instructions.
- Memory writes require source, confidence, retention class, and correction path.
- Private or licensed content is not copied into long-term memory by default.
- The brief says when evidence is missing, stale, or conflicting.
Trace And Eval
Every run should record query, source set, evidence packet, omitted sources, citation checks, memory decisions, and final answer shape.
Good eval cases:
- answer requires a current source;
- sources conflict;
- retrieval returns irrelevant documents;
- citation does not support the claim;
- model tries to store an unsupported memory;
- brief should refuse because evidence is missing.
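The citation cases above can be expressed as a deterministic check over an evidence packet. This sketch assumes the packet already records which claims each source supports, for example from a grader or reviewer step; the type names and the shape of the packet are hypothetical.

```typescript
// sourceId -> IDs of the claims that source was judged to support.
type EvidencePacket = Map<string, Set<string>>;
type BriefClaim = { claimId: string; citedSourceIds: string[] };
type CitationIssue = {
  claimId: string;
  sourceId?: string;
  problem: "no_citation" | "unknown_source" | "unsupported";
};

// Reports claims with no citation, citations outside the packet,
// and citations whose source does not support the claim.
function checkCitations(claims: BriefClaim[], packet: EvidencePacket): CitationIssue[] {
  const issues: CitationIssue[] = [];
  for (const claim of claims) {
    if (claim.citedSourceIds.length === 0) {
      issues.push({ claimId: claim.claimId, problem: "no_citation" });
      continue;
    }
    for (const sourceId of claim.citedSourceIds) {
      const supported = packet.get(sourceId);
      if (supported === undefined) {
        issues.push({ claimId: claim.claimId, sourceId, problem: "unknown_source" });
      } else if (!supported.has(claim.claimId)) {
        issues.push({ claimId: claim.claimId, sourceId, problem: "unsupported" });
      }
    }
  }
  return issues;
}
```

An empty result means every claim cites at least one packet source that supports it. A non-empty result is a reason for the brief to flag missing evidence or refuse.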
Runnable evidence:
| Signal | Repository Evidence |
|---|---|
| Safe stop | stopReason: answered_with_citation |
| Current source | current_source_used eval passes for refund-policy-v4. |
| Stale source rejected | stale_source_rejected eval passes for refund-policy-v2. |
| Forbidden source omitted | forbidden_source_omitted eval passes for finance-private-notes. |
Minimal Code
type MemoryCandidate = {
  claim: string;
  sourceIds: string[];
  confidence: "low" | "medium" | "high";
  retention: "task_only" | "project" | "user";
};

// Allows a write only for high-confidence, sourced claims,
// and never at user-level retention.
function canWriteMemory(candidate: MemoryCandidate): boolean {
  return (
    candidate.retention !== "user" &&
    candidate.confidence === "high" &&
    candidate.sourceIds.length > 0
  );
}
The default should be task-local memory. Durable memory is a controlled write, not a side effect of reading.
Failure Modes
- Retrieved content changes the agent’s instructions.
- The brief cites a source that does not support the claim.
- Stale evidence is presented as current.
- The agent stores a user preference from one temporary task.
- The trace cannot explain why a source was included or omitted.
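The last failure mode goes away when source selection returns a reason for every decision. The sketch below reuses the source IDs from the runnable evidence table; the Source fields and the decideSources function are assumptions made for illustration.

```typescript
type Source = {
  id: string;
  status: "current" | "stale";
  access: "approved" | "forbidden";
};

type SourceDecision = { id: string; included: boolean; reason: string };

// Every source gets a decision and a reason, so the trace can explain omissions.
function decideSources(sources: Source[]): SourceDecision[] {
  return sources.map(source => {
    // Access is checked first: a forbidden source is omitted even if current.
    if (source.access === "forbidden") {
      return { id: source.id, included: false, reason: "forbidden_source" };
    }
    if (source.status === "stale") {
      return { id: source.id, included: false, reason: "stale_source" };
    }
    return { id: source.id, included: true, reason: "current_and_approved" };
  });
}

const decisions = decideSources([
  { id: "refund-policy-v4", status: "current", access: "approved" },
  { id: "refund-policy-v2", status: "stale", access: "approved" },
  { id: "finance-private-notes", status: "current", access: "forbidden" },
]);
```

Writing the full decision list to the trace, including omitted sources, gives reviewers the same view the agent had when it built the evidence packet.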
Comparison
| Slice | Main risk | Primary control | Best regression eval | Runnable stop signal |
|---|---|---|---|---|
| Support refund assistant | Money moves without authority. | Approval-bound tool execution. | Refund tool cannot execute without policy and approval trace. | draft_ready |
| Safe coding agent | The agent changes more than it should. | Workspace, diff, tests, and approval. | Unrelated file edits or failed checks block completion. | Not included in capstone runtime yet. |
| Research to brief agent | Unsupported claims look authoritative. | Evidence packets and citation checks. | Claims must be supported by cited source IDs. | answered_with_citation |
| Multi-agent delivery workflow | Delegation hides accountability. | Workflow-owned merge and final acceptance. | Required role outputs and sequential turns must pass before acceptance. | accepted_after_review |
Slice 4: Multi-Agent Delivery Workflow
Goal
A workflow owner coordinates planner, risk reviewer, and test planner roles. The workflow accepts the package only after every role contributes in order.
Pattern Composition
| Concern | Pattern |
|---|---|
| Delegation | Task Delegation |
| Supervisor | Supervisor / Worker |
| Transcript | Evaluate Multi-Agent Transcripts |
| Flow control | CrewAI Flows and Crews |
| Evals | Observability and Evals |
Runnable Evidence
| Signal | Repository Evidence |
|---|---|
| Planner present | planner_present eval passes. |
| Risk review present | risk_review_present eval passes. |
| Test plan present | test_plan_present eval passes. |
| Turn order valid | turns_sequential eval passes. |
| Final owner accepts last | final_owner_accepts_last eval passes and finalOwner is workflow. |
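The five evals in this table are simple enough to sketch as one function over a transcript. The eval names come from the table; the Turn shape, the role identifiers, and the exact pass conditions are assumptions and may differ from the capstone runtime.

```typescript
type Role = "planner" | "risk_reviewer" | "test_planner" | "workflow";
type Turn = { index: number; role: Role; action: "contribute" | "accept" };

type TranscriptEvals = {
  planner_present: boolean;
  risk_review_present: boolean;
  test_plan_present: boolean;
  turns_sequential: boolean;
  final_owner_accepts_last: boolean;
};

function evaluateTranscript(turns: Turn[]): TranscriptEvals {
  const contributed = (role: Role): boolean =>
    turns.some(t => t.role === role && t.action === "contribute");
  const last = turns[turns.length - 1];
  const acceptCount = turns.filter(t => t.action === "accept").length;
  return {
    planner_present: contributed("planner"),
    risk_review_present: contributed("risk_reviewer"),
    test_plan_present: contributed("test_planner"),
    // Assumed rule: indexes count up from 1 with no gaps or reordering.
    turns_sequential: turns.every((t, i) => t.index === i + 1),
    // Exactly one acceptance, made by the workflow owner, as the final turn.
    final_owner_accepts_last:
      last !== undefined &&
      last.role === "workflow" &&
      last.action === "accept" &&
      acceptCount === 1,
  };
}
```

A transcript where the workflow accepts before the test planner contributes fails final_owner_accepts_last even if every role eventually appears.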
Failure Modes
- A specialist role is skipped but the final answer still sounds complete.
- Acceptance happens before risk review or test planning.
- Turn order is broken, making the trace hard to replay.
- No single owner accepts the final package.
- Delegation cannot be disabled during an incident.
Design Rule
A vertical slice should prove composition. It should show how the loop, tools, state, memory, security, runtime, observability, and evals work together for one real task.
Small examples are fine. Isolated examples are not enough.