Vertical Slice Examples

Patterns become useful when they are composed into a working system. A vertical slice is a small end-to-end design that shows the goal, agent loop, tools, state, policy, observability, evals, and runtime behavior together.

Download the lab completion worksheet and lab production readiness worksheet when turning a slice into an implementation plan.

The examples in this chapter are not full products. They are slices. Each one should be small enough to review in one sitting and concrete enough to expose architecture decisions.

Read this after Pattern Composition Playbook, Production Runtime Overview, and Observability and Evals. Those chapters explain the boundaries; this chapter shows what those boundaries look like when several patterns work together for one task.

Run the deterministic capstone runtime when you want executable evidence for the same shapes:

npm run capstones:demo
npm run capstones:test

Expected output:

support-refund-agent: pass
  stop: draft_ready
  trace events: 7
research-rag-agent: pass
  stop: answered_with_citation
  trace events: 6
multi-agent-delivery-workflow: pass
  stop: accepted_after_review
  trace events: 4
Capstone project tests OK

The code lives in capstone-projects-runtime/typescript/src/capstones.ts. Treat it as the runnable evidence layer for these slices: each run returns state, trace events, eval results, and rollback actions.

Download the captured lab and capstone command output examples when you need a compact model for saving capstone terminal output, trace snapshots, eval snapshots, and production questions.

How To Read A Slice

Use the same checklist for every example:

What user or system goal starts the run?
What patterns are composed?
What state must survive between steps?
What tools can the agent call, and under which scopes?
What requires approval?
What trace events prove what happened?
What evals catch regressions?
What failure mode would make this unsafe in production?

If a slice cannot answer those questions, it is still a demo.

Slice Review Gate

Use this gate before turning any slice into a capstone or product backlog:

Check	Evidence
Goal is bounded	One user or system goal starts the run.
Pattern composition is explicit	Each major concern maps to a named pattern chapter.
Authority is constrained	Tools, data, memory, approvals, and side effects have owners and scopes.
State is recoverable	The slice names what must persist, replay, resume, or be deleted.
Evals protect the risky path	Regression cases cover the failure mode that would make the slice unsafe.

Record the goal, composed patterns, state, tools, approval points, trace events, and evals in the lab production readiness worksheet.

Slice 1: Support Refund Assistant

Goal

A support agent helps a human support operator handle refund requests. It reads the order, retrieves the active refund policy, drafts a recommendation, and prepares a refund action. It does not issue the refund without approval.

Pattern Composition

Concern	Pattern
Agent loop	Agent Loop
Context	Context Engineering
Evidence	Semantic Recall and RAG
Tools	Tool Capability Design
Approval	Human Approval Gates
Runtime	Production Runtime Overview
Security	Agent Security and Sandboxing
Evals	Observability and Evals

Runtime Flow

flowchart TD A["Refund request"] --> B["Load order and customer context"] B --> C["Retrieve active refund policy"] C --> D["Draft recommendation"] D --> E["Authorize draft refund tool"] E --> F{"Requires approval?"} F -->|yes| G["Pause for human approval"] G --> H["Issue refund or reject action"] F -->|no| I["Return draft only"] H --> J["Trace and eval record"] I --> J

Security Controls

The agent receives orders:read, refunds:draft, and policies:read.
The refunds.issue tool requires human approval and an idempotency key.
The refund policy is retrieved from an approved source with a policy version.
Customer payment tokens never enter the prompt.
External email is a separate tool with its own approval rule.

Trace And Eval

Every run should record order lookup, policy retrieval, recommendation draft, tool authorization, approval state, refund side-effect ID, and stop reason.

Good eval cases:

refund allowed by policy and approved;
refund denied by policy;
missing order;
stale policy retrieved;
model attempts refunds.issue without approval;
duplicate approval message replayed.

Runnable evidence:

Signal	Repository Evidence
Safe stop	`stopReason: draft_ready`
Policy citation	`draft_contains_policy_citation` eval passes.
Money does not move	`no_money_movement` eval passes and trace records `agent_cannot_issue_refund`.
Rollback	Disable `refunds.create_draft` or route to the human support queue.

Minimal Code

type RefundDecision =
  | { action: "draft_refund"; orderId: string; amountCents: number; policyVersion: string }
  | { action: "deny_refund"; orderId: string; reason: string; policyVersion: string }
  | { action: "needs_human_review"; orderId: string; reason: string };

function requiresApproval(decision: RefundDecision): boolean {
  return decision.action === "draft_refund" && decision.amountCents > 0;
}

The code is intentionally small. The important part is the boundary: a model can propose a refund decision, but the runtime still checks policy, approval, and idempotency before money moves.

Failure Modes

The model treats an old refund policy as current.
The tool call issues a refund before approval.
The trace records the final answer but not the policy version.
A retry issues the same refund twice.
The agent sends the customer message before the operator reviews it.

Slice 2: Safe Coding Agent

Goal

A coding agent makes a small repository change, runs tests, shows the diff, and asks for approval before committing or opening a pull request.

Pattern Composition

Concern	Pattern
Loop and planning	Planning and Execution
Harness	Agent Harnesses
Workspace	Coding Agents
Sandbox	Agent Security and Sandboxing
Evaluation	Evaluation-Driven Agent Development
Recovery	Circuit Breakers, Fallbacks, and Replay

Runtime Flow

flowchart TD A["User change request"] --> B["Inspect repository"] B --> C["Create plan"] C --> D["Edit scoped files"] D --> E["Run tests and checks"] E --> F{"Checks pass?"} F -->|yes| G["Show diff and summary"] F -->|no| H["Diagnose failure or stop"] G --> I["Ask for commit or PR approval"] I --> J["Commit or leave changes unstaged"] H --> K["Trace failure and next action"]

Security Controls

The agent works in a scoped workspace or branch.
Shell commands run with timeouts and no ambient production secrets.
File edits stay within the repository root.
Network access is disabled unless the task needs dependency or documentation lookup.
Commit, push, deploy, and destructive commands require explicit approval.

Trace And Eval

Every run should record files inspected, commands run, tests passed or failed, diff summary, approval request, and final state.

Good eval cases:

correct single-file change with passing tests;
failing test stops the run;
command timeout is handled;
attempted edit outside workspace is denied;
generated change touches unrelated files;
commit requested before diff review.

Minimal Code

type CommandPolicy = {
  allowedPrefixes: string[];
  timeoutMs: number;
  network: "blocked" | "allowlisted";
};

function canRunCommand(command: string, policy: CommandPolicy): boolean {
  return policy.allowedPrefixes.some(prefix => command.startsWith(prefix));
}

The model should not decide that a command is safe because it looks familiar. The harness should check the command against the current task, workspace, and approval policy.

Failure Modes

The agent edits generated files instead of source files.
A test failure is summarized as success.
The sandbox exposes secrets through environment variables.
The agent commits unrelated user changes.
The final answer hides a failed command or skipped check.

Slice 3: Research To Brief Agent

Goal

A research agent gathers evidence, produces a short technical brief, cites sources, and stores only durable facts that pass a memory policy.

Pattern Composition

Concern	Pattern
Retrieval	Semantic Recall and RAG
Context control	Context Budgets and Working Sets
Memory	Working Memory
Output shape	Structured Output
Evals	Production Evaluation Feedback Loops
UX	Agent UX and Human Trust

Runtime Flow

flowchart TD A["Research question"] --> B["Clarify scope and freshness"] B --> C["Search approved sources"] C --> D["Build evidence packet"] D --> E["Draft brief with citations"] E --> F["Check citation faithfulness"] F --> G{"Memory write allowed?"} G -->|yes| H["Store scoped durable fact"] G -->|no| I["Keep as task-local note"] H --> J["Return brief and trace"] I --> J

Security Controls

Retrieved documents are data, not instructions.
The agent separates source evidence from system instructions.
Memory writes require source, confidence, retention class, and correction path.
Private or licensed content is not copied into long-term memory by default.
The brief says when evidence is missing, stale, or conflicting.

Trace And Eval

Every run should record query, source set, evidence packet, omitted sources, citation checks, memory decisions, and final answer shape.

Good eval cases:

answer requires a current source;
sources conflict;
retrieval returns irrelevant documents;
citation does not support the claim;
model tries to store an unsupported memory;
brief should refuse because evidence is missing.

Runnable evidence:

Signal	Repository Evidence
Safe stop	`stopReason: answered_with_citation`
Current source	`current_source_used` eval passes for `refund-policy-v4`.
Stale source rejected	`stale_source_rejected` eval passes for `refund-policy-v2`.
Forbidden source omitted	`forbidden_source_omitted` eval passes for `finance-private-notes`.

Minimal Code

type MemoryCandidate = {
  claim: string;
  sourceIds: string[];
  confidence: "low" | "medium" | "high";
  retention: "task_only" | "project" | "user";
};

function canWriteMemory(candidate: MemoryCandidate): boolean {
  return (
    candidate.retention !== "user" &&
    candidate.confidence === "high" &&
    candidate.sourceIds.length > 0
  );
}

The default should be task-local memory. Durable memory is a controlled write, not a side effect of reading.

Failure Modes

Retrieved content changes the agent’s instructions.
The brief cites a source that does not support the claim.
Stale evidence is presented as current.
The agent stores a user preference from one temporary task.
The trace cannot explain why a source was included or omitted.

Comparison

Slice	Main risk	Primary control	Best regression eval	Runnable Stop Signal
Support refund assistant	Money moves without authority.	Approval-bound tool execution.	Refund tool cannot execute without policy and approval trace.	`draft_ready`
Safe coding agent	The agent changes more than it should.	Workspace, diff, tests, and approval.	Unrelated file edits or failed checks block completion.	Not included in capstone runtime yet.
Research to brief agent	Unsupported claims look authoritative.	Evidence packets and citation checks.	Claims must be supported by cited source IDs.	`answered_with_citation`
Multi-agent delivery workflow	Delegation hides accountability.	Workflow-owned merge and final acceptance.	Required role outputs and sequential turns must pass before acceptance.	`accepted_after_review`

Slice 4: Multi-Agent Delivery Workflow

Goal

A workflow owner coordinates planner, risk reviewer, and test planner roles. The workflow accepts the package only after every role contributes in order.

Pattern Composition

Concern	Pattern
Delegation	Task Delegation
Supervisor	Supervisor / Worker
Transcript	Evaluate Multi-Agent Transcripts
Flow control	CrewAI Flows and Crews
Evals	Observability and Evals

Runnable Evidence

Signal	Repository Evidence
Planner present	`planner_present` eval passes.
Risk review present	`risk_review_present` eval passes.
Test plan present	`test_plan_present` eval passes.
Turn order valid	`turns_sequential` eval passes.
Final owner accepts last	`final_owner_accepts_last` eval passes and `finalOwner` is `workflow`.

Failure Modes

A specialist role is skipped but the final answer still sounds complete.
Acceptance happens before risk review or test planning.
Turn order is broken, making the trace hard to replay.
No single owner accepts the final package.
Delegation cannot be disabled during an incident.

Design Rule

A vertical slice should prove composition. It should show how the loop, tools, state, memory, security, runtime, observability, and evals work together for one real task.

Small examples are fine. Isolated examples are not enough.

Vertical Slice Examples

How To Read A Slice

Slice Review Gate

Slice 1: Support Refund Assistant

Goal

Pattern Composition

Runtime Flow

Security Controls

Trace And Eval

Minimal Code

Failure Modes

Slice 2: Safe Coding Agent

Goal

Pattern Composition

Runtime Flow

Security Controls

Trace And Eval

Minimal Code

Failure Modes

Slice 3: Research To Brief Agent

Goal

Pattern Composition

Runtime Flow

Security Controls

Trace And Eval

Minimal Code

Failure Modes

Comparison

Slice 4: Multi-Agent Delivery Workflow

Goal

Pattern Composition

Runnable Evidence

Failure Modes

Design Rule

Related Chapters

Related chapters