Vertical Slice Examples show how to combine patterns into small product workflows with clear runtime and evidence boundaries.

Section: Hands-On Labs
Type: Lab
Level: Hands-on
Read: 8 min
Effort: 45-90 min lab
Builder, Student

Vertical Slice Examples

Patterns become useful when they are composed into a working system. A vertical slice is a small end-to-end design that shows the goal, agent loop, tools, state, policy, observability, evals, and runtime behavior together.

Download the lab completion worksheet and lab production readiness worksheet when turning a slice into an implementation plan.

The examples in this chapter are not full products. They are slices. Each one should be small enough to review in one sitting and concrete enough to expose architecture decisions.

Read this after Pattern Composition Playbook, Production Runtime Overview, and Observability and Evals. Those chapters explain the boundaries; this chapter shows what those boundaries look like when several patterns work together for one task.

Run the deterministic capstone runtime when you want executable evidence for the same shapes:

npm run capstones:demo
npm run capstones:test

Expected output:

support-refund-agent: pass
  stop: draft_ready
  trace events: 7
research-rag-agent: pass
  stop: answered_with_citation
  trace events: 6
multi-agent-delivery-workflow: pass
  stop: accepted_after_review
  trace events: 4
Capstone project tests OK

The code lives in capstone-projects-runtime/typescript/src/capstones.ts. Treat it as the runnable evidence layer for these slices: each run returns state, trace events, eval results, and rollback actions.
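As a rough mental model, the result of one run and its printed summary might look like the sketch below. The type and function names (`CapstoneRun`, `EvalResult`, `runPassed`, `summarize`) are assumptions for illustration and are not copied from capstones.ts; only the four returned parts and the output layout come from this chapter.

```typescript
// Hypothetical shape; the real types in capstones.ts may differ.
type EvalResult = { name: string; passed: boolean };

type CapstoneRun = {
  stopReason: string;
  state: Record<string, unknown>;
  traceEvents: { type: string }[];
  evalResults: EvalResult[];
  rollbackActions: string[];
};

// A run passes only when it stopped for a named reason, at least one eval ran,
// and every eval passed.
function runPassed(run: CapstoneRun): boolean {
  return (
    run.stopReason.length > 0 &&
    run.evalResults.length > 0 &&
    run.evalResults.every(e => e.passed)
  );
}

// Formats a run in the same three-line layout as the expected output above.
function summarize(name: string, run: CapstoneRun): string {
  return [
    `${name}: ${runPassed(run) ? "pass" : "fail"}`,
    `  stop: ${run.stopReason}`,
    `  trace events: ${run.traceEvents.length}`,
  ].join("\n");
}
```

Requiring at least one eval is a deliberate choice in this sketch: a run with no evals has produced no evidence, so it should not be reported as a pass.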

Download the captured lab and capstone command output examples when you need a compact model for saving capstone terminal output, trace snapshots, eval snapshots, and production questions.

How To Read A Slice

Use the same checklist for every example:

  1. What user or system goal starts the run?
  2. What patterns are composed?
  3. What state must survive between steps?
  4. What tools can the agent call, and under which scopes?
  5. What requires approval?
  6. What trace events prove what happened?
  7. What evals catch regressions?
  8. What failure mode would make this unsafe in production?

If a slice cannot answer those questions, it is still a demo.
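One way to make the checklist mechanical is to write each slice down as data and list what is still unanswered. This is a minimal sketch; `SliceSpec`, its field names, and `unansweredQuestions` are hypothetical and simply mirror the eight questions above.

```typescript
// Hypothetical field names, one per checklist question.
type SliceSpec = {
  goal: string;
  patterns: string[];
  persistentState: string[];
  tools: { name: string; scopes: string[] }[];
  approvalPoints: string[]; // assumption: write ["none"] explicitly if nothing needs approval
  traceEvents: string[];
  evals: string[];
  unsafeFailureMode: string;
};

// Returns the checklist items the slice has not answered yet.
function unansweredQuestions(spec: SliceSpec): string[] {
  const missing: string[] = [];
  if (spec.goal.trim() === "") missing.push("goal");
  if (spec.patterns.length === 0) missing.push("patterns");
  if (spec.persistentState.length === 0) missing.push("persistentState");
  // A tool without scopes does not answer "under which scopes?".
  if (spec.tools.length === 0 || spec.tools.some(t => t.scopes.length === 0)) {
    missing.push("tools");
  }
  if (spec.approvalPoints.length === 0) missing.push("approvalPoints");
  if (spec.traceEvents.length === 0) missing.push("traceEvents");
  if (spec.evals.length === 0) missing.push("evals");
  if (spec.unsafeFailureMode.trim() === "") missing.push("unsafeFailureMode");
  return missing;
}
```

An empty result means the slice has an answer for every question. It does not mean the answers are good; that still takes a human review.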

Slice Review Gate

Use this gate before turning any slice into a capstone or product backlog:

| Check | Evidence |
| --- | --- |
| Goal is bounded | One user or system goal starts the run. |
| Pattern composition is explicit | Each major concern maps to a named pattern chapter. |
| Authority is constrained | Tools, data, memory, approvals, and side effects have owners and scopes. |
| State is recoverable | The slice names what must persist, replay, resume, or be deleted. |
| Evals protect the risky path | Regression cases cover the failure mode that would make the slice unsafe. |

Record the goal, composed patterns, state, tools, approval points, trace events, and evals in the lab production readiness worksheet.

Slice 1: Support Refund Assistant

Goal

A support agent helps a human support operator handle refund requests. It reads the order, retrieves the active refund policy, drafts a recommendation, and prepares a refund action. It does not issue the refund without approval.

Pattern Composition

| Concern | Pattern |
| --- | --- |
| Agent loop | Agent Loop |
| Context | Context Engineering |
| Evidence | Semantic Recall and RAG |
| Tools | Tool Capability Design |
| Approval | Human Approval Gates |
| Runtime | Production Runtime Overview |
| Security | Agent Security and Sandboxing |
| Evals | Observability and Evals |

Runtime Flow

flowchart TD
  A["Refund request"] --> B["Load order and customer context"]
  B --> C["Retrieve active refund policy"]
  C --> D["Draft recommendation"]
  D --> E["Authorize draft refund tool"]
  E --> F{"Requires approval?"}
  F -->|yes| G["Pause for human approval"]
  G --> H["Issue refund or reject action"]
  F -->|no| I["Return draft only"]
  H --> J["Trace and eval record"]
  I --> J

Security Controls

  • The agent receives orders:read, refunds:draft, and policies:read.
  • The refunds.issue tool requires human approval and an idempotency key.
  • The refund policy is retrieved from an approved source with a policy version.
  • Customer payment tokens never enter the prompt.
  • External email is a separate tool with its own approval rule.
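The first two controls can be sketched as a single authorization decision made by the runtime, not by the model. The scope strings and the `refunds.issue` tool name come from the list above; `ToolCall`, `Authorization`, `authorize`, and the `orders.get` tool used in the usage note are hypothetical.

```typescript
// Hypothetical types; only the scope and tool strings come from the controls above.
type ToolCall = { tool: string; requiredScope: string; idempotencyKey?: string };
type Authorization = "allow" | "needs_approval" | "deny";

const AGENT_SCOPES = new Set(["orders:read", "refunds:draft", "policies:read"]);
const APPROVAL_REQUIRED_TOOLS = new Set(["refunds.issue"]);

function authorize(call: ToolCall, humanApproved: boolean): Authorization {
  if (APPROVAL_REQUIRED_TOOLS.has(call.tool)) {
    // Without an idempotency key the call is rejected outright, approved or not.
    if (!call.idempotencyKey) return "deny";
    // Assumption: an approved call runs under the operator's authority,
    // so the agent's own scopes are not consulted here.
    return humanApproved ? "allow" : "needs_approval";
  }
  // Every other tool must fit inside the scopes the agent was given.
  return AGENT_SCOPES.has(call.requiredScope) ? "allow" : "deny";
}
```

For example, `authorize({ tool: "orders.get", requiredScope: "orders:read" }, false)` returns `"allow"`, while a `refunds.issue` call with a key but no approval returns `"needs_approval"`.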

Trace And Eval

Every run should record order lookup, policy retrieval, recommendation draft, tool authorization, approval state, refund side-effect ID, and stop reason.
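A completeness check over the trace is one simple way to enforce this. The sketch below assumes one event type per item in the sentence above; the event names, `TraceEvent`, and `missingTraceEvents` are hypothetical, and a draft-only run is assumed to record the side-effect event with an empty ID instead of skipping it.

```typescript
// Hypothetical event names, one per item the run should record.
const REQUIRED_REFUND_EVENTS = [
  "order_lookup",
  "policy_retrieval",
  "recommendation_draft",
  "tool_authorization",
  "approval_state",
  "refund_side_effect",
  "stop_reason",
] as const;

type TraceEvent = { type: string; data?: Record<string, string> };

// Returns the required event types that never appear in the trace.
function missingTraceEvents(trace: TraceEvent[]): string[] {
  const seen = new Set(trace.map(e => e.type));
  return REQUIRED_REFUND_EVENTS.filter(type => !seen.has(type));
}
```

A non-empty result is a reason to fail the run's eval even when the final answer looks correct, because the evidence for it is incomplete.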

Good eval cases:

  • refund allowed by policy and approved;
  • refund denied by policy;
  • missing order;
  • stale policy retrieved;
  • model attempts refunds.issue without approval;
  • duplicate approval message replayed.

Runnable evidence:

| Signal | Repository Evidence |
| --- | --- |
| Safe stop | stopReason: draft_ready |
| Policy citation | draft_contains_policy_citation eval passes. |
| Money does not move | no_money_movement eval passes and trace records agent_cannot_issue_refund. |
| Rollback | Disable refunds.create_draft or route to the human support queue. |

Minimal Code

type RefundDecision =
  | { action: "draft_refund"; orderId: string; amountCents: number; policyVersion: string }
  | { action: "deny_refund"; orderId: string; reason: string; policyVersion: string }
  | { action: "needs_human_review"; orderId: string; reason: string };

function requiresApproval(decision: RefundDecision): boolean {
  return decision.action === "draft_refund" && decision.amountCents > 0;
}

The code is intentionally small. The important part is the boundary: a model can propose a refund decision, but the runtime still checks policy, approval, and idempotency before money moves.
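The approval and idempotency half of that boundary could look like the sketch below. It is not the capstone runtime: `Approval`, `IssueResult`, and `RefundRuntime` are hypothetical, the policy check is left out for brevity, and the in-memory map stands in for whatever durable store a real runtime would use.

```typescript
// Hypothetical types; a real runtime would persist issued keys durably.
type Approval = { orderId: string; approvedBy: string; idempotencyKey: string };
type IssueResult = {
  status: "issued" | "duplicate_ignored" | "blocked";
  refundId?: string;
};

class RefundRuntime {
  // idempotencyKey -> refundId for refunds that have already been issued
  private issued = new Map<string, string>();

  issue(orderId: string, amountCents: number, approval?: Approval): IssueResult {
    // No approval, approval for a different order, or a non-positive amount: nothing moves.
    if (!approval || approval.orderId !== orderId || amountCents <= 0) {
      return { status: "blocked" };
    }
    // A replayed approval returns the original refund instead of issuing a second one.
    const existing = this.issued.get(approval.idempotencyKey);
    if (existing !== undefined) {
      return { status: "duplicate_ignored", refundId: existing };
    }
    const refundId = `refund_${this.issued.size + 1}`; // stand-in for a payment provider ID
    this.issued.set(approval.idempotencyKey, refundId);
    return { status: "issued", refundId };
  }
}
```

Calling `issue` twice with the same approval yields `issued` and then `duplicate_ignored` with the same `refundId`, which is the behavior the duplicate-approval eval case is meant to protect.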

Failure Modes

  • The model treats an old refund policy as current.
  • The tool call issues a refund before approval.
  • The trace records the final answer but not the policy version.
  • A retry issues the same refund twice.
  • The agent sends the customer message before the operator reviews it.

Slice 2: Safe Coding Agent

Goal

A coding agent makes a small repository change, runs tests, shows the diff, and asks for approval before committing or opening a pull request.

Pattern Composition

| Concern | Pattern |
| --- | --- |
| Loop and planning | Planning and Execution |
| Harness | Agent Harnesses |
| Workspace | Coding Agents |
| Sandbox | Agent Security and Sandboxing |
| Evaluation | Evaluation-Driven Agent Development |
| Recovery | Circuit Breakers, Fallbacks, and Replay |

Runtime Flow

flowchart TD
  A["User change request"] --> B["Inspect repository"]
  B --> C["Create plan"]
  C --> D["Edit scoped files"]
  D --> E["Run tests and checks"]
  E --> F{"Checks pass?"}
  F -->|yes| G["Show diff and summary"]
  F -->|no| H["Diagnose failure or stop"]
  G --> I["Ask for commit or PR approval"]
  I --> J["Commit or leave changes unstaged"]
  H --> K["Trace failure and next action"]

Security Controls

  • The agent works in a scoped workspace or branch.
  • Shell commands run with timeouts and no ambient production secrets.
  • File edits stay within the repository root.
  • Network access is disabled unless the task needs dependency or documentation lookup.
  • Commit, push, deploy, and destructive commands require explicit approval.
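The "file edits stay within the repository root" control is the easiest of these to show in code. The sketch below is pure string logic over POSIX-style relative paths; `staysInsideWorkspace` is a hypothetical helper, and it does not resolve symlinks, which a real harness would also need to handle.

```typescript
// Reports whether a POSIX-style relative path stays inside the workspace root.
// Assumption: paths are given relative to the repository root; symlinks are not resolved.
function staysInsideWorkspace(relativePath: string): boolean {
  // Reject absolute paths, and reject backslashes so Windows-style separators
  // cannot slip past the segment check below.
  if (relativePath.startsWith("/") || relativePath.includes("\\")) return false;

  let depth = 0;
  for (const segment of relativePath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return false; // climbed above the root
    } else {
      depth += 1;
    }
  }
  return true;
}
```

With this check, `src/../README.md` is accepted because it resolves inside the root, while `src/../../etc/passwd` is denied.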

Trace And Eval

Every run should record files inspected, commands run, tests passed or failed, diff summary, approval request, and final state.
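If the final state is derived from the recorded commands instead of from the model's summary, a failed test cannot be reported as success. The sketch below assumes a simple record shape; `CommandRecord`, `RunRecord`, `FinalState`, and `finalState` are hypothetical names.

```typescript
// Hypothetical record shapes for one coding-agent run.
type CommandRecord = { command: string; exitCode: number | null; timedOut: boolean };
type RunRecord = {
  filesInspected: string[];
  commands: CommandRecord[];
  diffSummary: string;
  approvalRequested: boolean;
};
type FinalState = "ready_for_review" | "checks_failed" | "no_checks_run";

// Derives the final state from evidence only.
// Assumption: every recorded command counts as a check for this sketch.
function finalState(run: RunRecord): FinalState {
  if (run.commands.length === 0) return "no_checks_run";
  // A timeout or any non-zero (or missing) exit code counts as a failure.
  const failed = run.commands.some(c => c.timedOut || c.exitCode !== 0);
  return failed ? "checks_failed" : "ready_for_review";
}
```

Keeping `no_checks_run` separate from `ready_for_review` makes a skipped check visible in the trace instead of letting it read as a pass.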

Good eval cases:

  • correct single-file change with passing tests;
  • failing test stops the run;
  • command timeout is handled;
  • attempted edit outside workspace is denied;
  • generated change touches unrelated files;
  • commit requested before diff review.

Minimal Code

type CommandPolicy = {
  allowedPrefixes: string[];
  timeoutMs: number;
  network: "blocked" | "allowlisted";
};

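function canRunCommand(command: string, policy: CommandPolicy): boolean {
  // Match a whole prefix followed by a space or end of string, so that
  // "npm test" does not also allow "npm testing-something-else".
  return policy.allowedPrefixes.some(
    prefix => command === prefix || command.startsWith(prefix + " ")
  );
}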

The model should not decide that a command is safe because it looks familiar. The harness should check the command against the current task, workspace, and approval policy.
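A prefix check on the raw command string is easy to bypass with shell chaining, for example `npm test && curl ...`. The sketch below is a slightly stricter variant under stated assumptions: commands are simple space-separated tokens, any shell metacharacter is grounds for denial, and `TaskCommandPolicy`, `CommandDecision`, and `decideCommand` are hypothetical names, not part of the capstone runtime.

```typescript
// Hypothetical policy: token prefixes, not raw string prefixes.
type TaskCommandPolicy = {
  allowedCommands: string[][];  // e.g. [["npm", "test"]]
  approvalRequired: string[][]; // e.g. [["git", "commit"]]
};
type CommandDecision = "allow" | "needs_approval" | "deny";

// Characters that allow chaining, substitution, or redirection in common shells.
const SHELL_METACHARACTERS = /[;&|`$<>(){}\n\\]/;

function hasTokenPrefix(tokens: string[], prefix: string[]): boolean {
  return (
    prefix.length > 0 &&
    prefix.length <= tokens.length &&
    prefix.every((part, i) => tokens[i] === part)
  );
}

function decideCommand(command: string, policy: TaskCommandPolicy): CommandDecision {
  if (SHELL_METACHARACTERS.test(command)) return "deny";
  const tokens = command.trim().split(/\s+/);
  // Approval rules are checked first so they win over a broader allow rule.
  if (policy.approvalRequired.some(p => hasTokenPrefix(tokens, p))) return "needs_approval";
  if (policy.allowedCommands.some(p => hasTokenPrefix(tokens, p))) return "allow";
  return "deny";
}
```

Denying by default and returning `needs_approval` as its own outcome keeps the decision with the harness: the model can request a commit, but it cannot talk the policy into allowing one.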

Failure Modes

  • The agent edits generated files instead of source files.
  • A test failure is summarized as success.
  • The sandbox exposes secrets through environment variables.
  • The agent commits unrelated user changes.
  • The final answer hides a failed command or skipped check.

Slice 3: Research To Brief Agent

Goal

A research agent gathers evidence, produces a short technical brief, cites sources, and stores only durable facts that pass a memory policy.

Pattern Composition

| Concern | Pattern |
| --- | --- |
| Retrieval | Semantic Recall and RAG |
| Context control | Context Budgets and Working Sets |
| Memory | Working Memory |
| Output shape | Structured Output |
| Evals | Production Evaluation Feedback Loops |
| UX | Agent UX and Human Trust |

Runtime Flow

flowchart TD
  A["Research question"] --> B["Clarify scope and freshness"]
  B --> C["Search approved sources"]
  C --> D["Build evidence packet"]
  D --> E["Draft brief with citations"]
  E --> F["Check citation faithfulness"]
  F --> G{"Memory write allowed?"}
  G -->|yes| H["Store scoped durable fact"]
  G -->|no| I["Keep as task-local note"]
  H --> J["Return brief and trace"]
  I --> J

Security Controls

  • Retrieved documents are data, not instructions.
  • The agent separates source evidence from system instructions.
  • Memory writes require source, confidence, retention class, and correction path.
  • Private or licensed content is not copied into long-term memory by default.
  • The brief says when evidence is missing, stale, or conflicting.

Trace And Eval

Every run should record query, source set, evidence packet, omitted sources, citation checks, memory decisions, and final answer shape.

Good eval cases:

  • answer requires a current source;
  • sources conflict;
  • retrieval returns irrelevant documents;
  • citation does not support the claim;
  • model tries to store an unsupported memory;
  • brief should refuse because evidence is missing.

Runnable evidence:

| Signal | Repository Evidence |
| --- | --- |
| Safe stop | stopReason: answered_with_citation |
| Current source | current_source_used eval passes for refund-policy-v4. |
| Stale source rejected | stale_source_rejected eval passes for refund-policy-v2. |
| Forbidden source omitted | forbidden_source_omitted eval passes for finance-private-notes. |
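The source checks behind these signals can be pictured as a structural citation check over the evidence packet. This is a sketch, not the capstone implementation: `Evidence`, `Claim`, `CitationIssue`, and `checkCitations` are hypothetical, and the check only verifies that each claim cites a usable source ID. Whether the source actually supports the claim still needs a faithfulness eval.

```typescript
// Hypothetical shapes; the source IDs in the usage note mirror the evals above.
type Evidence = { sourceId: string; status: "current" | "stale" | "forbidden" };
type Claim = { text: string; sourceIds: string[] };
type CitationIssue = {
  claim: string;
  problem: "uncited" | "unknown_source" | "unusable_source";
  sourceId?: string;
};

function checkCitations(claims: Claim[], packet: Evidence[]): CitationIssue[] {
  const byId = new Map(packet.map(e => [e.sourceId, e] as const));
  const issues: CitationIssue[] = [];
  for (const claim of claims) {
    if (claim.sourceIds.length === 0) {
      issues.push({ claim: claim.text, problem: "uncited" });
      continue;
    }
    for (const sourceId of claim.sourceIds) {
      const evidence = byId.get(sourceId);
      if (!evidence) {
        // Cited ID is not in the evidence packet at all.
        issues.push({ claim: claim.text, problem: "unknown_source", sourceId });
      } else if (evidence.status !== "current") {
        // Stale and forbidden sources cannot back a claim in the brief.
        issues.push({ claim: claim.text, problem: "unusable_source", sourceId });
      }
    }
  }
  return issues;
}
```

With a packet that marks `refund-policy-v4` as current and `refund-policy-v2` as stale, a claim citing only `refund-policy-v2` is reported as `unusable_source`, and the brief should not be returned as `answered_with_citation` until the issue list is empty.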

Minimal Code

type MemoryCandidate = {
  claim: string;
  sourceIds: string[];
  confidence: "low" | "medium" | "high";
  retention: "task_only" | "project" | "user";
};

function canWriteMemory(candidate: MemoryCandidate): boolean {
  return (
    candidate.retention !== "user" &&
    candidate.confidence === "high" &&
    candidate.sourceIds.length > 0
  );
}

The default should be task-local memory. Durable memory is a controlled write, not a side effect of reading.
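One way to express that default is to return a decision with a reason instead of a bare boolean, so the trace can record why a candidate was kept task-local. The sketch below repeats the `MemoryCandidate` type so it stands alone; `MemoryDecision` and `decideMemoryWrite` are hypothetical, and the rules are the same three conditions as `canWriteMemory`.

```typescript
type MemoryCandidate = {
  claim: string;
  sourceIds: string[];
  confidence: "low" | "medium" | "high";
  retention: "task_only" | "project" | "user";
};

// Hypothetical decision shape; the reason is meant for the trace.
type MemoryDecision = { store: "durable" | "task_only"; reason: string };

function decideMemoryWrite(candidate: MemoryCandidate): MemoryDecision {
  if (candidate.retention === "task_only") {
    return { store: "task_only", reason: "task-local retention requested" };
  }
  if (candidate.retention === "user") {
    return { store: "task_only", reason: "user-scoped memory is not written automatically" };
  }
  if (candidate.sourceIds.length === 0) {
    return { store: "task_only", reason: "no supporting source" };
  }
  if (candidate.confidence !== "high") {
    return { store: "task_only", reason: "confidence below high" };
  }
  // Only a sourced, high-confidence, project-scoped fact becomes durable.
  return { store: "durable", reason: "sourced, high-confidence, project-scoped" };
}
```

Every rejected path falls back to `task_only`, so nothing is lost during the task, and nothing outlives it without passing the policy.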

Failure Modes

  • Retrieved content changes the agent’s instructions.
  • The brief cites a source that does not support the claim.
  • Stale evidence is presented as current.
  • The agent stores a user preference from one temporary task.
  • The trace cannot explain why a source was included or omitted.

Comparison

| Slice | Main risk | Primary control | Best regression eval | Runnable Stop Signal |
| --- | --- | --- | --- | --- |
| Support refund assistant | Money moves without authority. | Approval-bound tool execution. | Refund tool cannot execute without policy and approval trace. | draft_ready |
| Safe coding agent | The agent changes more than it should. | Workspace, diff, tests, and approval. | Unrelated file edits or failed checks block completion. | Not included in capstone runtime yet. |
| Research to brief agent | Unsupported claims look authoritative. | Evidence packets and citation checks. | Claims must be supported by cited source IDs. | answered_with_citation |
| Multi-agent delivery workflow | Delegation hides accountability. | Workflow-owned merge and final acceptance. | Required role outputs and sequential turns must pass before acceptance. | accepted_after_review |

Slice 4: Multi-Agent Delivery Workflow

Goal

A workflow owner coordinates planner, risk reviewer, and test planner roles. The workflow accepts the package only after every role contributes in order.

Pattern Composition

| Concern | Pattern |
| --- | --- |
| Delegation | Task Delegation |
| Supervisor | Supervisor / Worker |
| Transcript | Evaluate Multi-Agent Transcripts |
| Flow control | CrewAI Flows and Crews |
| Evals | Observability and Evals |

Runnable Evidence

| Signal | Repository Evidence |
| --- | --- |
| Planner present | planner_present eval passes. |
| Risk review present | risk_review_present eval passes. |
| Test plan present | test_plan_present eval passes. |
| Turn order valid | turns_sequential eval passes. |
| Final owner accepts last | final_owner_accepts_last eval passes and finalOwner is workflow. |
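These five evals can be sketched as plain checks over a transcript of turns. The eval names come from the table above; the `Turn` shape, the role identifiers, and `evaluateTranscript` are assumptions for illustration and may not match the capstone runtime.

```typescript
// Hypothetical transcript shape with 1-based turn indexes.
type Turn = {
  index: number;
  role: "planner" | "risk_reviewer" | "test_planner" | "workflow";
  action: "contribute" | "accept";
};

function evaluateTranscript(turns: Turn[]): Record<string, boolean> {
  const roles = new Set(turns.map(t => t.role));
  const last = turns[turns.length - 1];
  const acceptCount = turns.filter(t => t.action === "accept").length;
  return {
    planner_present: roles.has("planner"),
    risk_review_present: roles.has("risk_reviewer"),
    test_plan_present: roles.has("test_planner"),
    // Indexes must run 1, 2, 3, ... with no gaps or reordering.
    turns_sequential: turns.every((t, i) => t.index === i + 1),
    // Exactly one acceptance, made by the workflow owner, as the final turn.
    final_owner_accepts_last:
      last !== undefined &&
      last.role === "workflow" &&
      last.action === "accept" &&
      acceptCount === 1,
  };
}
```

A transcript where the workflow accepts at turn 2, before the risk review and test plan, still has every role present but fails `final_owner_accepts_last`, which is the early-acceptance failure this slice is meant to catch.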

Failure Modes

  • A specialist role is skipped but the final answer still sounds complete.
  • Acceptance happens before risk review or test planning.
  • Turn order is broken, making the trace hard to replay.
  • No single owner accepts the final package.
  • Delegation cannot be disabled during an incident.

Design Rule

A vertical slice should prove composition. It should show how the loop, tools, state, memory, security, runtime, observability, and evals work together for one real task.

Small examples are fine. Isolated examples are not enough.