Prompt Chaining and Gates covers fixed multi-step model workflows where software validates each stage before the next one runs.

Section
Pattern Selection and Composition
Type
Guide
Level
Intermediate
Read
5 min
Effort
10-20 min read
ArchitectReviewer

Prompt Chaining and Gates

Prompt chaining breaks a task into known LLM steps and places validation gates between them. It is one of the safest ways to add model judgment without giving the model control over the whole workflow.

Use this pattern when the order of work is known but individual steps need language understanding, generation, classification, or extraction.

Intent

Run several bounded model calls in a fixed sequence. Each call receives a narrow input, produces a typed output, and passes through a deterministic gate before the next step runs.

The model performs local judgment. Code owns the workflow.

Use When

  • The task has stable phases.
  • Each phase can be tested independently.
  • Intermediate outputs can be validated.
  • You want lower risk than an autonomous agent loop.
  • You need clear failure points and retry behavior.

Common examples:

  • classify a request, extract fields, verify fields, then draft a response;
  • summarize a document, extract claims, check claims, then produce a final answer;
  • generate code, run tests, critique failures, then revise;
  • retrieve evidence, normalize citations, then synthesize an answer.

Avoid When

  • The steps are unknown before execution starts.
  • The chain would branch into dozens of special cases.
  • No gate can verify intermediate outputs.
  • The same input often needs a different sequence of work.
  • Latency from multiple calls is unacceptable.

If the chain keeps growing conditional branches, consider Routing and Handoffs or an Agent Loop.

Architecture

Input
  -> Step 1: classify or extract
  -> Gate 1: schema, confidence, policy, completeness
  -> Step 2: transform or retrieve
  -> Gate 2: evidence, consistency, permissions
  -> Step 3: generate final output
  -> Gate 3: final eval, formatting, approval
  -> Result

Gates should be deterministic whenever possible. A gate can use a model as an evaluator, but the gate still needs explicit criteria and a clear pass/fail output.

sequenceDiagram participant Code as Workflow code participant Model participant Gate participant Review as Human or policy review participant Trace Code->>Model: Step 1: classify or extract Model-->>Code: Typed output Code->>Gate: Validate schema, confidence, policy alt Gate passes Code->>Model: Step 2: transform or retrieve Model-->>Code: Typed output Code->>Gate: Validate evidence and consistency else Gate fails Code->>Trace: Record stop, repair, or clarification end alt Final gate passes Code->>Model: Step 3: generate final output Model-->>Code: Draft result Code->>Gate: Final eval and format check Code->>Trace: Record accepted chain else Escalation required Code->>Review: Send state and failure reason Review-->>Code: Approve, reject, or revise end

Use the diagram as a design test. Every arrow should map to a typed state transition, a named gate decision, and a bounded retry or escalation rule.

Gate Types

Gate Checks Failure Behavior
Schema gate JSON shape, required fields, enums, numeric ranges Ask model to repair or fail fast.
Evidence gate Citations, source freshness, retrieval coverage Retrieve more evidence or escalate.
Policy gate permissions, user role, restricted actions Block, redact, or request approval.
Confidence gate classification confidence, ambiguity, missing inputs Ask a clarification question.
Consistency gate claims match previous state and evidence Re-run the step with narrower context.
Cost gate model call count, token budget, time budget Return partial result or defer.
Human gate subjective or high-impact decision Pause until approval.

The strongest chains mix several gate types. A support workflow may use schema gates for extracted fields, policy gates for refunds, and human gates for exceptions.

Gate Contract

Every gate should produce a machine-readable decision. Do not hide gate behavior inside prose.

type GateDecision =
  | { status: 'pass' }
  | {
      status: 'repair';
      reason: string;
      maxRetries: number;
      repairInstructions: string;
    }
  | {
      status: 'stop';
      reason: string;
      userVisibleMessage?: string;
    }
  | {
      status: 'escalate';
      reason: string;
      requiredRole: 'support_lead' | 'security_reviewer' | 'domain_expert';
    };

The chain should record the gate decision, not just the final output. A production trace should show which step ran, which gate accepted or rejected it, and what happened next.

Use explicit failure reasons:

Failure Reason Meaning Next Action
schema_invalid Output cannot be parsed or required fields are missing. Repair once or twice, then stop.
evidence_missing The step made a claim without enough source support. Retrieve more evidence or escalate.
policy_denied The next step would violate permission or business policy. Stop or request approval.
ambiguous_input The chain cannot choose a safe interpretation. Ask the user or route to review.
budget_exhausted Retry, token, latency, or tool budget is spent. Return partial result or stop.

Worked Gate Example: Refund Draft

A refund support chain should stop before it drafts a customer promise if required evidence is missing. The model can classify, extract, and summarize, but gates decide whether the next step may run.

Chain Step Model Output Gate Pass Condition Failure Behavior
Classify request refund_request with confidence 0.91 confidence gate Confidence >= 0.72 and type is not unknown. Ask a clarifying question.
Extract fields order_id, customer_id, issue, requested_outcome schema gate Required fields exist and IDs match expected format. Repair once, then stop with schema_invalid.
Retrieve policy source IDs and policy version evidence gate Current refund policy and order evidence are present. Stop with evidence_missing or retrieve again.
Draft recommendation proposed refund or denial policy gate Recommendation matches threshold, account status, and policy version. Stop with policy_denied or request approval.
Draft customer reply customer-facing text final gate Text matches approved recommendation and does not promise unapproved payment. Block and send to support lead.

The important boundary is the fourth row. If the model proposes “full refund approved” but the policy gate returns approval_required, the chain must not continue to a customer-facing promise. It should pause with the exact evidence, proposed amount, policy version, and reviewer role.

{
  "step": "policy_gate",
  "status": "escalate",
  "reason": "approval_required",
  "required_role": "support_lead",
  "evidence_refs": ["order:O-104", "policy:refunds:v2026-06"],
  "blocked_next_step": "draft_customer_reply"
}

This is the difference between a chain and a loose sequence of prompts. The chain carries typed state and stops before unsafe language or side effects escape.

Implementation Notes

  • Keep each prompt narrow. A step should have one job.
  • Use structured output for every handoff between steps.
  • Persist intermediate outputs, not just the final answer.
  • Treat every model output as untrusted until a gate accepts it.
  • Make repair loops bounded. A failed schema should not trigger infinite retries.
  • Prefer deterministic gates for safety and model-based gates for subjective quality.
  • Log each step with input hash, model, output, gate result, latency, and cost.

Example Chain

type TicketClass = 'billing' | 'technical' | 'account' | 'unknown';

interface Classification {
  type: TicketClass;
  confidence: number;
  reason: string;
}

interface ExtractedFields {
  customerId?: string;
  orderId?: string;
  issue: string;
}

interface ChainState {
  input: string;
  classification?: Classification;
  fields?: ExtractedFields;
  draft?: string;
}

function gateClassification(result: Classification): string | null {
  if (result.confidence < 0.72) return 'classification_low_confidence';
  if (result.type === 'unknown') return 'unknown_ticket_type';
  return null;
}

function gateFields(fields: ExtractedFields): string | null {
  if (!fields.issue.trim()) return 'missing_issue';
  return null;
}

async function runTicketChain(input: string): Promise<ChainState> {
  const state: ChainState = { input };

  state.classification = await classifyTicket(input);
  const classificationError = gateClassification(state.classification);
  if (classificationError) throw new Error(classificationError);

  state.fields = await extractTicketFields(input, state.classification.type);
  const fieldError = gateFields(state.fields);
  if (fieldError) throw new Error(fieldError);

  state.draft = await draftResponse(state.classification.type, state.fields);
  return state;
}

The example keeps control flow in code. The model classifies, extracts, and drafts, but it does not decide which validation rules apply.

Failure Modes

  • A chain pretends to be deterministic while hidden prompts make important decisions.
  • Gates validate shape but not meaning.
  • The chain passes summarized state forward and loses key evidence.
  • Repair loops keep asking the model to fix an output without changing the input or prompt.
  • A model-based evaluator accepts fluent but unsupported claims.
  • The chain becomes a fragile substitute for a real workflow engine.

Production Checklist

  • Are all step inputs and outputs typed?
  • Can each step be tested with fixture inputs?
  • Does every gate have a named failure reason?
  • Does every repair loop have a retry limit?
  • Are intermediate outputs persisted for replay?
  • Are high-risk actions separated from generation steps?
  • Can operators see which gate stopped a run?