From Patterns To Systems

Patterns are useful only when they help you build a system. A production agent is rarely one pattern. It is usually a workflow with a few model-mediated decisions, a retrieval boundary, some tools, state, policy, approvals, evals, and observability. The pattern names matter far less than the way those pieces fit together.

This is where many agent projects go wrong. The team adds a loop, then memory, then tools, then a second agent, then a judge, then a workflow engine. Each addition feels reasonable on its own, and the result is a system nobody can explain. Composition is the discipline that prevents that.

Start With The Workload

Do not compose patterns from a catalog. Compose them from the workload. Start by asking what the user is actually trying to accomplish, which steps are known in advance and which require model judgment, what evidence has to be retrieved, which actions carry side effects, what state must survive a failure, what needs approval, and what has to be observable after the run.

The answers point to the parts. If the workflow is known, keep code in charge. If the next step depends on observations, add an agent loop. If the task needs evidence, add retrieval. If the task can change the outside world, add policy and approval. If failures matter, add durable state and replay. That is composition: each pattern earns its place rather than arriving by default.
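The mapping from workload answers to parts can be written down as a small function. This is a minimal sketch, not a prescribed API: the `WorkloadProfile` fields, the `Pattern` names, and `selectPatterns` are all hypothetical names chosen to mirror the questions in this section.

```typescript
// Hypothetical workload profile: one flag per question asked about the workload.
interface WorkloadProfile {
  stepsKnownInAdvance: boolean;
  nextStepDependsOnObservations: boolean;
  needsEvidence: boolean;
  hasSideEffects: boolean;
  failuresMatter: boolean;
}

// Hypothetical pattern labels; the names are illustrative only.
type Pattern =
  | 'code_workflow'
  | 'agent_loop'
  | 'retrieval'
  | 'policy_and_approval'
  | 'durable_state_and_replay';

// A pattern is added only when a workload answer calls for it.
function selectPatterns(w: WorkloadProfile): Pattern[] {
  const patterns: Pattern[] = [];
  if (w.stepsKnownInAdvance) patterns.push('code_workflow');
  if (w.nextStepDependsOnObservations) patterns.push('agent_loop');
  if (w.needsEvidence) patterns.push('retrieval');
  if (w.hasSideEffects) patterns.push('policy_and_approval');
  if (w.failuresMatter) patterns.push('durable_state_and_replay');
  return patterns;
}

// Example: a refund workload whose failures are assumed to be cheap to retry.
const refundWorkload: WorkloadProfile = {
  stepsKnownInAdvance: true,
  nextStepDependsOnObservations: true,
  needsEvidence: true,
  hasSideEffects: true,
  failuresMatter: false
};
console.log(selectPatterns(refundWorkload));
```

Nothing in the function adds a pattern by default, so an empty profile yields an empty list.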

A Common Shape

Many useful agentic systems follow roughly this shape:

Entry point receives a request, event, or scheduled task.
Router classifies the task, risk, and required capability.
Workflow loads state, policy context, and relevant memory.
Retrieval gathers evidence when the answer depends on knowledge.
Agent loop handles bounded uncertainty inside the workflow.
Tools execute through typed schemas and permission checks.
Approval gate pauses high-risk side effects.
Evaluators check trajectory, evidence, output, and policy.
Runtime stores traces, costs, decisions, tool calls, and stop reasons.
Incidents and corrections feed the eval suite.

Not every system needs every step. What stays constant is the ownership. Code owns flow, state, policy, and persistence; the model owns bounded judgment inside those constraints.

flowchart TB
  subgraph A[Workload shape]
    direction LR
    A1[Request or event] --> A2[Route task and risk]
    A2 --> A3[Workflow owns state]
  end
  subgraph B[Judgment boundary]
    direction LR
    B1[Deterministic step] --> B2{Uncertainty remains?}
    B2 -->|yes| B3[Retrieval plus bounded agent loop]
    B2 -->|no| B4[Continue in code]
  end
  subgraph C[Release boundary]
    direction LR
    C1{Side effect?} -->|yes| C2[Policy and approval]
    C1 -->|no| C3[Produce result]
    C2 --> C3
  end
  subgraph D[Learning loop]
    direction LR
    D1[Trace, evals, cost, stop reason] --> D2[Incidents become eval cases]
  end
  A --> B
  B --> C
  C --> D

Use this diagram as a composition test. If a proposed pattern cannot be placed on the map with an owner, input, output, and failure mode, it is probably not ready to enter the system.
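The composition test can be made mechanical. The sketch below assumes a hypothetical `PatternPlacement` shape with the four fields named above and reports which ones a proposal leaves blank; none of these names come from a real library.

```typescript
// Hypothetical description of where a proposed pattern sits on the map.
interface PatternPlacement {
  pattern: string;
  owner?: string;
  input?: string;
  output?: string;
  failureMode?: string;
}

type PlacementField = 'owner' | 'input' | 'output' | 'failureMode';

// Returns the placement fields that are missing or blank.
function missingPlacementFields(p: PatternPlacement): PlacementField[] {
  const required: PlacementField[] = ['owner', 'input', 'output', 'failureMode'];
  return required.filter((field) => {
    const value = p[field];
    return value === undefined || value.trim() === '';
  });
}

// Example: a judge agent proposed without an owner or a failure mode.
const judgeProposal: PatternPlacement = {
  pattern: 'judge_agent',
  input: 'draft answer',
  output: 'score'
};
console.log(missingPlacementFields(judgeProposal)); // ['owner', 'failureMode']
```

A non-empty result is the signal that the pattern is not ready to enter the system.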

Composition Rules

Run through these rules before adding another pattern.

Rule	Why It Matters
One component owns the goal.	Without goal ownership, agents optimize different tasks.
One component owns state.	Without state ownership, replay and recovery become guesswork.
Tool calls cross a policy boundary.	Without policy, model proposals become actions too quickly.
Memory writes are explicit events.	Without memory discipline, stale or unsafe context persists.
Loops have stop conditions.	Without stop conditions, autonomy turns into unbounded cost and latency.
Evals inspect trajectories.	Without trajectory evals, unsafe paths can produce plausible answers.
Traces connect decisions to effects.	Without traces, failures cannot become better tests.

These rules matter more than the framework. A framework can help you implement them; it cannot decide them for you.
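As an illustration of the stop-condition rule, here is a minimal sketch of a loop that always ends with a recorded stop reason. The `StopReason` values, `runBoundedLoop`, and the cost budget are assumptions made for this example. The step function is synchronous to keep the sketch short; a real agent step would usually be an asynchronous model or tool call.

```typescript
// Hypothetical stop reasons; a real runtime would likely record more.
type StopReason = 'goal_reached' | 'max_steps' | 'budget_exhausted';

interface StepResult {
  done: boolean; // the step reports that the goal is met
  cost: number;  // cost of this step in arbitrary units
}

interface LoopOutcome {
  stopReason: StopReason;
  steps: number;
  totalCost: number;
}

// Runs `step` until it reports done, the budget is spent, or maxSteps is hit.
function runBoundedLoop(
  step: (index: number) => StepResult,
  maxSteps: number,
  maxCost: number
): LoopOutcome {
  let totalCost = 0;
  for (let i = 0; i < maxSteps; i++) {
    const result = step(i);
    totalCost += result.cost;
    if (result.done) {
      return { stopReason: 'goal_reached', steps: i + 1, totalCost };
    }
    if (totalCost >= maxCost) {
      return { stopReason: 'budget_exhausted', steps: i + 1, totalCost };
    }
  }
  return { stopReason: 'max_steps', steps: maxSteps, totalCost };
}

// Example: a step that never finishes is cut off after six iterations.
const stuckOutcome = runBoundedLoop(() => ({ done: false, cost: 1 }), 6, 100);
console.log(stuckOutcome); // { stopReason: 'max_steps', steps: 6, totalCost: 6 }
```

The loop has no exit path that skips the stop reason, which is what lets a trace explain why a run ended.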

System Composition Record

For any non-trivial agentic system, write a short composition record before implementation. It should fit in a pull request or architecture decision record.

system: support_refund_assistant
user_goal: "Resolve refund eligibility with policy-backed evidence."
primary_flow_owner: refund_workflow
patterns:
  routing:
    job: "classify request type and risk"
    owner: intake_service
  retrieval:
    job: "load current refund policy and order evidence"
    owner: evidence_service
  agent_loop:
    job: "investigate missing or conflicting evidence"
    owner: refund_investigation_agent
    max_steps: 6
  policy_enforcement:
    job: "validate recommendation against refund policy"
    owner: policy_gate
  human_approval:
    job: "approve exceptions and high-value refunds"
    owner: approval_workflow
  observability:
    job: "record trace, decisions, evidence, costs, and stop reason"
    owner: runtime
release_blockers:
  - "missing evidence can stop the run"
  - "refund tool cannot execute without approval"
  - "trace can replay proposal, validation, approval, and side effect"

The record forces every pattern to justify itself. If a pattern has no job, no owner, or no release blocker that depends on it, remove it or keep it out of the first version.
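A record like the one above can also be checked in code before review. The sketch below assumes the record has already been parsed into a hypothetical `CompositionRecord` object. It checks job and owner per pattern and checks release blockers at the record level, because that is where the example record lists them.

```typescript
// Hypothetical in-memory form of a parsed composition record.
interface PatternEntry {
  job?: string;
  owner?: string;
}

interface CompositionRecord {
  system: string;
  patterns: Record<string, PatternEntry>;
  releaseBlockers: string[];
}

// Returns one message per gap; an empty list means the record passes.
function reviewRecord(record: CompositionRecord): string[] {
  const problems: string[] = [];
  for (const patternName of Object.keys(record.patterns)) {
    const entry = record.patterns[patternName];
    if (!entry.job) problems.push(patternName + ': no job');
    if (!entry.owner) problems.push(patternName + ': no owner');
  }
  if (record.releaseBlockers.length === 0) {
    problems.push('record: no release blockers');
  }
  return problems;
}

// Example: a judge pattern was added without an owner, and nothing blocks release.
const draftRecord: CompositionRecord = {
  system: 'support_refund_assistant',
  patterns: {
    routing: { job: 'classify request type and risk', owner: 'intake_service' },
    judge: { job: 'score the final answer' }
  },
  releaseBlockers: []
};
console.log(reviewRecord(draftRecord));
```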

Bad Composition

Bad composition usually has the same smell: the model owns too much. The agent infers the goal from vague conversation history, chooses tools without a permission check, and writes memory without classification or review. Retries happen inside hidden loops. Subagents receive the full conversation instead of a narrow task. Evaluators check tone but not evidence. The final answer is logged while the trajectory is lost, and a multi-agent system has no single owner for final synthesis.

These systems can look impressive in a demo. They are painful to operate because nobody can say where responsibility lives.
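A trajectory check is one way to make responsibility visible. The sketch below assumes a hypothetical trace made of `proposal`, `validation`, `approval`, and `side_effect` events keyed by an action id, and it assumes every side effect in the trace requires both validation and approval. A system that gates only high-risk actions would need a narrower rule.

```typescript
// Hypothetical trace event shape; real traces carry far more detail.
type TraceEventKind = 'proposal' | 'validation' | 'approval' | 'side_effect';

interface TraceEvent {
  kind: TraceEventKind;
  actionId: string;
}

// Returns the ids of side effects that ran before validation and approval.
function unguardedSideEffects(trace: TraceEvent[]): string[] {
  const validated: Record<string, boolean> = {};
  const approved: Record<string, boolean> = {};
  const violations: string[] = [];
  for (const traceEvent of trace) {
    if (traceEvent.kind === 'validation') {
      validated[traceEvent.actionId] = true;
    } else if (traceEvent.kind === 'approval') {
      approved[traceEvent.actionId] = true;
    } else if (traceEvent.kind === 'side_effect') {
      const guarded = validated[traceEvent.actionId] && approved[traceEvent.actionId];
      if (!guarded) violations.push(traceEvent.actionId);
    }
  }
  return violations;
}

// Example: action a2 reached its side effect without an approval event.
const sampleTrace: TraceEvent[] = [
  { kind: 'proposal', actionId: 'a1' },
  { kind: 'validation', actionId: 'a1' },
  { kind: 'approval', actionId: 'a1' },
  { kind: 'side_effect', actionId: 'a1' },
  { kind: 'proposal', actionId: 'a2' },
  { kind: 'validation', actionId: 'a2' },
  { kind: 'side_effect', actionId: 'a2' }
];
console.log(unguardedSideEffects(sampleTrace)); // ['a2']
```

A check like this inspects the path rather than the final answer, so a run with a plausible answer and an unguarded side effect still fails.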

Good Composition

Good composition is usually boring. A support refund system, for example, might run a deterministic workflow for intake and account lookup, a router for request type and risk, and retrieval for the current refund policy. It would use structured output for the extracted fields and the recommendation, a small agent loop only for missing-information investigation, policy enforcement before any refund action, human approval for exceptions, and observability and evals across the full run.

The system is agentic where uncertainty exists and deterministic where control matters. That is the whole trick.

A simple composition sketch might look like this:

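async function handleRefundRequest(request: SupportRequest) {
  // Deterministic routing: anything that is not a refund leaves this workflow.
  const route = classifyRequest(request);
  if (route.kind !== 'refund') return handoffTo(route.owner);

  // Code gathers the order and the current policy before the model is involved.
  const order = await tools.lookupOrder(request.orderId);
  const policy = await retrievePolicy('refunds', order.region);

  // The only model-mediated step: the agent investigates and returns a recommendation.
  const recommendation = await refundAgent.investigate({
    request,
    order,
    policy
  });

  // Policy is enforced in code; the agent's output is a proposal, not an action.
  const decision = enforceRefundPolicy(recommendation, order, policy);
  if (decision.requiresApproval) {
    return approvals.request('refund_exception', decision);
  }

  // The side-effecting tool call runs only after the policy gate.
  return tools.draftRefundRequest(decision);
}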

Only one step uses an agent loop. The workflow still owns routing, state, policy, approval, and side effects.
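The sketch above calls `enforceRefundPolicy` without showing it. One hypothetical shape for that gate follows. Every type and field here (`RefundRecommendation`, `autoApproveLimit`, `allowed`, and so on) is an assumption made for illustration, and the rejected case is an addition that the sketch above does not handle.

```typescript
// All shapes below are hypothetical; the text does not define them.
interface RefundRecommendation {
  amount: number;
  isException: boolean;
  evidenceIds: string[];
}

interface OrderSummary {
  orderId: string;
  total: number;
}

interface RefundPolicy {
  autoApproveLimit: number; // refunds above this amount need a human
}

interface RefundDecision {
  allowed: boolean;
  requiresApproval: boolean;
  amount: number;
  reasons: string[];
}

// Deterministic gate: the agent's recommendation is checked, never trusted.
function enforceRefundPolicy(
  rec: RefundRecommendation,
  order: OrderSummary,
  policy: RefundPolicy
): RefundDecision {
  const reasons: string[] = [];
  if (rec.evidenceIds.length === 0) reasons.push('no evidence attached');
  if (rec.amount <= 0 || rec.amount > order.total) {
    reasons.push('amount outside order total');
  }
  if (reasons.length > 0) {
    return { allowed: false, requiresApproval: false, amount: 0, reasons };
  }

  // Exceptions and high-value refunds pause for human approval.
  if (rec.isException) reasons.push('policy exception');
  if (rec.amount > policy.autoApproveLimit) reasons.push('above auto-approve limit');
  return {
    allowed: true,
    requiresApproval: reasons.length > 0,
    amount: rec.amount,
    reasons
  };
}

const sampleDecision = enforceRefundPolicy(
  { amount: 80, isException: false, evidenceIds: ['order-1'] },
  { orderId: 'o1', total: 100 },
  { autoApproveLimit: 50 }
);
console.log(sampleDecision.requiresApproval); // true
```

Because the gate is plain code with no model call, it can be unit tested against the policy without running the agent.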

When To Split Agents

Do not split agents because the task feels large. Split them when the boundary buys something concrete: separate context windows, separate tools, separate permissions, separate teams or ownership, parallel work, independent review, or different user-facing responsibilities.

The weak reasons are easy to recognize once you name them. The architecture diagram looks more advanced. Each prompt sounds like a different job title. The team wants a multi-agent system. The single-agent design has unclear goals or weak tools, so splitting it feels like progress. It is not. Splitting agents does not fix weak architecture; it multiplies it.

Design Review Checklist

Before approving a composed agentic system, ask:

Which parts are deterministic workflows?
Which parts are model-mediated decisions?
What owns the active goal?
What owns durable state?
What tools can cause side effects?
What policy runs before those side effects?
What evidence is required for the final answer?
What memory can be written, updated, or deleted?
What are the stop conditions?
What evals block release?
What trace lets an operator replay the run?
What happens when the model is wrong but persuasive?

If the design cannot answer these, it is not ready for more autonomy.

Design Rule

Compose patterns only when each one has a job, an owner, and a failure mode you can test.