Framework Selection
Agent frameworks can accelerate development, but they also shape the system’s boundaries. Select a framework after you know the pattern, state model, tools, eval needs, and deployment constraints.
Use this chapter to compare frameworks without turning the architecture into a framework demo.
For a side-by-side comparison of LangGraph, AutoGen-style systems, Mastra AI, CrewAI, and a custom mini-runtime, use Cross-Framework Decision Matrix. For concrete install and porting notes, use Real Framework Setup Notes. For the decision record, use Templates and Worksheets.
Decision Tree
Use this diagram to keep the framework conversation grounded in workload shape. The useful question is what the framework makes easier to inspect: direct state, graph execution, runtime packaging, role transcripts, or business-flow coordination.
Intent
Choose whether to build directly, use a lightweight library, or adopt a full agent framework.
The right framework should make the important parts more visible: state, tools, policy, traces, handoffs, and failure modes.
The wrong framework makes the demo faster and the system harder to own. It hides state, turns tool calls into magic, makes memory feel automatic, and gives the team a vocabulary before it has an architecture.
Selection Criteria
| Criterion | Questions |
|---|---|
| Control flow | Does it support chains, graphs, loops, routing, and human gates? |
| State | Can it persist, resume, inspect, and migrate state? |
| Tooling | Can tools be typed, scoped, tested, and audited? |
| Multi-agent | Can it represent roles, handoffs, and ownership clearly? |
| Memory | Does it separate working, episodic, semantic, and procedural memory? |
| Evaluation | Can evals run against routes, tools, traces, and final outputs? |
| Observability | Does it expose model calls, tool calls, costs, latency, and errors? |
| Security | Can tools and agents run with least privilege? |
| Deployment | Can it run where your system must run? |
| Escape hatch | Can you drop to code when the abstraction is wrong? |
| Portability | Can prompts, tools, evals, policies, and state survive a framework change? |
| Ownership | Can your team debug production failures without waiting for framework internals? |
Do not adopt a framework that hides a concern your product must control.
Fit By Problem Shape
Start from the workload, not the framework category.
| Problem Shape | Better Fit | Watch For |
|---|---|---|
| Single bounded task | direct code or prompt library. | overbuilding state and orchestration. |
| Known phases | prompt chain or workflow. | hiding validation between steps. |
| Branching workflow | graph or durable workflow framework. | graph sprawl and unclear state ownership. |
| Tool-heavy agent | tool registry plus policy boundary. | broad tools and weak approval controls. |
| Evidence-heavy answers | RAG framework plus context builder. | retrieval disconnected from policy and evals. |
| Long-running work | durable workflow runtime. | in-memory loops and lost approval state. |
| Multi-role work | multi-agent framework or service topology. | fake specialization with duplicated context. |
| Cross-agent communication | A2A, service contracts, queues, or workflows. | natural-language handoffs without schema. |
| Product runtime | application/runtime framework. | platform lock-in and opaque operations. |
| Coding agents | sandboxed harness with file, command, test, and approval controls. | shell access without policy or replay. |
If two approaches both work, choose the one that makes state, tools, policy, evals, and rollback easier to inspect.
Common Framework Shapes
| Shape | Strength | Risk |
|---|---|---|
| Prompt and tool library | Fast for simple agents. | You must build state, evals, and operations yourself. |
| Graph/workflow framework | Clear execution paths, durable state, conditional edges. | Graphs can become hard to read if every branch becomes a node. |
| Multi-agent framework | Role and delegation primitives. | Easy to overuse agents where a workflow is enough. |
| RAG framework | Retrieval, indexes, and data connectors. | Retrieval can become disconnected from policy and evals. |
| Runtime framework | Deployment, tracing, memory, workflows, and operations. | Runtime choices can become platform lock-in. |
The best choice may combine a small number of layers instead of one large framework.
Capability Matrix
Use this matrix during evaluation. The goal is not to assign a score for marketing features. The goal is to identify which responsibilities the framework owns and which ones your application must still own.
| Capability | Questions To Ask |
|---|---|
| State model | Where is state stored? Can it be inspected, migrated, replayed, and versioned? |
| Tool boundary | Are tools typed? Are scopes, side effects, approvals, and trace fields explicit? |
| Policy | Can policy run before tool calls, memory writes, retrieval, and final answers? |
| Memory | Are working, episodic, semantic, and user memory separated? Can writes be reviewed? |
| Context | Can the framework show what context packet was assembled for each model call? |
| Human approval | Are approvals durable, exact-action bound, expiring, and auditable? |
| Evals | Can evals test trajectories, tool calls, refusals, memory writes, and citations? |
| Observability | Are model calls, tool calls, costs, latency, policy decisions, and retries traced? |
| Security | Can it enforce least privilege, sandboxing, credential scope, and egress limits? |
| Deployment | Does it support your runtime, data residency, scaling, rollback, and incident process? |
| Interop | Can it expose or consume MCP, A2A, REST, queues, or your existing service contracts? |
| Exit path | Can you move prompts, tools, state, policies, and evals out later? |
Build Directly When
- The workflow is short and stable.
- You need exact control over state and policy.
- The team already has workflow infrastructure.
- The agent is embedded inside an existing product.
- Debuggability matters more than rapid prototyping.
- The framework would introduce more concepts than the domain requires.
Direct code is often the best first production version for a narrow workflow.
Build directly does not mean build carelessly. It means using normal software boundaries: typed inputs, state tables, tool clients, policy checks, tests, traces, and deployment controls. A small direct implementation with strong boundaries is better than a framework implementation that hides them.
Use a Framework When
- The system needs durable graph execution.
- Agents need tool registration and discovery.
- Multi-agent handoffs are central to the product.
- You need built-in tracing, evals, or deployment support.
- You will build many related agents.
- The team benefits from shared conventions.
Use a framework to make repeated patterns consistent, not to avoid understanding the architecture.
Framework adoption is strongest when the team already knows the responsibilities it needs the framework to carry. It is weakest when the team says, “we need agents”, and lets the framework define the architecture after the fact.
Evaluation Process
Run a framework through a vertical slice:
- one real user request;
- one route or workflow;
- one read tool;
- one write tool behind approval;
- one memory read or write;
- one failure path;
- one eval case;
- one trace inspection;
- one deployment path.
If the slice is hard to inspect, hard to test, or hard to secure, stop before the framework spreads.
Add a second vertical slice for failure:
- malformed user request;
- missing evidence;
- denied policy check;
- tool timeout;
- approval wait;
- cancellation;
- replay from trace;
- rollback of prompt, model, or tool configuration.
Most frameworks look good on the happy path. The failure slice tells you whether you can operate it.
Portability And Exit
Assume the framework may change. The safest way to adopt one is to keep the core assets portable:
- tool manifests;
- prompt and instruction files;
- state schemas;
- memory schemas;
- approval rules;
- policy rules;
- eval fixtures;
- trace schema;
- context packet shape;
- architecture decision records.
Avoid burying these assets inside framework-specific callbacks or decorators if they represent product policy. The framework can execute them. It should not be the only place where they exist.
Framework Anti-Patterns
- Selecting a framework because it supports every pattern.
- Replacing architecture decisions with framework defaults.
- Giving all framework agents the same context and tools.
- Accepting opaque traces.
- Skipping evals because the framework demo worked.
- Treating framework memory as trustworthy by default.
- Using a multi-agent framework for simple routing.
- Treating built-in observability as sufficient without incident replay.
- Accepting framework tool abstractions that cannot express side effects or approvals.
- Letting framework defaults decide memory retention, context assembly, or security scope.
Decision Checklist
Before adopting a framework, answer:
- Which pattern are we implementing?
- Which responsibilities does the framework own?
- Which responsibilities remain in our application?
- Can we inspect state, context, tool calls, memory, policy decisions, and traces?
- Can we run evals without calling real side-effectful tools?
- Can we enforce approval and least privilege outside the prompt?
- Can we deploy, roll back, and disable one capability quickly?
- Can we migrate away without rewriting prompts, tools, state, evals, and policy?
- Which ADR records this choice?
- What failure would make us reconsider the framework?
Related Chapters
- Agent Engineer Toolkit
- Cross-Framework Decision Matrix
- Real Framework Setup Notes
- Templates and Worksheets
- Agent Harnesses
- Agent Development Lifecycle
- Architecture Decision Records for Agents
- Choosing the Right Pattern
- Tool Capability Design
- Policy Enforcement
- Observability and Evals
- Mastra Runtime
- CrewAI Flows and Crews
- Agentic System Architecture