The matrix compares framework options so teams can choose based on runtime needs instead of vendor excitement.

Section: Engineering Practice and Frameworks | Type: Reference | Level: Intermediate | Read: 5 min | Effort: 10-20 min reference | Builder, Architect

Cross-Framework Decision Matrix

This chapter compares LangGraph, AutoGen-style systems, Mastra AI, CrewAI, and a small custom runtime by engineering responsibility. The goal is not to crown one framework. The goal is to decide what should own state, tools, policy, memory, evals, deployment, and failure recovery for a specific system.

Use this chapter after the labs. The labs show each framework's shape in isolation, and this chapter helps you choose among them for a real product. Pair it with the Real Framework Setup Notes chapter when you need concrete installation and porting guidance.

Use this diagram to locate the state owner before comparing APIs. A framework is a better fit when its state boundary matches the failure you most need to inspect and recover from.

[Figure: Cross-framework state ownership comparison]

Decision Rule

Choose the framework that makes your highest-risk boundary easiest to inspect, test, and operate.

If the highest risk is resumable state, prefer graph or workflow state. If the highest risk is multi-agent accountability, prefer a transcript or flow that records role behavior. If the highest risk is production runtime packaging, prefer a runtime with conventions for tools, memory, evals, and observability. If the highest risk is exact policy control, a small direct runtime may be safer than a large abstraction.

Framework Fit Matrix

| Option | Best When | Avoid When | Keep Portable |
| --- | --- | --- | --- |
| LangGraph-style state graph | Work has branching, checkpoints, interrupts, resume, or node-level observability. | A simple sequence or single tool call is enough. Graph shape would hide rather than clarify state. | State schema, node contracts, checkpoint records, interrupt payloads, eval fixtures. |
| AutoGen-style team | Collaboration, role behavior, and transcript review are central. | The team is just a chain with agent names. Conversation history becomes the only state store. | Message schema, role contracts, termination rules, transcript evals, tool policy. |
| Mastra-style runtime | TypeScript product needs agents, tools, workflows, memory, evals, and observability packaged together. | Runtime conventions would hide product policy or deployment requirements. | Tool manifests, workflow contracts, memory rules, evals, trace schema. |
| CrewAI-style flows and crews | Python workflows need flow-owned state plus bounded specialist crews. | Roles overlap, flow acceptance is vague, or crews replace deterministic workflow design. | Flow state, task contracts, role permissions, crew outputs, acceptance evals. |
| Mini-runtime/custom code | Scope is narrow, policy needs exact control, or the team already has workflow infrastructure. | You need production durability, scaling, hosted observability, and ecosystem integrations immediately. | Everything: state, policy, tools, memory, evals, traces, deployment contracts. |
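The LangGraph-style row asks you to keep the state schema, node contracts, checkpoint records, and interrupt payloads portable. The following is a minimal, framework-free sketch of what that means. It is not LangGraph's API. `OrderState`, the node functions, `run_graph`, `restore`, and the in-memory `checkpoints` list are hypothetical names chosen for illustration, and a real system would persist checkpoints to durable storage.

```python
import copy
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional


@dataclass
class OrderState:
    # Hypothetical state schema: plain data that any runtime could load.
    order_id: str
    approved: Optional[bool] = None
    status: str = "new"
    history: list[str] = field(default_factory=list)


class Interrupt(Exception):
    """Raised by a node that needs outside input before the run can continue."""


# Hypothetical node contract: take the state, update it, return the next node name.
def validate(state: OrderState) -> str:
    state.status = "validated"
    return "approve"


def approve(state: OrderState) -> str:
    if state.approved is None:
        raise Interrupt("waiting for human approval")
    return "ship" if state.approved else "reject"


def ship(state: OrderState) -> str:
    state.status = "shipped"
    return "END"


def reject(state: OrderState) -> str:
    state.status = "rejected"
    return "END"


NODES: dict[str, Callable[[OrderState], str]] = {
    "validate": validate, "approve": approve, "ship": ship, "reject": reject,
}


def run_graph(state: OrderState, start: str, checkpoints: list[dict]) -> str:
    """Run nodes until END or an interrupt. Each checkpoint names the node to resume from."""
    node = start
    while node != "END":
        try:
            next_node = NODES[node](state)
        except Interrupt as exc:
            checkpoints.append({"node": node, "state": asdict(state), "interrupt": str(exc)})
            return node
        state.history.append(node)
        checkpoints.append({"node": next_node, "state": asdict(state), "interrupt": None})
        node = next_node
    return "END"


def restore(checkpoint: dict) -> OrderState:
    # Deep-copy so later mutations never change the stored checkpoint record.
    return OrderState(**copy.deepcopy(checkpoint["state"]))


# Usage: pause at the approval node, then resume from the last checkpoint.
checkpoints: list[dict] = []
paused_at = run_graph(OrderState(order_id="A-1"), "validate", checkpoints)
resumed = restore(checkpoints[-1])
resumed.approved = True  # the human decision arrives later
final_node = run_graph(resumed, checkpoints[-1]["node"], checkpoints)
```

Each checkpoint records the node to resume from and a plain copy of the state. The same records could therefore back a replay test or be reloaded by a different runtime.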

Responsibility Matrix

| Responsibility | LangGraph-style | AutoGen-style | Mastra-style | CrewAI-style | Mini-runtime |
| --- | --- | --- | --- | --- | --- |
| State owner | Graph state and checkpoints. | Team/task state outside the transcript. | Runtime workflow and memory state. | Flow state. | Your application model. |
| Control flow | Nodes, edges, conditional transitions, interrupts. | Team turn policy and termination. | Workflows and agent runtime. | Flows coordinate crews and tasks. | Loop, router, or workflow code you write. |
| Tool policy | Guard nodes, middleware, or tool wrappers. | Manager/runtime execution boundary. | Tool and workflow policy hooks. | Role tools plus flow constraints. | Explicit policy gate. |
| Memory | State plus memory stores. | Transcript plus external memory. | Runtime memory abstractions. | Flow/crew context and external stores. | Context packet and memory policy. |
| Evals | Node paths, state diffs, checkpoints, final output. | Transcript turns, roles, tool calls, termination. | Runtime traces, tool calls, workflow outcomes. | Flow acceptance and role outputs. | Trajectory tests you define. |
| Observability | Per-node traces and checkpoint inspection. | Structured message transcript. | Runtime observability and eval hooks. | Flow/task/role records. | Trace schema and storage you build. |
| Deployment | App runtime plus checkpointer and stores. | App runtime plus agent service boundaries. | TypeScript runtime packaging. | Python flow app and workers. | Existing product infrastructure. |
| Escape hatch | Nodes are plain code if state stays explicit. | Transcript can be exported and replayed. | Keep tools/evals outside framework-only code. | Keep flow state separate from role chat. | Maximum control, maximum operational burden. |
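In the mini-runtime column, "explicit policy gate" and "trace schema and storage you build" mean that your own code makes the decision before a side effect runs and records that decision whether or not the call goes ahead. The following is a minimal sketch. It assumes a hypothetical three-level `SideEffect` classification, an approval flag supplied by the caller, and an in-memory list standing in for trace storage. `Tool`, `policy_allows`, and `call_tool` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class SideEffect(Enum):
    # Hypothetical side-effect classes; use whatever your tool manifest defines.
    READ = "read"
    WRITE = "write"
    EXTERNAL = "external"  # e.g. sends email or moves money


@dataclass(frozen=True)
class Tool:
    name: str
    side_effect: SideEffect
    fn: Callable[..., Any]


def policy_allows(tool: Tool, approved: bool) -> tuple[bool, str]:
    # Hypothetical rule: reads always run; writes and external calls need approval.
    if tool.side_effect is SideEffect.READ:
        return True, "read-only"
    if approved:
        return True, "approved"
    return False, "approval required"


def call_tool(tool: Tool, args: dict, approved: bool, trace: list[dict]) -> Any:
    """Check policy before the side effect and trace the decision either way."""
    allowed, reason = policy_allows(tool, approved)
    record = {"tool": tool.name, "args": args, "allowed": allowed, "reason": reason}
    trace.append(record)
    if not allowed:
        raise PermissionError(f"{tool.name}: {reason}")
    record["result"] = tool.fn(**args)
    return record["result"]


# Usage with two hypothetical tools.
lookup = Tool("lookup_order", SideEffect.READ, lambda order_id: {"id": order_id, "total": 40})
refund = Tool("issue_refund", SideEffect.EXTERNAL, lambda order_id, amount: f"refunded {amount}")

trace: list[dict] = []
order = call_tool(lookup, {"order_id": "A-1"}, approved=False, trace=trace)
try:
    call_tool(refund, {"order_id": "A-1", "amount": 40}, approved=False, trace=trace)
except PermissionError as exc:
    blocked_reason = str(exc)
refunded = call_tool(refund, {"order_id": "A-1", "amount": 40}, approved=True, trace=trace)
```

Tracing the denied call matters as much as tracing the successful one. It is the record that shows policy ran before the side effect.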

Choose By Risk

| Dominant Risk | Better Default | Why |
| --- | --- | --- |
| Lost progress or interrupted workflows | LangGraph-style or durable workflow | Checkpoints, resumability, and state transitions are first-class. |
| Unaccountable multi-agent behavior | AutoGen-style transcript or CrewAI-style flow | You can inspect role turns, handoffs, and final acceptance. |
| TypeScript product runtime consistency | Mastra-style runtime | Agents, tools, workflows, memory, and evals share one application structure. |
| Python role-based workflow automation | CrewAI-style flow | Flow state and crew execution map well to Python automation teams. |
| Strict policy and minimal scope | Mini-runtime/custom code | The application owns the exact boundary and avoids framework defaults. |
| Framework uncertainty | Mini-runtime first, then migrate | Building the primitives yourself once shows what the framework must actually provide. |
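Keeping this table as data next to the ADR makes the choice reviewable and easy to revisit. The following is a minimal sketch. It assumes the team scores each risk on its own scale. The risk keys and the `choose_default` helper are hypothetical.

```python
# Hypothetical risk keys mapped to the "Better Default" column of the table above.
RISK_DEFAULTS: dict[str, str] = {
    "lost_progress": "LangGraph-style or durable workflow",
    "unaccountable_multi_agent": "AutoGen-style transcript or CrewAI-style flow",
    "typescript_runtime_consistency": "Mastra-style runtime",
    "python_role_automation": "CrewAI-style flow",
    "strict_policy_minimal_scope": "Mini-runtime/custom code",
    "framework_uncertainty": "Mini-runtime first, then migrate",
}


def choose_default(risk_scores: dict[str, int]) -> tuple[str, str]:
    """Return (dominant risk, better default) for the highest-scored risk."""
    if not risk_scores:
        raise ValueError("score at least one risk")
    unknown = sorted(set(risk_scores) - set(RISK_DEFAULTS))
    if unknown:
        raise ValueError(f"unknown risks: {unknown}")
    dominant = max(risk_scores, key=lambda risk: risk_scores[risk])
    return dominant, RISK_DEFAULTS[dominant]


# Usage: a team that mostly fears losing progress on long-running work.
choice = choose_default({"lost_progress": 5, "strict_policy_minimal_scope": 3})
```

When several risks share the top score, `max` returns the first one in dictionary order. Break such ties explicitly in review instead of relying on that order.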

Migration And Escape Hatches

Do not let the framework become the only place where product logic exists. Keep these assets portable:

  • state schemas;
  • tool manifests and side-effect classes;
  • policy rules;
  • approval records;
  • memory retention and deletion rules;
  • prompt and instruction files;
  • context packet shape;
  • trace schema;
  • eval fixtures;
  • deployment and rollback notes;
  • ADRs explaining why the framework was chosen.
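Most of these assets can live as plain, serializable data in your own repository rather than inside framework configuration. The following is a minimal sketch covering a tool manifest with a side-effect class and a memory retention rule. The field names, `ToolManifest`, `RetentionRule`, `expired`, and `redact` are hypothetical and do not follow any framework's format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToolManifest:
    # Hypothetical manifest fields; nothing here depends on a framework.
    name: str
    side_effect: str  # "read", "write", or "external"
    requires_approval: bool
    input_schema: dict  # JSON Schema describing the arguments


@dataclass(frozen=True)
class RetentionRule:
    memory_kind: str
    retention_days: int
    never_store: tuple[str, ...] = ()  # fields that must not enter memory or traces


def expired(rule: RetentionRule, stored_at: datetime, now: datetime) -> bool:
    """True when a memory item has outlived its retention window."""
    return now - stored_at > timedelta(days=rule.retention_days)


def redact(record: dict, rule: RetentionRule) -> dict:
    """Drop forbidden fields before a record is written to memory or traces."""
    return {key: value for key, value in record.items() if key not in rule.never_store}


refund_manifest = ToolManifest(
    name="issue_refund",
    side_effect="external",
    requires_approval=True,
    input_schema={
        "type": "object",
        "properties": {"order_id": {"type": "string"}, "amount": {"type": "number"}},
        "required": ["order_id", "amount"],
    },
)
summary_rule = RetentionRule("conversation_summary", retention_days=30, never_store=("card_number",))

# A serialized manifest can be loaded by any framework adapter, or by none.
manifest_json = json.dumps(asdict(refund_manifest), sort_keys=True)
stored_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
is_stale = expired(summary_rule, stored_at, now=datetime(2024, 3, 1, tzinfo=timezone.utc))
```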

Portability is not theoretical. Framework APIs change, teams switch stacks, and production systems outgrow early assumptions. The safest framework adoption keeps product contracts outside framework-specific decorators and callbacks wherever possible.
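One way to keep product contracts outside decorators and callbacks is to write the product rule as a plain function. The framework binding then lives in a thin adapter that can be rewritten during a migration. The following is a minimal sketch. `refund_policy`, its limit, and `to_framework_tool` are hypothetical, and the returned dict stands in for whatever registration call your framework actually uses.

```python
from typing import Any, Callable


# Product contract: plain Python with no framework imports, so it is easy to test.
# The 100.0 approval limit is a hypothetical example value.
def refund_policy(amount: float, order_total: float, limit: float = 100.0) -> dict:
    if amount <= 0 or amount > order_total:
        return {"allowed": False, "reason": "amount out of range"}
    if amount > limit:
        return {"allowed": False, "reason": "needs human approval"}
    return {"allowed": True, "reason": "within limit"}


# Framework adapter: the only code that should change when switching stacks.
# Hypothetical shape; replace the dict with your framework's real tool registration.
def to_framework_tool(fn: Callable[..., Any], description: str) -> dict:
    return {"name": fn.__name__, "description": description, "call": fn}


refund_check_tool = to_framework_tool(refund_policy, "Check whether a refund may be issued.")
```

Unit tests and eval fixtures call `refund_policy` directly, so they keep working when the adapter changes.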

Decision Checklist

Before choosing a framework, answer:

  • What owns durable state?
  • Where does policy run before side effects?
  • How are tools typed, scoped, approved, and traced?
  • What is the smallest replayable trace that explains a failure?
  • Which evals fail the build?
  • What happens when a run is interrupted?
  • How do we roll back prompts, tools, policies, and model choices?
  • What data must never enter memory or traces?
  • Which parts can migrate if the framework changes?
  • Which production incident would prove this framework was the wrong choice?
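If the answers are recorded in the framework ADR, a small check can stop the decision from being accepted with questions left blank. The following is a minimal sketch. It assumes the answers have already been loaded into a dict keyed by question text. `CHECKLIST` and `unanswered` are hypothetical names.

```python
# The questions from the checklist above, kept verbatim so answers can be keyed by them.
CHECKLIST: tuple[str, ...] = (
    "What owns durable state?",
    "Where does policy run before side effects?",
    "How are tools typed, scoped, approved, and traced?",
    "What is the smallest replayable trace that explains a failure?",
    "Which evals fail the build?",
    "What happens when a run is interrupted?",
    "How do we roll back prompts, tools, policies, and model choices?",
    "What data must never enter memory or traces?",
    "Which parts can migrate if the framework changes?",
    "Which production incident would prove this framework was the wrong choice?",
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return checklist questions with a missing or blank answer, in checklist order."""
    return [question for question in CHECKLIST if not answers.get(question, "").strip()]


# Usage: a hypothetical draft ADR with one real answer and one blank placeholder.
draft = {CHECKLIST[0]: "Our application database, via our own state schema.", CHECKLIST[1]: "  "}
missing = unanswered(draft)
```

Running this in CI and failing the job whenever `missing` is non-empty turns the checklist into a gate.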