DEBATE AND CONSENSUS REVIEW CHECKLIST

System:
Decision:
Owner:
Reviewer:
Date:

1. Fit Check

[ ] A simpler baseline (single agent, deterministic workflow, or human review) was considered.
[ ] The decision benefits from independent critique or comparison.
[ ] The task is not a factual lookup that should be solved by retrieval or tools.
[ ] The final decision has one accountable owner.

Evidence:

2. Independence

[ ] Participants have different prompts, evidence, roles, models, or review angles.
[ ] Participants do not all share the same unverified context.
[ ] Participants cannot see each other's answers until all independent answers are submitted.
[ ] Shared blind spots are named.
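
One way to make the independence items checkable is to build each prompt from the task and the participant's own role only, and to release answers only after all of them are collected. This is a minimal Python sketch. `Participant`, `blind_round`, and the `answer` callable (a stand-in for a model call) are hypothetical names, not part of any specific framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Participant:
    name: str
    role_prompt: str                  # distinct role or review angle
    answer: Callable[[str], str]      # hypothetical model call: prompt -> answer


def blind_round(participants: list[Participant], task: str) -> dict[str, str]:
    """Run the independent round; each prompt contains only the task and the participant's own role."""
    sealed: dict[str, str] = {}
    for p in participants:
        prompt = f"{p.role_prompt}\n\nTask: {task}"  # no peer answers are included
        sealed[p.name] = p.answer(prompt)
    # Answers are returned only after every participant has submitted;
    # critique or revision rounds may read them from here on.
    return sealed
```

The property to check in review is that no prompt built inside `blind_round` ever contains another participant's output.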

Evidence:

3. Protocol

[ ] Protocol is declared before execution: propose-critique-revise, blind vote, ranked choice, red-team review, or judge/reducer.
[ ] Inputs, outputs, and vote format have typed schemas; stop conditions are explicit.
[ ] Tie, abstention, refusal, and uncertainty outcomes are defined.
[ ] Cost and latency budgets are defined.
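
A typed vote plus a tally with explicit tie, abstention, and low-confidence outcomes might look like the following sketch. `Vote`, `Outcome`, `tally`, the quorum rule, and the 0.5 confidence cutoff are illustrative assumptions. Refusals are modeled as abstentions here, and budget checks are left out.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    WINNER = "winner"
    TIE = "tie"
    NO_QUORUM = "no_quorum"


@dataclass(frozen=True)
class Vote:
    voter: str
    choice: Optional[str]   # None means abstention or refusal
    confidence: float       # self-reported, 0.0 to 1.0


def tally(votes: list[Vote], quorum: int,
          min_confidence: float = 0.5) -> tuple[Outcome, Optional[str]]:
    """Count decisive votes; ties and missing quorum return no winner so the caller can escalate."""
    # Abstentions and low-confidence (uncertain) votes are not counted.
    counted = [v for v in votes if v.choice is not None and v.confidence >= min_confidence]
    if len(counted) < quorum:
        return Outcome.NO_QUORUM, None
    ranked = Counter(v.choice for v in counted).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return Outcome.TIE, None
    return Outcome.WINNER, ranked[0][0]
```

Returning an `Outcome` instead of a bare string forces the caller to handle ties and missing quorum explicitly rather than silently picking an answer.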

Evidence:

4. Evidence

[ ] Each proposal carries evidence, assumptions, and known gaps.
[ ] Votes or rankings cannot replace evidence.
[ ] Unsupported claims are rejected or escalated.
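
A sketch of an evidence gate applied before any vote, assuming each proposal lists its evidence as plain identifiers (document IDs, test results, tool outputs). `Proposal` and `screen` are hypothetical. A real gate would also verify that the cited evidence exists and supports the claim, which this sketch does not do.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    author: str
    claim: str
    evidence: list[str] = field(default_factory=list)     # e.g. doc IDs, test results, tool outputs
    assumptions: list[str] = field(default_factory=list)
    known_gaps: list[str] = field(default_factory=list)


def screen(proposals: list[Proposal]) -> tuple[list[Proposal], list[Proposal]]:
    """Split proposals into (admissible, escalated) before voting.

    A proposal with no cited evidence is escalated no matter how many
    participants might later vote for it.
    """
    admissible = [p for p in proposals if p.evidence]
    escalated = [p for p in proposals if not p.evidence]
    return admissible, escalated
```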

Evidence:

5. Judge and Merge

[ ] Judge rubric is explicit.
[ ] Judge records why one answer won or why the system escalated.
[ ] Disagreement is preserved in the final trace.
[ ] Final answer does not hide minority safety concerns.
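
A judge record that keeps the rubric, the rationale, and every dissent, and that always surfaces minority safety concerns in the final summary, could be sketched as below. The class names, field names, and summary format are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dissent:
    participant: str
    position: str
    safety_concern: bool = False


@dataclass
class JudgeRecord:
    rubric: dict[str, str]     # criterion -> description, fixed before judging
    winner: Optional[str]      # None when the judge escalates
    rationale: str             # why the winner won, or why the system escalated
    dissents: list[Dissent] = field(default_factory=list)

    @property
    def escalated(self) -> bool:
        return self.winner is None

    def final_summary(self) -> str:
        lines = [f"Decision: {self.winner or 'ESCALATED'}", f"Why: {self.rationale}"]
        # Every dissent is kept in the trace; safety dissents are labeled so they stay visible.
        for d in self.dissents:
            tag = "SAFETY DISSENT" if d.safety_concern else "Dissent"
            lines.append(f"{tag} ({d.participant}): {d.position}")
        return "\n".join(lines)
```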

Evidence:

6. Evaluation

[ ] Single-agent baseline is measured.
[ ] Quality lift over the baseline is measured against the added cost and latency.
[ ] Evals include correlated failure, false consensus, wrong majority, and judge error cases.
[ ] Human escalation is tested for unresolved disagreement.
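
The lift-versus-cost check can be written as a simple decision rule over baseline and consensus results measured on the same eval set. The thresholds (`min_lift_per_dollar`, `max_latency_s`) and the `EvalResult` fields are hypothetical and would need to be set per system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float    # fraction correct on the shared eval set
    cost_usd: float    # average cost per task
    latency_s: float   # latency per task (e.g. p95)


def consensus_worth_it(baseline: EvalResult, consensus: EvalResult,
                       min_lift_per_dollar: float, max_latency_s: float) -> bool:
    """Accept consensus only if it beats the baseline within the latency budget at an acceptable price."""
    lift = consensus.accuracy - baseline.accuracy
    extra_cost = consensus.cost_usd - baseline.cost_usd
    if lift <= 0 or consensus.latency_s > max_latency_s:
        return False
    if extra_cost <= 0:
        return True  # better and no more expensive
    return lift / extra_cost >= min_lift_per_dollar
```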

Evidence:

7. Final Decision

[ ] Use debate/consensus.
[ ] Use simpler baseline.
[ ] Use retrieval/tool verification instead.
[ ] Use human review.

Reason: