COMPUTER-USE AGENT REVIEW CHECKLIST

System:
Target application:
Owner:
Reviewer:
Date:

1. Fit Check

[ ] No stable API, database, workflow tool, or MCP/tool integration can solve the task better.
[ ] UI automation is scoped to a bounded workflow.
[ ] Human baseline (time and error rate) and the agent's expected failure rate are documented.
[ ] High-risk actions require human approval before execution.

Evidence:

2. Interface Representation

[ ] Preferred representation is named: DOM, accessibility tree, screenshot, OCR, terminal buffer, or instrumentation.
[ ] The representation is stable enough across runs to support repeatable tests.
[ ] Sensitive screen content is redacted from traces when needed.
[ ] UI observations are stored with timestamps and application state.
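
Example (illustrative sketch, not a required implementation): one way to store an observation with a timestamp and application state while redacting sensitive text first. The two regex patterns, the Observation fields, and the function names are assumptions made for this example; real redaction rules would come from the privacy requirements of the system under review.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical redaction patterns; a real deployment would define its own.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{12,19}\b")  # card-like digit runs


def redact(text: str) -> str:
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return LONG_DIGITS.sub("[REDACTED_NUMBER]", text)


@dataclass(frozen=True)
class Observation:
    captured_at: str     # ISO 8601 timestamp in UTC
    representation: str  # e.g. "dom", "accessibility_tree", "screenshot"
    app_state: str       # e.g. current URL or screen identifier
    content: str         # redacted observation text


def record_observation(representation: str, app_state: str, raw: str) -> Observation:
    """Build an observation record; the raw text is redacted before storage."""
    return Observation(
        captured_at=datetime.now(timezone.utc).isoformat(),
        representation=representation,
        app_state=app_state,
        content=redact(raw),
    )
```

Redacting before the record is built means the unredacted text never reaches the trace store, which is usually easier to audit than redacting on read.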

Evidence:

3. Action Space

[ ] Allowed actions are enumerated.
[ ] Forbidden actions are enumerated.
[ ] Action schema includes selector, target, value, precondition, timeout, and risk.
[ ] Free-form desktop control is blocked unless the environment is disposable.
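
Example (illustrative sketch): the items above could be expressed as an action schema plus a small policy function. The action kinds, the two-level Risk enum, and the decide() function are hypothetical names chosen for this example; only the schema fields (selector, target, value, precondition, timeout, risk) come from the checklist.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Action:
    kind: str                    # e.g. "click", "type"
    selector: str                # how the element is located
    target: str                  # human-readable description of the element
    value: Optional[str] = None  # text to enter, option to select, etc.
    precondition: str = ""       # state that must hold before execution
    timeout_s: float = 10.0
    risk: Risk = Risk.LOW


# Hypothetical enumerations; each reviewed system would define its own.
ALLOWED_KINDS = frozenset({"click", "type", "select", "wait"})
FORBIDDEN_KINDS = frozenset({"shell", "file_delete"})
FREE_FORM_KINDS = frozenset({"raw_mouse", "raw_keyboard"})


def decide(action: Action, disposable_env: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a proposed action."""
    if action.kind in FORBIDDEN_KINDS:
        return "deny"
    if action.kind in FREE_FORM_KINDS:
        # Free-form control is only considered in a disposable environment.
        if not disposable_env:
            return "deny"
    elif action.kind not in ALLOWED_KINDS:
        return "deny"  # anything not enumerated is denied by default
    if action.timeout_s <= 0:
        return "deny"
    if action.risk is Risk.HIGH:
        return "needs_approval"
    return "allow"
```

Denying anything that is not explicitly enumerated keeps the action space closed, so a new action kind has to be reviewed before the agent can use it.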

Evidence:

4. Sandbox

[ ] Browser profile, container, VM, or remote desktop is isolated.
[ ] Downloads and uploads are restricted to sandbox paths.
[ ] Network egress is allowlisted.
[ ] Local secrets and private files are inaccessible.
[ ] Sessions are cleared or rotated after runs.
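
Example (illustrative sketch): application-level checks for the egress allowlist and sandbox path items above. SANDBOX_ROOT and the host in EGRESS_ALLOWLIST are placeholder values assumed for this example. Checks like these complement network-level and filesystem-level enforcement; they do not replace it.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Placeholder values for illustration only.
SANDBOX_ROOT = Path("/sandbox/run")
EGRESS_ALLOWLIST = frozenset({"app.example.com"})


def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests to hosts that are explicitly allowlisted."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in EGRESS_ALLOWLIST


def path_in_sandbox(candidate: str, root: Path = SANDBOX_ROOT) -> bool:
    """Return True only if the candidate path resolves inside the sandbox root.

    Resolving first collapses '..' segments and follows symlinks, so a
    path cannot escape the root by traversal. Requires Python 3.9+.
    """
    resolved = (root / candidate).resolve()
    return resolved.is_relative_to(root.resolve())
```

Matching on the exact hostname, not a substring or suffix, avoids accepting look-alike hosts such as app.example.com.evil.net.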

Evidence:

5. Recovery

[ ] Checkpoints capture URL/app state, last action, visible errors, files, side effects, and retry count.
[ ] Detected UI drift stops the run; the agent does not guess at changed elements.
[ ] Modal, timeout, validation error, and session expiry paths are tested.
[ ] Failed runs produce a user/operator report that states the stop reason and the last checkpoint.
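
Example (illustrative sketch): a checkpoint record, a drift check that stops the run, and a failure report. The Checkpoint fields mirror the checkpoint item above; UIDriftError, require_selectors(), and failure_report() are hypothetical names, and comparing expected selectors against observed ones is only one possible way to detect drift.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Checkpoint:
    url: str                  # URL or application state identifier
    last_action: str
    visible_errors: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    side_effects: list[str] = field(default_factory=list)
    retry_count: int = 0


class UIDriftError(RuntimeError):
    """Raised when the observed UI no longer matches what the workflow expects."""


def require_selectors(expected: set[str], observed: set[str]) -> None:
    """Stop the run if any expected selector is missing, instead of guessing."""
    missing = sorted(expected - observed)
    if missing:
        raise UIDriftError(f"missing selectors: {missing}")


def failure_report(checkpoint: Checkpoint, stop_reason: str) -> str:
    """Serialize the stop reason and last checkpoint for the user or operator."""
    return json.dumps(
        {"stop_reason": stop_reason, "checkpoint": asdict(checkpoint)},
        indent=2,
        sort_keys=True,
    )
```

A caller would typically catch UIDriftError at the top of the run loop, write the failure report, and end the session without retrying.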

Evidence:

6. Observability and Evals

[ ] Each trace links goal, observation, proposed action, policy decision, executed action, result, and stop reason.
[ ] Evals cover happy path, UI drift, denied action, stale selector, slow page, and unexpected modal.
[ ] Screenshots or DOM snapshots are retained only under approved privacy rules.
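
Example (illustrative sketch): a trace step that carries the linked fields from the first item above, plus a coverage check for the eval scenarios in the second. TraceStep, the scenario identifiers, and the JSON Lines output are assumptions made for this example, not a prescribed trace format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TraceStep:
    goal: str
    observation_id: str             # reference to a stored observation
    proposed_action: str
    policy_decision: str            # e.g. "allow", "needs_approval", "deny"
    executed_action: Optional[str]  # None when the action was not executed
    result: str
    stop_reason: Optional[str] = None


# Scenario identifiers derived from the eval item above (names are hypothetical).
REQUIRED_EVAL_SCENARIOS = frozenset({
    "happy_path", "ui_drift", "denied_action",
    "stale_selector", "slow_page", "unexpected_modal",
})


def missing_scenarios(covered: Iterable[str]) -> list[str]:
    """Return required eval scenarios that have no test yet, sorted by name."""
    return sorted(REQUIRED_EVAL_SCENARIOS - set(covered))


def to_jsonl(steps: Iterable[TraceStep]) -> str:
    """Serialize trace steps as JSON Lines, one step per line."""
    return "\n".join(json.dumps(asdict(step), sort_keys=True) for step in steps)
```

Storing an observation identifier in the trace, not the screenshot or DOM snapshot itself, lets the snapshot be deleted under retention rules without breaking the trace.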

Evidence:

7. Final Decision

[ ] Prototype only
[ ] Internal pilot
[ ] Production candidate
[ ] Prefer API/tool integration instead

Blocking gaps:

Next actions: