# Observability and Evals Review Checklist

Use this checklist before production traffic reaches an agent.

## Trace Contract

- [ ] Every run starts with a trace ID, run ID, request ID, actor, tenant, environment, and version set attached.
- [ ] Model, tool, retrieval, memory, policy, approval, evaluator, retry, and workflow spans are captured where relevant.
- [ ] Traces record status, stop reason, latency, cost, token count, retry count, and policy outcome.
- [ ] Prompt, model, tool schema, policy, retriever, memory, and harness versions appear in the trace.
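The trace contract above can be sketched as a small schema plus a check that runs before the first span is emitted. This is a minimal Python sketch, not the API of any tracing library: the names `RunContext`, `SpanOutcome`, `contract_violations`, and the exact version keys are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical version keys mirroring the components listed in the checklist.
REQUIRED_VERSIONS = ("prompt", "model", "tool_schema", "policy", "retriever", "memory", "harness")

# Identifiers every run must carry before its first span is emitted.
REQUIRED_IDS = ("trace_id", "run_id", "request_id", "actor", "tenant", "environment")


@dataclass
class RunContext:
    """Root-of-run identity and version set (hypothetical schema)."""
    trace_id: str
    run_id: str
    request_id: str
    actor: str
    tenant: str
    environment: str
    versions: dict[str, str] = field(default_factory=dict)


@dataclass
class SpanOutcome:
    """Outcome fields recorded on each span (hypothetical schema)."""
    kind: str               # e.g. "model", "tool", "retrieval", "policy"
    status: str             # e.g. "ok", "error"
    stop_reason: str | None
    latency_ms: float
    cost_usd: float
    tokens: int
    retries: int
    policy_outcome: str | None


def contract_violations(ctx: RunContext) -> list[str]:
    """Return contract problems for a run; an empty list means it is compliant."""
    problems = [f"missing {name}" for name in REQUIRED_IDS if not getattr(ctx, name)]
    problems += [f"missing version: {key}" for key in REQUIRED_VERSIONS if not ctx.versions.get(key)]
    return problems
```

Calling a check like `contract_violations` at run start, and refusing or flagging runs that return a non-empty list, is one way to make the contract enforced rather than advisory.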

## Data Safety

- [ ] Trace fields are classified before storage.
- [ ] Secrets, credentials, unnecessary private data, and unsafe raw content are redacted.
- [ ] Retention, access control, deletion, and export rules are documented.
- [ ] A reference-only tracing mode, which stores pointers to content instead of the content itself, is available when raw content should not be stored.
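One way to apply classification, redaction, and a reference-only mode at write time is sketched below. The field classes, the secret patterns, and `prepare_for_storage` are hypothetical; the regexes are illustrative placeholders, not a complete secret scanner. Unclassified fields default to the strictest handling, and the reference-only mode stores a SHA-256 digest in place of raw content.

```python
import hashlib
import re

# Hypothetical field classification; anything not listed is treated as secret.
FIELD_CLASSES = {
    "trace_id": "public",
    "tool_name": "public",
    "latency_ms": "public",
    "user_email": "private",
    "prompt": "content",
    "completion": "content",
    "api_key": "secret",
}

# Illustrative patterns only; a real deployment needs a vetted secret scanner.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{16,}"),           # API-key-like tokens
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._-]+"),  # bearer tokens
]


def redact_text(text: str) -> str:
    """Replace every secret-pattern match with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def content_ref(text: str) -> str:
    """Stable reference to content that does not store the content itself."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def prepare_for_storage(fields: dict, store_raw_content: bool) -> dict:
    """Classify each field, then keep, redact, or reference it before storage."""
    out = {}
    for name, value in fields.items():
        cls = FIELD_CLASSES.get(name, "secret")  # unclassified -> strictest class
        if cls == "public":
            out[name] = value
        elif cls == "content":
            text = str(value)
            out[name] = redact_text(text) if store_raw_content else content_ref(text)
        elif cls == "private":
            out[name] = "[PRIVATE]"
        else:  # "secret"
            out[name] = "[REDACTED]"
    return out
```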

## Eval Coverage

- [ ] Evals cover task success, trajectory correctness, tool correctness, policy compliance, retrieval quality, memory safety, recovery, cost, and latency.
- [ ] Each change type (prompt, model, tool, retrieval, memory, policy, workflow, runtime) maps to the eval subsets it must rerun.
- [ ] Failed blocking evals stop the release or require a traceable override.
- [ ] Each eval suite has an owner and freshness process.
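A sketch of change-to-suite selection, a blocking release gate, and a freshness check follows. The suite names, the `SUITES_BY_CHANGE` mapping, and the 90-day review window are assumptions chosen for illustration, not recommended values.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from a changed component to the eval suites it must rerun.
SUITES_BY_CHANGE = {
    "prompt": {"task_success", "trajectory", "policy_compliance"},
    "model": {"task_success", "trajectory", "tool_correctness", "policy_compliance", "cost", "latency"},
    "tool": {"tool_correctness", "trajectory", "recovery"},
    "retrieval": {"retrieval_quality", "task_success"},
    "memory": {"memory_safety", "task_success"},
    "policy": {"policy_compliance"},
    "workflow": {"trajectory", "recovery"},
    "runtime": {"latency", "cost", "recovery"},
}
ALL_SUITES = set().union(*SUITES_BY_CHANGE.values())


@dataclass
class EvalResult:
    suite: str
    passed: bool
    blocking: bool
    owner: str


def suites_for(changes: set[str]) -> set[str]:
    """Select suites to run; an unmapped change type conservatively selects every suite."""
    selected: set[str] = set()
    for change in changes:
        selected |= SUITES_BY_CHANGE.get(change, ALL_SUITES)
    return selected


def release_gate(results: list[EvalResult], overridden: set[str]) -> tuple[bool, list[str]]:
    """Allow release only if every failed blocking suite has a recorded override."""
    blockers = sorted(
        r.suite for r in results
        if r.blocking and not r.passed and r.suite not in overridden
    )
    return (not blockers, blockers)


def stale_suites(last_reviewed: dict[str, date], today: date, max_age_days: int = 90) -> list[str]:
    """List suites last reviewed more than max_age_days ago (window is an assumption)."""
    return sorted(s for s, d in last_reviewed.items() if (today - d).days > max_age_days)
```

Falling back to every suite for an unmapped change type trades extra eval time for never silently skipping coverage.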

## Operational Loop

- [ ] Incidents, near misses, and human corrections can become eval fixtures.
- [ ] Dashboards show success rate, stop reason, policy denials, tool errors, cost, latency, retries, and eval regression rate.
- [ ] Operators can reconstruct any single failed run end to end without access to raw secrets.
- [ ] Release overrides record owner, reason, affected evals, expiry, and follow-up.
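The operational loop can be sketched with two records: a validated release override and an eval fixture built from a failed run plus a human correction. `ReleaseOverride`, `fixture_from_incident`, and the trace keys they read (`trace_id`, `input_ref`, `stop_reason`, `versions`) are hypothetical, and the sketch assumes traces were already redacted when they were stored.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseOverride:
    """Audit record for shipping despite failed blocking evals (hypothetical schema)."""
    owner: str
    reason: str
    affected_evals: tuple[str, ...]
    expires: date
    follow_up: str  # e.g. a ticket reference

    def problems(self, today: date) -> list[str]:
        """Return reasons this override is not acceptable; empty means valid."""
        issues = [f"missing {name}" for name in ("owner", "reason", "follow_up")
                  if not getattr(self, name).strip()]
        if not self.affected_evals:
            issues.append("no affected evals listed")
        if today >= self.expires:  # the expiry date itself counts as expired
            issues.append("override expired")
        return issues


def fixture_from_incident(trace: dict, corrected_output: str) -> dict:
    """Build a regression fixture from a failed run and a human correction.

    Copies the input reference rather than raw input, so the fixture holds
    no more sensitive data than the already-redacted trace did.
    """
    return {
        "source_trace_id": trace["trace_id"],
        "input_ref": trace["input_ref"],
        "versions": dict(trace.get("versions", {})),
        "failed_stop_reason": trace.get("stop_reason"),
        "expected_output": corrected_output,
    }
```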