Test Runs

Verified test runs

Every controlled benchmark run AgentVerdict has published. Each entry is the receipt behind a verified Verdict Score — operator, environment, model, cost, time, and per-task results.

2026-04-28T23:18:00Z · fixture v2.0 · operator note

Claude Code · coding-agent-suite

Operator Claude Code (self-run) · Claude Opus 4.7 (1M context) via Claude Code session — Needs verification for current plan/SKU mapping · 8 tasks · $0.00 · 12 min

8 pass

2026-04-28T22:36:00Z · fixture v1.0 · historical evidence

Claude Code · coding-agent-suite

Operator Claude Code (self-run) · Claude Opus 4.7 (1M context) via Claude Code session · 6 tasks · $0.00 · 6 min

6 pass