Verified test runs
This page lists every controlled benchmark run AgentVerdict has published. Each entry is the receipt behind a verified Verdict Score: operator, environment, model, cost, time, and per-task results.
2026-04-28T23:18:00Z · fixture v2.0 · operator note
Claude Code · coding-agent-suite
Operator Claude Code (self-run) · Claude Opus 4.7 (1M context) via Claude Code session (needs verification for current plan/SKU mapping) · 8 tasks · $0.00 · 12 min
8 pass
2026-04-28T22:36:00Z · fixture v1.0 · historical evidence
Claude Code · coding-agent-suite
Operator Claude Code (self-run) · Claude Opus 4.7 (1M context) via Claude Code session · 6 tasks · $0.00 · 6 min
6 pass