Test Runs

Verified test runs

Every controlled benchmark run AgentVerdict has published. Each entry is the receipt behind a verified Verdict Score — operator, environment, model, cost, time, and per-task results.