Verified: Claude Code is the first AI coding agent to clear AgentVerdict's verified gate.
Read the launch post →
Independent · No paid scores · Tested, not trusted blindly

The Trust Score for AI Agents.

We test AI agents against real tasks, document where they fail, and publish buyer-friendly verdicts before you trust them with your work.

Public build mode

AgentVerdict is in public build mode.

Initial agent pages are live, benchmark suites are published, and verified scores will replace provisional scores as controlled test runs are completed. Watch the work happen.

Agents indexed: 12
Benchmark suites: 6
Verified scores: 1
Verified runs: 2
Status legend: 11 provisional · 1 verified · 0 needs retest
Documented Tasks

Every score is backed by tasks we publish in advance — not vibes.

Reproducible Evidence

Inputs, outputs, costs, and failure tags. The receipts are public.

Independence Over Revenue

Sponsors can pay for testing. They cannot buy the verdict.

Top Provisional Verdicts

Highest provisional scores right now

Provisional scores. Numbers marked with an asterisk are research-based estimates pending controlled benchmark runs.
How we score →
Categories

Find an agent for your job

How we test

Eight axes. Two tiers. Zero hand-waving.

Every Verdict Score (0–100) sums eight sub-scores covering task completion, accuracy, autonomy, reliability, speed, cost, safety, and operator UX. A score is verified only after a full benchmark suite has been completed against the agent.
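
As a rough illustration of the arithmetic described above, the sketch below sums eight sub-scores into a 0–100 total. The axis names come from the description; the equal 12.5-point cap per axis and the function name `verdict_score` are assumptions made for the example, not AgentVerdict's published weighting.

```python
# Axis names are taken from the page. The equal 12.5-point cap per axis is an
# ASSUMPTION for illustration only; the real per-axis weights are not stated here.
AXES = (
    "task_completion", "accuracy", "autonomy", "reliability",
    "speed", "cost", "safety", "operator_ux",
)
MAX_PER_AXIS = 100 / len(AXES)  # 12.5 points each (hypothetical)


def verdict_score(sub_scores: dict[str, float]) -> float:
    """Sum the eight sub-scores into a 0-100 total, validating each axis."""
    missing = set(AXES) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for axis in AXES:
        value = sub_scores[axis]
        if not 0 <= value <= MAX_PER_AXIS:
            raise ValueError(f"{axis} must be between 0 and {MAX_PER_AXIS}")
    return sum(sub_scores[axis] for axis in AXES)
```

Under these assumed caps, an agent that earns 10 points on every axis would total 80, and full marks on every axis would total 100.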

Read the full methodology →
Verdict tiers
  • Elite: 90–100
  • Strong: 80–89
  • Useful but limited: 70–79
  • Risky / inconsistent: 60–69
  • Not trusted yet: 0–59
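
The tier bands above can be expressed as a simple lookup. This is a minimal sketch: the band boundaries are copied from the list, while the function name `verdict_tier` and the treatment of fractional scores (a score of 89.5 stays in the lower band) are assumptions.

```python
# Lower bounds and tier names are copied from the tier list above.
# How fractional scores are handled is an ASSUMPTION: a score belongs to the
# highest band whose lower bound it meets, so 89.5 is treated as "Strong".
TIERS = [
    (90, "Elite"),
    (80, "Strong"),
    (70, "Useful but limited"),
    (60, "Risky / inconsistent"),
    (0, "Not trusted yet"),
]


def verdict_tier(score: float) -> str:
    """Map a 0-100 Verdict Score to its tier name."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, name in TIERS:
        if score >= lower_bound:
            return name
    raise AssertionError("unreachable: the last band starts at 0")
```

For example, `verdict_tier(84)` falls in the 80–89 band and returns "Strong".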
The Verdict

New verdicts in your inbox, weekly.

One short brief. The new agents we tested, the failures worth knowing, and the agent we'd actually pay for this week.

Free · No spam · Unsubscribe anytime