OpenHands

Open-source coding agent (formerly OpenDevin) from All Hands AI. Runs locally or self-hosted, model-agnostic.

Provisional: Early evidence
Early verdict — controlled benchmark pending

The score on this page is a provisional research-based estimate. No controlled benchmark suite has been completed for OpenHands yet, so this verdict cannot be cited as final proof and OpenHands is not eligible for "Verdict Certified" status. When a verified run lands, it will appear in the Evidence Timeline below and the status badge above will switch to "Verified".

Want this agent benchmarked sooner? Sponsored testing gets it into the queue without affecting the verdict.

Verdict

Open-source autonomous coding agent, useful for research and customization. Reliability depends heavily on the underlying model and the harness configuration. This is a placeholder verdict.

Best for
  • Researchers benchmarking agent loops
  • Teams that need self-hosting
  • Builders who want to extend the agent harness
Not ideal for
  • Non-engineers
  • Teams that want a polished managed UX

Failure modes we'd watch

  • Loop quality varies dramatically by model
  • Sandbox setup can fail in ways that look like agent failure
  • Long autonomous runs can cost more in model usage than an hour of a human developer's time

Evidence Timeline

No controlled benchmark runs have been published for OpenHands yet. The score above is a provisional estimate pending the first run. New runs will appear on the runs page.
Needs verification

The following fields are flagged for verification before we publish a non-provisional verdict:

  • pricingSummary
  • scoreBreakdown