Question 1

Do AI coding agents really report false success?

Accepted Answer

Yes, and often. An agent grades its own run and reports the task complete while the environment state says otherwise. A 2026 study measured false success in up to 75.8% of self-assessing coding-agent runs. The agent is not lying on purpose; it has no external check on whether the work actually landed.

Question 2

Can an LLM judge catch false success?

Accepted Answer

Not reliably. In the same study, an LLM judge added to grade the runs scored 0.54 to 0.65 AUROC, where 0.5 is a coin flip. Cheap calibrated detectors based on TF-IDF reached 0.83 to 0.95 and ran far faster. Even so, the signal you can trust in production is a verified state change, not another model's vote.
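To make the AUROC numbers concrete, here is a minimal sketch of how the metric can be computed from scratch. It is not the study's evaluation code. The function name `auroc` and the toy labels and scores are made up for illustration, with label 1 meaning "this run was a false success".

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # detector ranked the false success higher
            elif p == n:
                wins += 0.5   # a tie is worth half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0]              # 1 = false success, 0 = genuine success
useful = [0.9, 0.8, 0.3, 0.1]      # ranks most false successes higher
blind = [0.5, 0.5, 0.5, 0.5]       # same score for every run

print(auroc(labels, useful))  # 0.75
print(auroc(labels, blind))   # 0.5
```

A detector that gives every run the same score lands at exactly 0.5, which is why a judge scoring 0.54 to 0.65 is only slightly better than guessing.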

Question 3

Why does an agent claim a task is done when it isn't?

Accepted Answer

Because the doer is also the checker. The same model that did the work decides whether it succeeded, reading its own transcript rather than the real state of the repo, tests, or build. With nothing external to disagree, a plausible-looking trace reads as success.

Question 4

What actually catches false success in production?

Accepted Answer

Gate on a verified state change, not self-report. Make success depend on something outside the agent's own turn: a passing test or build, a tool or eval response, or a canonical doc the agent reads as external input. Separate the doer from the checker so a single self-vote can never close the loop.
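One way to wire up such a gate is sketched below, assuming the external check is a command whose exit code reflects real state (a test runner or build). `RunResult`, `verify_with_command`, `gate`, and `is_done` are hypothetical names for illustration, not part of any agent framework.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    agent_claims_success: bool  # the agent's self-report, kept for auditing only
    verified: bool              # outcome of the external check


def verify_with_command(cmd: list[str], timeout_s: int = 600) -> bool:
    """Return True only if the external command exits with status 0."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A check that cannot run or hangs counts as a failed check.
        return False
    return proc.returncode == 0


def gate(agent_claims_success: bool, check_cmd: list[str]) -> RunResult:
    """Run the external check regardless of what the agent reported."""
    return RunResult(agent_claims_success, verify_with_command(check_cmd))


def is_done(result: RunResult) -> bool:
    # The self-report is never sufficient: only the verified state closes the loop.
    return result.verified


if __name__ == "__main__":
    # Stand-in for a real check such as a test suite; this one always fails.
    failing_check = [sys.executable, "-c", "raise SystemExit(1)"]
    result = gate(agent_claims_success=True, check_cmd=failing_check)
    print(result.agent_claims_success, is_done(result))  # True False
```

In practice `check_cmd` would be the project's own test or build command. Storing the self-report next to the verified outcome also lets you count how often the two disagree.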
