Updated 20 June 2026

Why do AI coding agents pick stale or outdated docs?

Direct answer

Because retrieval ranks by similarity, not freshness. A current runbook and an old postmortem can read alike, so the agent reads the closest match and a stale or duplicate copy gets served as truth. The fix is a freshness signal: mark each result canonical, stale, or duplicate, return the canonical doc, and keep stale copies out of the agent's context.

Why similarity search surfaces the wrong copy

Embedding search answers "which text is most similar to the query?" — not "which doc is current?" An old incident write-up and the current runbook can be almost identical in wording, so the stale one scores just as high. Worse, near-duplicate copies (a wiki page and its README twin) both rank, so the agent pays to read the same content twice and still has to guess which to trust.
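To see why similarity alone cannot tell copies apart, here is a minimal sketch. It assumes a bag-of-words cosine score as a crude stand-in for a real embedding model, and the file paths and doc texts are hypothetical. The ranker has no input that describes freshness, so it cannot prefer the current doc.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank docs by similarity to the query. Freshness is not an input."""
    q = bag_of_words(query)
    scored = [(path, cosine(q, bag_of_words(text))) for path, text in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical corpus: the current runbook, an old postmortem, a README twin.
DOCS = {
    "runbooks/db-failover.md": (
        "To fail over the database, promote the replica "
        "and update the connection string."
    ),
    "postmortems/2023-db-outage.md": (
        "To fail over the database, promote the replica "
        "and restart the proxy by hand."
    ),
    "README.md": (
        "To fail over the database, promote the replica "
        "and update the connection string."
    ),
}

ranking = rank("how do I fail over the database", DOCS)
for path, score in ranking:
    print(f"{score:.3f}  {path}")
```

In this toy corpus the README twin scores exactly the same as the runbook, and the old postmortem lands within a few hundredths of both. Nothing in the scores says which copy is current.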

What actually stops it

Attach a freshness marker to each result (canonical, stale, or duplicate) so the agent never reasons over an outdated copy.
Return one canonical doc, not a ranked list, so there's nothing to mis-rank.
Skip duplicates instead of sending the agent two copies of the same thing.
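The three points above can be sketched as a small selection step. This is an illustration only, not trovex's implementation: the `Result` type, the `Freshness` enum, and `pick_canonical` are hypothetical names, and the sketch assumes the markers have already been assigned by some earlier step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Freshness(Enum):
    CANONICAL = "canonical"
    STALE = "stale"
    DUPLICATE = "duplicate"


@dataclass(frozen=True)
class Result:
    """One retrieval hit (hypothetical shape): a path:line pointer plus a marker."""
    path: str
    line: int
    score: float
    freshness: Freshness


def pick_canonical(results: list[Result]) -> Optional[Result]:
    """Return the best-scoring canonical result, or None if there is none.

    Stale and duplicate hits are dropped before ranking, so they can
    never outrank the canonical doc or reach the agent's context.
    """
    canonical = [r for r in results if r.freshness is Freshness.CANONICAL]
    return max(canonical, key=lambda r: r.score, default=None)


# Hypothetical hits: the stale postmortem scores highest on similarity alone.
results = [
    Result("postmortems/2023-db-outage.md", 12, 0.91, Freshness.STALE),
    Result("runbooks/db-failover.md", 1, 0.90, Freshness.CANONICAL),
    Result("README.md", 40, 0.90, Freshness.DUPLICATE),
]

answer = pick_canonical(results)
if answer is not None:
    print(f"{answer.path}:{answer.line} [{answer.freshness.value}]")
```

Filtering before ranking is the design choice that matters here: the stale postmortem has the highest similarity score, but it is never a candidate.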

Why a newer file timestamp isn't the fix

The obvious patch, "prefer the most recently modified file," doesn't hold. A file's modified time changes on any edit, including a typo fix to a doc whose content is still wrong, so recency and correctness drift apart. And the similarity ranker ignores the timestamp entirely. Freshness has to be a signal carried in the retrieval result (this doc is canonical, that one is stale), not a filesystem date the agent never sees. The problem also compounds: every stale and duplicate copy the agent reads is tokens spent reasoning over the wrong source. That cost is laid out in "what context costs your agents."
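A short sketch of how the timestamp heuristic goes wrong. The file names and contents are hypothetical, and the modified times are pinned with `os.utime` so the example is deterministic. A typo fix to the stale doc is enough to make it the "newest" file.

```python
import os
import tempfile
from pathlib import Path


def newest_by_mtime(paths: list[Path]) -> Path:
    """The 'obvious patch': pick whichever file was modified most recently."""
    return max(paths, key=lambda p: p.stat().st_mtime)


with tempfile.TemporaryDirectory() as tmp:
    current = Path(tmp) / "runbook.md"
    stale = Path(tmp) / "old-postmortem.md"
    current.write_text("Promote the replica, then update the connection string.\n")
    stale.write_text("Promote the replica, then restart teh proxy by hand.\n")

    # Pin explicit modified times: the runbook starts out newer.
    os.utime(current, (2_000, 2_000))
    os.utime(stale, (1_000, 1_000))
    before = newest_by_mtime([current, stale]).name

    # Someone fixes a typo in the stale doc. Its procedure is still outdated,
    # but its modified time now beats the runbook's.
    stale.write_text(stale.read_text().replace("teh", "the"))
    os.utime(stale, (3_000, 3_000))
    after = newest_by_mtime([current, stale]).name

print(before, "->", after)
```

Before the typo fix the heuristic picks the runbook; afterwards it picks the old postmortem, even though nothing about which doc is correct has changed.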

How trovex handles it

trovex indexes your repo's markdown and answers a query with the single current doc that addresses it — a path:line pointer with a freshness marker. Stale and duplicate copies are skipped, not ranked. Same answers, about 60% fewer tokens per lookup. It runs locally (SQLite + ONNX, no cloud or keys).

FAQ

Why does my agent pick an old doc over the current one?

Retrieval ranks by similarity, not freshness. An old postmortem and the current runbook read alike, so the stale one scores just as high. Without a freshness signal, nothing tells the agent which copy is current.

What is a freshness marker?

A per-result label (canonical, stale, or duplicate) that tells an agent which doc is the current source of truth, so it reads the canonical one and skips outdated or duplicate copies.

Does a newer file timestamp fix this?

Not reliably. A file's modified time changes on any edit, including an edit to a doc whose content is still stale, and similarity search ignores the timestamp anyway. Freshness has to be a signal in the retrieval result, not just a filesystem date.

How do I stop agents reading stale docs?

Return one canonical doc per query with a freshness marker and keep stale or duplicate copies out of the agent's context. trovex does this locally, returning a path:line pointer marked canonical, stale, or duplicate.

Skip the stale copies.

trovex is open source and in public beta. Install it and serve your agents the one current doc per query.

uv tool install trovex


Open source (AGPL-3.0 core, MIT CLIs). Local-first — your docs never leave your machine.