How we measure trovex's ~60%

trovex's public number is ~60% fewer tokens per .md doc-lookup. This page is the receipt: how we got it, what it covers, where it falls short, and how to check it on your own repo. ~60% is the conservative floor two separate methods both clear; we publish it deliberately lower than what we measured.

The number

Modeled estimate: median ~63%. A per-query model of the reads an agent avoids, assuming the unaided baseline reads the top-3 candidate docs.
Measured (answer-and-judge A/B): median 69% at equal task success, interquartile range 41–81% (p25–p75).
What we publish: ~60%, the conservative floor under both. The reduction is on doc-lookup token spend, not a whole session.
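The modeled estimate comes down to one ratio per query. Here is a minimal sketch of that arithmetic, assuming the baseline cost is the sum of the top-3 candidate docs and the trovex cost is the routed doc plus its routing and index overhead. The function name and every token count are hypothetical, chosen for illustration; they are not taken from the harness or from our run.

```python
def modeled_reduction(baseline_doc_tokens, routed_doc_tokens, overhead_tokens):
    """Fraction of doc-lookup tokens avoided for one query under the model.

    baseline_doc_tokens: token counts of the docs the unaided agent would read
                         (the model assumes the top-3 candidates).
    routed_doc_tokens:   token count of the single doc trovex routes to.
    overhead_tokens:     routing and index overhead, charged to trovex.
    """
    baseline = sum(baseline_doc_tokens)
    trovex = routed_doc_tokens + overhead_tokens
    return 1 - trovex / baseline


# Hypothetical query: three candidate docs vs. one routed doc plus overhead.
reduction = modeled_reduction([1200, 900, 900], routed_doc_tokens=1000, overhead_tokens=200)
print(f"modeled reduction: {reduction:.0%}")
```

Because the overhead sits on the trovex side of the ratio, a larger index or routing cost lowers the reported saving rather than hiding it.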

How we measured it

26 pre-registered queries across eight categories (C1–C8), fixed before the run.
Corpus: trovex's own repo, 61 .md docs. One corpus, our own.
Both arms answer; an LLM judges. The judge (gpt-5.4-mini) scores whether each answer is responsive. No gold or human answers.
Baseline arm reads the top-3 candidate docs. trovex arm reads the one routed canonical doc plus its routing and index overhead, which we count against trovex.
A saving counts only when both arms answer correctly. Fewer tokens at worse quality does not count.
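The counting rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in eval_bench.py: the record fields (`baseline_tokens`, `trovex_tokens`, `baseline_ok`, `trovex_ok`) and the sample rows are hypothetical, and the quartile method is our choice for the sketch.

```python
from statistics import median, quantiles


def summarize(results):
    """Median and quartiles of per-query token reduction.

    A query contributes only if the judge accepted BOTH arms' answers;
    fewer tokens at worse quality is dropped, not counted as a saving.
    """
    reductions = [
        1 - r["trovex_tokens"] / r["baseline_tokens"]
        for r in results
        if r["baseline_ok"] and r["trovex_ok"]
    ]
    p25, _, p75 = quantiles(reductions, n=4, method="inclusive")
    return {
        "n_counted": len(reductions),
        "median": median(reductions),
        "p25": p25,
        "p75": p75,
    }


# Hypothetical records, for illustration only.
sample = [
    {"baseline_tokens": 3000, "trovex_tokens": 900, "baseline_ok": True, "trovex_ok": True},
    {"baseline_tokens": 2000, "trovex_tokens": 1000, "baseline_ok": True, "trovex_ok": True},
    {"baseline_tokens": 4000, "trovex_tokens": 800, "baseline_ok": True, "trovex_ok": True},
    {"baseline_tokens": 2500, "trovex_tokens": 1000, "baseline_ok": True, "trovex_ok": True},
    # Cheapest lookup of all, but trovex answered wrongly, so it is excluded.
    {"baseline_tokens": 3000, "trovex_tokens": 300, "baseline_ok": True, "trovex_ok": False},
]
summary = summarize(sample)
print(summary)
```

Note that the excluded row would have been the largest saving in the sample. Filtering on success before aggregating is what keeps a wrong-but-cheap answer from inflating the median.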

What it doesn't cover

The honest limits are the point; without them "measured" is just marketing.

n=26 is small, and it's a single corpus (ours).
An LLM judges responsiveness; there are no human or gold answers.
trovex got 3 of the 26 answers wrong (wrong doc, or abstained when an answer existed).
The needle-in-a-large-corpus category (C8) underperformed this run. We don't claim it as a strength.
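A single overall median can hide a weak category, so it helps to break the per-query reductions down by category when reading a run. The sketch below assumes each counted query carries a category label (C1–C8) and a reduction. The helper name and the sample values are invented purely for illustration and are not figures from our run.

```python
from collections import defaultdict
from statistics import median


def median_by_category(rows):
    """rows: iterable of (category, reduction) pairs for counted queries."""
    buckets = defaultdict(list)
    for category, reduction in rows:
        buckets[category].append(reduction)
    # Sort by category label so the output order is stable.
    return {category: median(values) for category, values in sorted(buckets.items())}


# Hypothetical values, not results from the run.
rows = [("C1", 0.72), ("C1", 0.66), ("C8", 0.40), ("C8", 0.30), ("C8", 0.50)]
by_category = median_by_category(rows)
for category, value in by_category.items():
    print(f"{category}: {value:.0%}")
```

With only 26 queries spread over eight categories, each bucket holds a handful of points, so a per-category median is a prompt for closer inspection rather than a stable estimate.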

Check it yourself

The method is the claim, not a fixed percentage.

Reproduce our run: the harness (eval_bench.py + eval_llm.py) and the pre-registered query set are public.
On your own repo: the in-product savings view (trovex measure) shows your number. A doc-light repo saves less, and we'd rather say so.

See it on your own repo.

Point trovex at your repo and it prints the tokens it saved on each lookup — your number, not ours.


Open source. No cloud, no API keys. Your docs never leave your machine.