How we measure trovex's ~60%
trovex's public number is ~60% fewer tokens per .md doc-lookup. This page is the receipt: how we got it, what it covers, where it falls short, and how to check it on your own repo. ~60% is the conservative floor two separate methods both clear; we publish it deliberately lower than what we measured.
The number
- Modeled estimate: median ~63%. A per-query model of the reads an agent avoids, assuming the unaided baseline reads the top-3 candidate docs.
- Measured (answer-and-judge A/B): median 69% at equal task-success, interquartile range 41–81% (p25–p75).
- What we publish: ~60%, the conservative floor under both. The reduction is on doc-lookup token spend, not a whole session.
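As a rough illustration of the modeled estimate, the per-query arithmetic might look like the sketch below. The function name, its parameters, and the token counts are hypothetical and are not taken from the trovex harness. Whether the modeled figure also includes routing overhead is not stated here, so `overhead_tokens` is an assumption that defaults to zero.

```python
def modeled_reduction(candidate_doc_tokens, routed_doc_tokens, overhead_tokens=0, k=3):
    """Fraction of doc-lookup tokens avoided for one query.

    Assumes the unaided baseline reads the top-k candidate docs (k=3 on this
    page) and the routed arm reads one doc plus any overhead.
    All names here are illustrative, not the harness's API.
    """
    baseline = sum(candidate_doc_tokens[:k])  # tokens the baseline would read
    if baseline <= 0:
        raise ValueError("baseline must read at least one token")
    routed = routed_doc_tokens + overhead_tokens  # tokens the routed arm reads
    return 1 - routed / baseline


# Made-up token counts for a single query, for illustration only.
example = modeled_reduction([1200, 900, 900], routed_doc_tokens=1200)
print(f"modeled reduction: {example:.0%}")
```

The published figure is a median over queries, so one query's value can sit well above or below it.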
How we measured it
- 26 pre-registered queries across eight categories (C1–C8), fixed before the run.
- Corpus: trovex's own repo, 61 .md docs. One corpus, our own.
- Both arms answer; an LLM judges. The judge (gpt-5.4-mini) scores whether each answer is responsive. No gold or human answers.
- Baseline arm reads the top-3 candidate docs. trovex arm reads the one routed canonical doc plus its routing and index overhead, which we count against trovex.
- A saving counts only when both arms answer correctly. Fewer tokens at worse quality does not count.
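The counting rule above can be sketched as a small aggregation step. This is a minimal sketch under stated assumptions, not the code in eval_bench.py or eval_llm.py: the `QueryResult` record, its field names, and the sample rows are hypothetical, and the quartile method is one reasonable choice among several.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class QueryResult:
    # Hypothetical record shape; the real harness may store results differently.
    query_id: str
    baseline_tokens: int    # tokens read by the baseline arm (top-3 docs)
    trovex_tokens: int      # routed doc plus routing and index overhead
    baseline_correct: bool  # judge verdict for the baseline answer
    trovex_correct: bool    # judge verdict for the trovex answer


def summarize(results):
    """Median and p25/p75 of token reduction over queries both arms got right."""
    counted = [
        1 - r.trovex_tokens / r.baseline_tokens
        for r in results
        if r.baseline_correct and r.trovex_correct  # fewer tokens at worse quality is excluded
    ]
    if len(counted) < 2:
        raise ValueError("need at least two counted queries for quartiles")
    p25, p50, p75 = quantiles(counted, n=4, method="inclusive")
    return {"n_counted": len(counted), "median": p50, "p25": p25, "p75": p75}


# Made-up rows for illustration; these are not results from the run.
rows = [
    QueryResult("q1", 1000, 400, True, True),
    QueryResult("q2", 1000, 300, True, True),
    QueryResult("q3", 1000, 500, True, True),
    QueryResult("q4", 1000, 100, True, False),  # trovex wrong: saving not counted
]
print(summarize(rows))
```

Filtering before aggregating means a cheap wrong answer cannot raise the median.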
What it doesn't cover
The honest limits are the point; without them "measured" is just marketing.
- n=26 is small, and it's a single corpus (ours).
- An LLM judges responsiveness; there are no human or gold answers.
- trovex got 3 of the 26 answers wrong (wrong doc, or abstained when an answer existed).
- The needle-in-a-large-corpus category (C8) underperformed this run. We don't claim it as a strength.
Check it yourself
The method is the claim, not a fixed percentage.
- Reproduce our run: the harness (eval_bench.py + eval_llm.py) and the pre-registered query set are public.
- On your own repo: the in-product savings view (trovex measure) shows your number. A doc-light repo saves less, and we'd rather say so.
See it on your own repo.
Point trovex at your repo and it prints the tokens it saved on each lookup: your number, not ours.
Open source. No cloud, no API keys. Your docs never leave your machine.