How do I reduce the token cost of giving a coding agent context?
Cut the repeated retrieval, not the one-time prompt. Most of a coding agent's context cost comes from paying again and again to reread the same docs just to find out which one is current. Route each query to the single canonical doc that answers it, read at the section level, and skip stale and duplicate copies. Together, that's about 60% fewer tokens per doc lookup.
Where does the cost actually go?
Not into the one big prompt people worry about, but into the steady drip of lookups. Every session, an agent reopens the same files to re-establish context: which runbook is current, where the deploy steps live, what a past incident decided. Multiply that by every agent and every teammate, and the retrieval bill dwarfs the occasional large prompt. The fix is to make each lookup cheap and final, so it isn't paid for again.
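To see why the drip outweighs the big prompt, here is a back-of-the-envelope sketch. Every input (prompt size, session count, lookups per session, tokens per lookup) is a made-up illustration, not a measurement. The only figure taken from this page is the roughly 60% reduction per lookup.

```python
def retrieval_tokens(sessions: int, lookups_per_session: int, tokens_per_lookup: int) -> int:
    """Total tokens spent on doc lookups across all sessions."""
    return sessions * lookups_per_session * tokens_per_lookup


# Hypothetical inputs, chosen only to illustrate the shape of the bill.
ONE_TIME_PROMPT = 20_000      # one large prompt, paid once
SESSIONS = 50                 # agent sessions across a team
LOOKUPS_PER_SESSION = 8       # files reopened to re-establish context
TOKENS_PER_LOOKUP = 3_000     # whole-file read per lookup

repeated = retrieval_tokens(SESSIONS, LOOKUPS_PER_SESSION, TOKENS_PER_LOOKUP)

# Apply the "about 60% fewer tokens per doc lookup" figure (integer math).
reduced = repeated * (100 - 60) // 100

print(f"one-time prompt:   {ONE_TIME_PROMPT:,} tokens")
print(f"repeated lookups:  {repeated:,} tokens")
print(f"after ~60% saving: {reduced:,} tokens")
```

With these assumed inputs, the repeated lookups cost far more than the single large prompt, which is why the lookup is the place to optimize.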
The four levers that move the bill
| Cost driver | Lever |
|---|---|
| Rereading files to find the current one | Route each query to one canonical doc |
| Loading whole files for a small answer | Section-level reads |
| Paying for stale / duplicate copies | Freshness markers skip them |
| Re-deriving what another agent already found | Write-back so the fleet shares one answer |
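The four levers in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not trovex's API: the `Doc` class, the `lookup` and `lookup_shared` functions, the `shared_answers` dict, and the file paths are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    path: str
    sections: dict[str, str]                               # heading -> section body
    stale: bool = False                                    # freshness marker
    canonical_for: set[str] = field(default_factory=set)   # topics this doc owns


def lookup(docs: list[Doc], topic: str, heading: str) -> Optional[str]:
    """Return one section from the first fresh doc that is canonical for the topic."""
    for doc in docs:
        if doc.stale:
            continue                           # lever 3: skip stale copies
        if topic in doc.canonical_for:         # lever 1: one canonical doc
            return doc.sections.get(heading)   # lever 2: section-level read
    return None


# Lever 4: write-back, so a second caller reuses the answer instead of re-deriving it.
shared_answers: dict[tuple[str, str], str] = {}


def lookup_shared(docs: list[Doc], topic: str, heading: str) -> Optional[str]:
    key = (topic, heading)
    if key not in shared_answers:
        answer = lookup(docs, topic, heading)
        if answer is not None:
            shared_answers[key] = answer
    return shared_answers.get(key)


# Hypothetical corpus: one stale copy and one current doc for the same topic.
docs = [
    Doc("docs/deploy-old.md", {"Deploy steps": "Old manual steps."},
        stale=True, canonical_for={"deploy"}),
    Doc("docs/deploy.md",
        {"Overview": "Background on the release process.",
         "Deploy steps": "Run the release script, then verify health checks."},
        canonical_for={"deploy"}),
]

answer = lookup_shared(docs, "deploy", "Deploy steps")
print(answer)
```

The stale copy is never read, only the requested section is returned rather than the whole file, and a repeat call with the same topic and heading is served from `shared_answers`.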
How trovex does it
trovex is an open-source, local-first MCP server built around these levers: one canonical answer per query, section-level reads, freshness markers, and a shared write path. It runs on your machine (SQLite + ONNX, no cloud or keys) and keeps a savings dashboard so the reduction is a number you can see, not a claim you have to trust.
See the tokens you'd stop spending.
Index your repo and let trovex serve one current answer per query.
Open source. No cloud, no API keys. Your docs never leave your machine.