Updated 19 June 2026

How do I reduce the token cost of giving a coding agent context?

Direct answer

Cut the repeated retrieval, not the one-time prompt. Most of a coding agent's context cost comes from rereading the same docs, session after session, to work out which one is current. Route each query to the single canonical doc that answers it, read at the section level, and skip stale and duplicate copies. Together, these changes mean about 60% fewer tokens per doc lookup.

Where does the cost actually go?

Not into the one big prompt people worry about, but into the steady drip of lookups. Every session, an agent reopens the same files to re-establish context: which runbook is current, where the deploy steps live, what a past incident decided. Multiply that by every agent and every teammate and the retrieval bill dwarfs the occasional large prompt. Providers added prompt caching because re-sending context on every call is a real cost, but caching only discounts the unchanging prefix, not the docs an agent rereads to re-find what's current. The fix is to make each lookup cheap and final, so it isn't paid for again.
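To make that multiplication concrete, here is a rough sketch in Python. Every number in it is an illustrative assumption (files reopened per lookup, tokens per file, lookups per session, sessions per week), not a measurement, and the function names are hypothetical.

```python
def lookup_tokens(candidate_files: int, tokens_per_file: int) -> int:
    """Tokens read when an agent opens every candidate file to find the current one."""
    return candidate_files * tokens_per_file


def recurring_tokens(lookups_per_session: int, sessions: int, tokens_per_lookup: int) -> int:
    """Total tokens spent on lookups across all sessions."""
    return lookups_per_session * sessions * tokens_per_lookup


# Illustrative numbers only, not measurements.
one_time_prompt = 20_000  # one large prompt, paid once
per_lookup = lookup_tokens(candidate_files=4, tokens_per_file=1_500)
weekly = recurring_tokens(lookups_per_session=5, sessions=30, tokens_per_lookup=per_lookup)
ratio = weekly / one_time_prompt

print(per_lookup)  # 6000
print(weekly)      # 900000
print(ratio)       # 45.0
```

Under these made-up inputs the recurring lookups cost 45 times the single large prompt. Plug in your own counts to see where your bill sits.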

The four levers that move the bill

What drives agent context cost, and the lever that cuts it
Cost driver	Lever
Rereading files to find the current one	Route each query to one canonical doc
Loading whole files for a small answer	Section-level reads
Paying for stale / duplicate copies	Freshness markers skip them
Re-deriving what another agent already found	Write-back so the fleet shares one answer
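
The first three levers in the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not trovex's implementation: the Doc fields (topic, last_verified, superseded), the function names, and the "## " heading convention are all hypothetical. The write-back lever is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Doc:
    path: str
    topic: str
    last_verified: date  # hypothetical freshness marker
    superseded: bool     # True for stale or duplicate copies
    text: str


def pick_canonical(docs: List[Doc], topic: str) -> Optional[Doc]:
    """Return the single current doc for a topic, skipping superseded copies."""
    live = [d for d in docs if d.topic == topic and not d.superseded]
    # Among live candidates, prefer the most recently verified one.
    return max(live, key=lambda d: d.last_verified, default=None)


def read_section(text: str, heading: str) -> str:
    """Return only the lines under '## <heading>', up to the next '## ' heading."""
    out, inside = [], False
    for line in text.splitlines():
        if line.startswith("## "):
            inside = line[3:].strip().lower() == heading.lower()
            continue
        if inside:
            out.append(line)
    return "\n".join(out).strip()


docs = [
    Doc("docs/deploy.md", "deploy", date(2026, 5, 1), False,
        "## Steps\nrun the pipeline\n## Rollback\nrevert the release"),
    Doc("wiki/deploy-old.md", "deploy", date(2025, 1, 10), True,
        "## Steps\nold steps"),
]

doc = pick_canonical(docs, "deploy")
print(doc.path)                            # docs/deploy.md
print(read_section(doc.text, "Rollback"))  # revert the release
```

The agent reads one section of one file instead of opening both copies in full, which is where the token saving comes from.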

How trovex does it

trovex is an open-source, local-first MCP server built around these levers: one canonical answer per query, section-level reads, freshness markers, and a shared write path. It runs on your machine (SQLite + ONNX, no cloud or keys) and keeps a savings dashboard so the reduction is a number you can see, not a claim you have to trust.

FAQ

How much of the token bill is rereading versus real work?

Most of it. The expensive part of a doc lookup is the agent reopening several candidate files to work out which is current, then discarding them; the answer it keeps is a fraction of what it read. Cut the rereading and you cut the bill, without touching the model.

Does switching to a cheaper model cut the cost more?

Only at the margin, usually at the cost of quality. The spend is driven by how many tokens the agent reads to find one answer, not by the per-token price. Serving one canonical doc per query cuts the tokens read, which a cheaper model alone does not.

How do I measure my own savings?

trovex ships the benchmark as a command. On our own repo it measured a median of 69% fewer tokens per lookup at equal task success across 26 queries (range 41% to 81%); we headline a conservative figure of about 60%. Run uvx trovex bench on your repo to get your own number, or see the method at trovex.dev/measure.
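
For readers who want to see how a median and range like these are derived from per-query token counts, here is a small sketch in Python. It is not how trovex bench is implemented (the method is at trovex.dev/measure); the sample pairs are made-up values and percent_reduction is a hypothetical helper.

```python
from statistics import median


def percent_reduction(baseline_tokens: int, routed_tokens: int) -> float:
    """Percent fewer tokens read for one query, relative to the baseline."""
    return 100.0 * (baseline_tokens - routed_tokens) / baseline_tokens


# Made-up (baseline, routed) token counts per query, for illustration only.
samples = [(4000, 2000), (5000, 1500), (2000, 500)]
reductions = [percent_reduction(b, r) for b, r in samples]

print(median(reductions))                # 70.0
print(min(reductions), max(reductions))  # 50.0 75.0
```

Reporting the median rather than the mean keeps one unusually large or small query from skewing the headline figure.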

See the tokens you'd stop spending.

Index your repo and let trovex serve one current answer per query.

uv tool install trovex


Open source. No cloud, no API keys. Your docs never leave your machine.