Does a bigger context window make rereading docs cheaper?
No. A bigger context window raises the ceiling on how much you can load, not the efficiency of loading it. You still pay for every token you put in the window, on every session, across every agent, so rereading the same docs costs exactly as much per token; you can just do more of it. A bigger window isn't a more current one, either. Serving one canonical answer per query uses about 60% fewer tokens per lookup.
Why doesn't a larger window lower the cost?
Because pricing is per token, not per window. A 1M-token window doesn't discount the 20k tokens you load into it; it just lets you load more. If your agent rereads the same runbooks and READMEs each session to find which one is current, a bigger window lets it reread more of them, but not more cheaply. The cost compounds across sessions, agents, and teammates regardless of window size.
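To make the arithmetic concrete, here is a minimal sketch that assumes flat per-token input pricing. The price, the session count, and the function name are hypothetical and chosen only for illustration; real providers may price differently. The window size appears only as a capacity check and never enters the cost.

```python
def session_input_cost(tokens_loaded, price_per_million, window_tokens):
    """Input cost of one session under assumed flat per-token pricing."""
    # The window only decides whether the load fits; it is not part of the price.
    if tokens_loaded > window_tokens:
        raise ValueError("load does not fit in the context window")
    return tokens_loaded * price_per_million / 1_000_000


PRICE = 3.00   # hypothetical USD per million input tokens
DOCS = 20_000  # tokens of runbooks and READMEs reread each session
SESSIONS = 50  # hypothetical number of sessions across agents and teammates

small_window = session_input_cost(DOCS, PRICE, window_tokens=200_000)
large_window = session_input_cost(DOCS, PRICE, window_tokens=1_000_000)

# Same load, same cost, whichever window it goes into.
same_cost = small_window == large_window

# Rereading compounds: total tokens billed grow with the number of sessions.
total_tokens = DOCS * SESSIONS
total_cost = total_tokens * PRICE / 1_000_000
```

Under these assumptions the only levers on cost are how many tokens you load and how often you load them, which is why loading less matters more than window size.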
Doesn't more context at least mean better answers?
Not on its own. A bigger window can hold the stale copy just as easily as the canonical one, so it does nothing for correctness — and flooding the window with near-duplicate files can dilute the relevant part and make answers worse. More context is not the same as the right, current context.
What lowers the cost for real?
Loading less, but the right thing. Route each query to the single canonical doc, read only the section that answers it, and skip stale and duplicate copies. trovex does exactly this as an open-source, local-first MCP server: one current answer per query with a freshness marker. It also shows you the tokens it saved, independent of how big your model's window is.
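The routing idea can be sketched in a few lines. This is an illustration only, not trovex's implementation or API: the Doc type, the canonical flag, the word-overlap scoring, the word-count token estimate, and the sample paths and dates are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    path: str
    updated: date
    canonical: bool            # assumed to be known ahead of time
    sections: dict[str, str]   # heading -> body


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def overlap(query: str, text: str) -> int:
    # Number of distinct query words that also appear in the text.
    return len(set(query.lower().split()) & set(text.lower().split()))


def answer(query: str, docs: list[Doc]) -> dict:
    # Skip stale and duplicate copies: only canonical docs are candidates.
    candidates = [d for d in docs if d.canonical]
    if not candidates:
        raise LookupError("no canonical doc available")

    def doc_score(d: Doc) -> int:
        return sum(overlap(query, h + " " + b) for h, b in d.sections.items())

    best = max(candidates, key=doc_score)
    # Read only the section that best matches the query.
    heading, body = max(
        best.sections.items(),
        key=lambda kv: overlap(query, kv[0] + " " + kv[1]),
    )
    everything = sum(rough_tokens(b) for d in docs for b in d.sections.values())
    return {
        "source": best.path,
        "section": heading,
        "text": body,
        "fresh_as_of": best.updated.isoformat(),  # freshness marker
        "tokens_saved": everything - rough_tokens(body),
    }


# Hypothetical docs: one canonical runbook and one stale copy.
docs = [
    Doc("runbooks/deploy.md", date(2024, 5, 1), True, {
        "Rollback": "To roll back a deploy run the rollback script",
        "Deploy": "Push to main to deploy the service",
    }),
    Doc("old/deploy-copy.md", date(2022, 1, 10), False, {
        "Rollback": "To roll back a deploy ask the ops team by email",
    }),
]
result = answer("how do I roll back a deploy", docs)
```

The saving here is measured against loading every section of every copy, and it does not depend on the model's window size.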
Load less, but the right thing.
Serve your agents one canonical answer per query, whatever your window size.
Open source. No cloud, no API keys. Your docs never leave your machine.