Updated 20 June 2026

MCP context patterns for coding agents: dump, retrieve, or answer

Short version

When you wire project knowledge into a coding agent over MCP, you're choosing one of three patterns: dump everything into the window, return a set of candidate chunks to rank, or serve the single canonical doc that answers the query. They trade off cost, accuracy, and how much the agent has to do. For doc-heavy repos and multi-session work, the answer pattern wins on tokens (about 60% fewer per lookup) and on getting the current doc. The others have their place. Here's how to pick.

What is MCP doing here, briefly?

MCP (Model Context Protocol) is how an agent calls out to a tool for context: the agent sends a request, your MCP server returns content the agent folds into its reasoning. The protocol is plumbing. The interesting decision is what your server returns for a knowledge query. Three patterns cover almost everything.

Pattern 1: dump the repo

The server returns a large slab of the codebase or docs (think repomix, files-to-prompt). It fits small repos, one-off tasks, or when the agent genuinely needs broad context it can't predict.

What it costs: you pay for the whole slab on every call to use a fraction of it. It scales the worst, and a fat window can lower answer quality by burying the relevant part. Fine for small or exploratory work, expensive as a default.

Pattern 2: retrieve candidate chunks (RAG)

The server embeds the query, searches a vector store, and returns the top-k most similar chunks for the agent to sift. It fits large, heterogeneous corpora where no single doc is "the answer" and you want recall across many sources.

What it costs: you build and maintain the pipeline (chunking, embeddings, store, re-ranker), and you still hand the agent a pile to rank. Similarity ranking has no notion of current, so a stale chunk ranks next to a fresh one (more on that in why agents pick stale docs). The right tool for broad retrieval; a blunt one for "which doc is canonical."

Pattern 3: serve one canonical answer

The server returns the single current doc that answers the query, as a path:line pointer with a freshness marker, at the section level, not a pile. It fits doc-heavy repos with overlapping .md, multi-agent or multi-session work, and teams that need one source of truth.

What it costs: least per lookup (about 60% fewer tokens than reading candidates, see the token-cost math), and the agent does no ranking. The trade is that it's scoped to your authored docs, not arbitrary recall across everything. This is the pattern trovex implements, including a shared write path so agents save what they learn back to the same store.

How do I choose?

The three MCP context patterns, and when each one fits
Pattern	Fits	Cost per lookup
Dump the repo	tiny repo, one-off task, broad unpredictable context	highest — the whole slab, every call
Retrieve chunks (RAG)	broad, mixed corpus where recall matters	medium — a pile to rank, no freshness signal
Serve one answer	doc-heavy repo, a fleet, you want the current doc	lowest — ~60% fewer, no ranking

They also compose. A RAG layer for broad recall plus a canonical-answer layer for your core docs is a reasonable setup. The mistake is using dump-everything or raw RAG as the default for "which of my docs is the source of truth," then paying for it on every session.

Try the answer pattern on your own repo.

trovex is in private beta. Request access, point it at a repo, wire it into your agent over MCP, and watch the savings on your own docs.

Request beta access See setup per client

Open source. No cloud, no API keys. Your docs never leave your machine.

Standardizing how a team feeds context to agents? That's the kind of thing we help with — happy to talk.