trovex vs a vector-DB RAG setup
Rolling your own RAG gives you top-k chunks and a pipeline to maintain. trovex is the turnkey version for repo docs: one canonical answer, locally, over MCP.
A vector-DB RAG setup is a pipeline you build: chunk your docs, embed them, store them in Pinecone / pgvector / Chroma, and retrieve the top-k similar chunks at query time. trovex is a turnkey local tool that returns one canonical doc with a freshness marker over MCP — no chunking to tune, no ranking for the agent to do, no separate infra to run, about 60% fewer tokens per lookup. RAG gives you candidate chunks; trovex gives you the current answer.
What does a vector-DB RAG setup involve?
Several moving parts you own: a chunking strategy, an embedding model, a vector database (Pinecone, pgvector, Chroma, and so on), a retrieval step that returns the top-k most similar chunks, usually a re-ranker, and the glue that feeds the results to your agent, which is increasingly an MCP server you also write. It's flexible for custom or heterogeneous retrieval, but it's infrastructure you build and maintain.
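To make those parts concrete, here is a minimal sketch of the chunk, embed, store, and retrieve loop in plain Python. It rests on loud assumptions: the `embed` function is a bag-of-words stand-in for a real embedding model, the "vector store" is an in-memory list rather than Pinecone, pgvector, or Chroma, and the two sample docs are invented for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (the simplest chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, store: list[tuple[str, Counter]], k: int = 3) -> list[tuple[float, str]]:
    """Score every stored chunk against the query and return the k best."""
    q = embed(query)
    scored = [(cosine(q, vec), text) for text, vec in store]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


# Invented sample docs; a real setup would read the repo's markdown files.
docs = {
    "deploy.md": "Deploy with make release. The release target builds and uploads the package.",
    "testing.md": "Run the test suite with make test before opening a pull request.",
}

# "Indexing": chunk every doc and keep (chunk text, vector) pairs in memory.
store = [(c, embed(c)) for text in docs.values() for c in chunk(text)]

results = top_k("how do I deploy a release", store, k=1)
print(results[0][1])
```

Even in this toy form, every function is a decision you own: window size, embedding choice, similarity metric, and the value of k.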
Where does raw RAG fall short for repo docs?
- Chunks, not answers. Top-k similar chunks still leave the agent to read, rank, and decide which is authoritative.
- No freshness. Similarity doesn't know which copy is current; a stale or duplicate chunk can rank high.
- You build and run it. Chunking, retrieval, ranking, the vector store, and the MCP wiring are all yours to maintain, often along with a hosted DB and its API keys.
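The freshness point in the list above is easy to reproduce. The sketch below is an illustration only: it uses token-overlap (Jaccard) similarity instead of real embeddings, and the paths, dates, and port numbers are made up. Two copies of the same sentence that differ only in the value score identically, so nothing in the ranking says which one is current.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical chunks: one current, one stale copy of the same statement.
chunks = [
    {"path": "docs/config.md", "updated": "2024-06-01", "text": "The API listens on port 8080."},
    {"path": "notes/old-config.md", "updated": "2022-01-15", "text": "The API listens on port 3000."},
]

query = "which port does the API listen on"
scores = {c["path"]: jaccard(tokens(query), tokens(c["text"])) for c in chunks}

for path, score in scores.items():
    print(f"{path}: {score:.3f}")

# Similarity never looks at the "updated" field, so the two copies tie.
tie = scores["docs/config.md"] == scores["notes/old-config.md"]
```

Breaking that tie takes information that lives outside the vectors, which is the part a DIY pipeline has to add on its own.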
How is trovex different?
trovex is the opinionated, turnkey version for a repo's markdown. It still uses embeddings and vectors (ONNX embeddings in a local SQLite store), but instead of returning a ranked list, it resolves a query to the one canonical doc, serves the section that answers it, marks stale and duplicate copies, and lets agents write records back. You install it and point it at your repo; there's no pipeline to assemble and no cloud DB or keys.
| Capability | Vector-DB RAG (DIY) | trovex |
|---|---|---|
| What a query returns | Top-k similar chunks | ✓ one canonical doc, section-level |
| Setup | — build chunking + DB + retrieval + MCP | ✓ install, point at repo |
| Freshness signal | — similarity only | ✓ canonical / stale / duplicate |
| Agent sifts & ranks | — yes | ✓ no — one answer |
| Write-back / shared memory | ~ build it yourself | ✓ shared write path |
| Runs locally, no keys | ~ often hosted DB + keys | ✓ SQLite + ONNX |
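The first and third rows of the table can be illustrated with a small sketch of the difference between "return a ranked list" and "resolve to one answer". This is not trovex's implementation: the `Candidate` type, its fields, and the `resolve` function are hypothetical names, and the sketch simply assumes each candidate already carries a canonical / stale / duplicate label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    path: str
    section: str
    score: float   # similarity to the query
    status: str    # assumed label: "canonical", "stale", or "duplicate"


def resolve(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the best-scoring canonical candidate, or None if there is none."""
    canonical = [c for c in candidates if c.status == "canonical"]
    return max(canonical, key=lambda c: c.score, default=None)


# Hypothetical candidates: the stale copy scores highest on similarity alone.
candidates = [
    Candidate("notes/old-deploy.md", "Releasing", 0.91, "stale"),
    Candidate("docs/deploy.md", "Releasing", 0.88, "canonical"),
    Candidate("wiki/deploy-copy.md", "Releasing", 0.88, "duplicate"),
]

answer = resolve(candidates)
print(answer.path if answer else "no canonical doc found")
```

A plain top-k retriever would hand the agent all three candidates with the stale one first; the resolve step hands back one.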
When is a custom RAG stack the right choice?
When you need the control RAG gives you: heterogeneous sources beyond markdown, very large or specialized corpora, custom ranking, or an existing vector-DB investment you want to extend. Building it yourself is the right call there. trovex is the better fit when the job is your project's docs and you'd rather have the opinionated, local, one-answer behavior out of the box than assemble and maintain the pipeline.
FAQ
What is the difference between trovex and a vector-DB RAG setup?
A vector-DB RAG setup is a pipeline you build: chunk, embed, store in a vector DB, retrieve the top-k similar chunks. trovex is a turnkey local tool that returns one canonical doc with a freshness marker over MCP — no chunking to tune, no ranking for the agent, no separate infra. RAG gives you candidate chunks; trovex gives you the current answer.
Why not just build RAG over a vector database?
You can, and for custom or large-scale retrieval it may be right. But raw RAG returns top-k similar chunks without telling the agent which is current, often includes near-duplicates, and leaves you to maintain chunking, retrieval, ranking, and MCP wiring. trovex packages the opinionated version for repo docs, locally, with no API keys.
Does trovex use embeddings and vectors?
Yes. It embeds your markdown with ONNX and stores the vectors locally in SQLite. The difference is what it does with them: instead of returning a ranked list of chunks, it resolves a query to the one canonical doc and serves the section that answers it, with a freshness marker.
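For a sense of what "vectors in SQLite" can look like, here is a minimal sketch using only Python's standard library. The table name, columns, and sample row are assumptions for illustration, not trovex's actual schema, and the vector is hard-coded where a real setup would take it from an ONNX embedding model.

```python
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    """Pack a vector as raw float32 bytes for storage in a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob: bytes) -> list[float]:
    """Unpack float32 bytes back into a list of floats."""
    a = array("f")
    a.frombytes(blob)
    return a.tolist()


# In-memory database for the example; a local tool would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sections (path TEXT, heading TEXT, embedding BLOB)")

# Hypothetical row: one doc section and its (hard-coded) embedding vector.
conn.execute(
    "INSERT INTO sections VALUES (?, ?, ?)",
    ("docs/deploy.md", "Releasing", to_blob([0.5, 0.25, -1.0])),
)

row = conn.execute("SELECT path, embedding FROM sections").fetchone()
vec = from_blob(row[1])
print(row[0], vec)
conn.close()
```

Everything here is a single local file and a standard-library driver, which is why no server, account, or API key enters the picture.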
Skip the pipeline. Get the answer.
Install trovex, point it at your repo, and serve one canonical answer per query.
Open source. No cloud, no API keys. Your docs never leave your machine.