How do I debug a multi-agent failure?

Short answer

Find which agent and which step caused it. That is failure attribution, and it needs the full execution trace, meaning each step's inputs and context, not just its outputs. A 2026 benchmark (arXiv:2604.22708) found full traces improve attribution accuracy by up to 76% over output-only logs. Output-only logging hides most failure causes. Capture inputs, keep runs reproducible, and make the trace step-addressable.

Capture inputs, not just outputs

Every agent in the log can report that it did its part while the system still fails. The fault usually lives in what one step fed the next, which an output-only log never shows. So log each step's inputs and the context it read, keep the run reproducible so you can replay the exact failed path, and make the trace step-addressable so you can point at one node and inspect it. One thing you can take off the table up front: feed agents context from a canonical source you can inspect, so stale or wrong context is not the variable you are chasing.
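A minimal sketch of what such a trace could look like, in Python. The names here (`StepRecord`, `Trace`, the `run_id/index` address scheme, the `doc_version` field) are illustrative assumptions, not part of trovex or any tracing library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class StepRecord:
    step_id: str             # stable address for this step, e.g. "run-1/1"
    agent: str               # which agent ran the step
    inputs: dict[str, Any]   # what the previous step handed over
    context: dict[str, Any]  # what the step read (docs, memory, tool results)
    output: Any = None       # what the step returned


@dataclass
class Trace:
    run_id: str
    steps: list[StepRecord] = field(default_factory=list)

    def record(self, agent: str, inputs: dict, context: dict, output: Any) -> StepRecord:
        # Address each step by run id plus position so it can be looked up later.
        step = StepRecord(f"{self.run_id}/{len(self.steps)}", agent, inputs, context, output)
        self.steps.append(step)
        return step

    def get(self, step_id: str) -> StepRecord:
        for step in self.steps:
            if step.step_id == step_id:
                return step
        raise KeyError(step_id)

    def to_json(self) -> str:
        # Persist the whole run; assumes inputs, context, and output are JSON-serializable.
        return json.dumps(asdict(self), indent=2)


# Hypothetical two-step run: both agents "did their part", but the writer read an older doc.
trace = Trace("run-1")
trace.record("planner", {"task": "summarize report"}, {"doc_version": "v7"}, "plan: fetch, summarize")
trace.record("writer", {"plan": "plan: fetch, summarize"}, {"doc_version": "v5"}, "summary of v5")

writer_step = trace.get("run-1/1")  # point at one node and inspect what it read
```

An output-only log of this run would show two plausible outputs. The recorded context on `run-1/1` is what shows the writer worked from a different document version than the planner.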

FAQ

How do I find which agent caused a multi-agent failure?

That is failure attribution, and it needs the full execution trace, not just the final outputs. You have to see each step's inputs and the context it ran on to tell which agent and which turn went wrong. A 2026 benchmark (arXiv:2604.22708) found full traces improve attribution accuracy by up to 76% over output-only logs. If you logged only outputs, you usually cannot tell what broke.

Do you need full traces to debug AI agents?

For attribution, yes. Output-only logging hides most failure causes because the wrong output often comes from a bad input or stale context two steps earlier. Capture each step's inputs and context, not just what it returned, and keep the run reproducible so you can replay the exact path that failed.
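As a rough illustration of why the inputs matter, here is a sketch in Python that walks a recorded run and returns the earliest step that read stale context. The step layout, the `doc_version` field, and `CANONICAL_VERSION` are hypothetical; a real check would depend on what your agents actually read.

```python
from typing import Any, Callable, Optional

Step = dict[str, Any]

CANONICAL_VERSION = "v7"  # assumed: the current version of the source document


def reads_current_context(step: Step) -> bool:
    # Steps that read no document are treated as fine; only a stale read is flagged.
    version = step["context"].get("doc_version")
    return version is None or version == CANONICAL_VERSION


def first_bad_step(steps: list[Step], is_valid: Callable[[Step], bool]) -> Optional[Step]:
    # Walk the run in order and return the earliest step that fails the check.
    for step in steps:
        if not is_valid(step):
            return step
    return None


# Hypothetical run: the wrong answer surfaces at step 3, the stale read happened at step 1.
steps = [
    {"step_id": 0, "agent": "planner", "inputs": {"task": "answer refund question"},
     "context": {}, "output": "plan: retrieve policy, summarize, reply"},
    {"step_id": 1, "agent": "retriever", "inputs": {"query": "refund policy"},
     "context": {"doc_version": "v5"}, "output": "refund window: 14 days"},
    {"step_id": 2, "agent": "summarizer", "inputs": {"text": "refund window: 14 days"},
     "context": {}, "output": "customers have 14 days"},
    {"step_id": 3, "agent": "writer", "inputs": {"summary": "customers have 14 days"},
     "context": {}, "output": "You can request a refund within 14 days."},
]

suspect = first_bad_step(steps, reads_current_context)
```

Here `suspect` is the retriever at step 1, two steps before the visible failure. With outputs alone, every step looks consistent with the one before it.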

Why isn't output logging enough to debug agents?

Because every agent in the log can report that it did its part while the system still fails. The fault is usually in what one step fed the next, which an output-only log never shows. The trace you trimmed to save space is the one that could have told you what broke.

What makes a multi-agent trace useful for debugging?

Make it step-addressable: each step records its inputs, the context it read, and its output, so you can point at one node and replay it. Capture inputs, not just results; keep runs reproducible; and feed agents context from a canonical source you can inspect, so stale or wrong context is one less variable when you trace a failure.
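A sketch of replaying one recorded step, in Python. It assumes the agent can be called as a deterministic function of its recorded inputs and context; with a real model you would also need to pin whatever makes the run repeatable. `AGENTS`, `summarize`, and the step layout are made up for illustration.

```python
from typing import Any, Callable

AgentFn = Callable[[dict, dict], Any]


def summarize(inputs: dict, context: dict) -> str:
    # Stand-in for a real agent: deterministic, so a replay can be compared exactly.
    return f"{inputs['topic']}: {context['doc']}"


# Hypothetical registry mapping agent names in the trace to callables.
AGENTS: dict[str, AgentFn] = {"summarizer": summarize}


def replay(step: dict, agents: dict[str, AgentFn] = AGENTS) -> dict:
    # Re-run one step from exactly what it was given, then compare with the recording.
    replayed = agents[step["agent"]](step["inputs"], step["context"])
    return {
        "step_id": step["step_id"],
        "replayed": replayed,
        "matches_recorded": replayed == step["output"],
    }


recorded_step = {
    "step_id": "run-1/2",
    "agent": "summarizer",
    "inputs": {"topic": "refunds"},
    "context": {"doc": "refund window is 14 days"},
    "output": "refunds: refund window is 14 days",
}

result = replay(recorded_step)
```

If the replay matches the recording, the step behaved the same way given the same inputs, so the place to look is what it was fed. If it does not match, the step itself is not reproducible and that is the first thing to fix.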

Take stale context off the list of suspects.

trovex is not a tracing tool, but it removes one common failure source: agents acting on stale or wrong context. It serves one canonical doc per query, current and inspectable, so when you trace a failure that variable is already controlled. Open source, local, about a minute to set up.

uv tool install trovex

Open source. No cloud, no API keys. Your docs never leave your machine.