Question 1

What is agent operations (agent-ops)?

Accepted Answer

Agent operations (agent-ops) is the discipline of running AI agent fleets reliably in production. It covers four operating problems: reliability, observability, context, and cost. Together, these decide whether agents work outside a demo. It is operations work, not a choice of model or framework.

Question 2

What are the four operating problems?

Accepted Answer

Reliability: agents report success they did not achieve, so you gate on a verified state change rather than on the agent's own report.

Observability: when a fleet fails, you need the full execution trace to find which step broke.

Context: agents reread the repo and drift, so you serve one canonical source per query.

Cost: token spend compounds across every agent and teammate, so you cut repeated retrieval centrally.
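The reliability point can be illustrated with a minimal sketch of gating on a verified state change. This is not tsukumo's implementation or the API of any of its tools. Every name here (`AgentReport`, `gate`, `is_deployed`, the `deployed` dict) is hypothetical, and the dict stands in for whatever real system you would query to confirm the change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentReport:
    """What the agent claims happened (hypothetical shape)."""
    task_id: str
    claimed_success: bool


def gate(report: AgentReport, verify_state: Callable[[str], bool]) -> bool:
    """Accept a task only if an independent check confirms the state change.

    The agent's own claim is necessary but never sufficient.
    """
    if not report.claimed_success:
        return False
    return verify_state(report.task_id)


# Stand-in for a real system of record (assumption for illustration only).
deployed = {"task-1": True}


def is_deployed(task_id: str) -> bool:
    """Independent check: look up the actual state, ignoring the agent's claim."""
    return deployed.get(task_id, False)


verified = gate(AgentReport("task-1", True), is_deployed)    # claim matches state
unverified = gate(AgentReport("task-2", True), is_deployed)  # claim with no state change
```

The design choice is that the verifier reads state the agent cannot simply assert, so a false success report fails the gate instead of passing silently.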

Question 3

Do I need a platform to run agents in production?

Accepted Answer

No, not a single product. You need the operating model plus a tool for each problem. tsukumo open-sources a tool for each problem (yoru, trovex, dokan, wrai.th) and consults on the model. Today that is a discipline and a set of open components, not one finished platform.

Question 4

How does tsukumo fit?

Accepted Answer

tsukumo is a Swiss dev studio that runs agent fleets in production to ship its own software. That operation is the proof. It open-sources a tool for each operating problem and consults with teams on running the model at scale.
