Latent agents, real receipts

There is a particular kind of AI demo that goes viral every few weeks. An agent books a flight. An agent files a tax return. An agent runs an entire marketing campaign while the founder sips coffee. The demos are real. The receipts are not. This post is about what AI agents have actually done in the Catalyst systems we've shipped — the boring, profitable, repeatable work — and what they still cannot.

What “agent” means in our deployments

When we use the word, we mean something specific: a piece of software with a clear task, access to a few tools (read a database, call an API, draft an email, post to a Slack channel), and an LLM that decides which tool to use and what to put in it. Not a chatbot. Not a copilot. A small, scoped worker that runs on an event and produces a result a human can verify.
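A minimal Python sketch of that shape, assuming the model call is wrapped in a function that returns a tool name and the input for it. Every name here is hypothetical and chosen for illustration: `ScopedAgent`, `decide`, and the `post_to_slack` tool. The stub decider stands in for a real LLM call and is not any provider's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: (task, event, tool names) -> (tool name, tool input).
# A real deployment would wrap an LLM call here; this sketch only assumes that shape.
Decider = Callable[[str, dict, list[str]], tuple[str, dict]]
Tool = Callable[[dict], dict]


@dataclass
class ScopedAgent:
    task: str                   # one clear job, stated in plain language
    tools: dict[str, Tool]      # a few tools, keyed by name
    decide: Decider             # the model picks which tool and what to put in it
    log: list[dict] = field(default_factory=list)

    def run(self, event: dict) -> dict:
        """Handle one event and return a result a human can verify."""
        tool_name, tool_input = self.decide(self.task, event, sorted(self.tools))
        if tool_name not in self.tools:
            # The model may only call tools it was given; anything else is refused.
            raise ValueError(f"unknown tool: {tool_name}")
        result = self.tools[tool_name](tool_input)
        # Audit trail: which tool ran, with what input, so a person can check it.
        self.log.append({"event": event, "tool": tool_name, "input": tool_input, "result": result})
        return result


# Usage with a stubbed model that always picks the (hypothetical) Slack tool.
def stub_decide(task: str, event: dict, tool_names: list[str]) -> tuple[str, dict]:
    return "post_to_slack", {"channel": "#leads", "text": f"New: {event['subject']}"}


agent = ScopedAgent(
    task="Announce new form submissions in the team channel",
    tools={"post_to_slack": lambda args: {"posted": args["text"], "channel": args["channel"]}},
    decide=stub_decide,
)
print(agent.run({"subject": "Quote request"}))
```

The audit log is what makes the result verifiable: a reviewer can see exactly which tool the model chose and what it passed in.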

The agents we deploy are usually attached to one of two surfaces: an inbox (email, form submissions, lead intake) or a queue (tasks, tickets, orders). They run in the background. The user often doesn't know an agent did the work — they just notice the work happened.

Three things they do well

1 · Intake summarization

A client receives 40–120 contact-form submissions a week. The agent reads each one, extracts the structured bits (industry, project type, budget signals, urgency), drafts a one-paragraph summary, scores the lead, and routes it to the right person on the team. Time from form submission to a qualified, summarized lead in the CRM: under 90 seconds. The owner stopped reading raw forms in week two.
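The deterministic half of that pipeline can be sketched as follows, assuming the model returns its extraction and summary as JSON. The field values, point weights, and routing table (`BUDGET_POINTS`, `URGENCY_POINTS`, `ROUTES`, and the owner names in it) are hypothetical placeholders, not figures from the client deployment. In this sketch the model only extracts and summarizes. Scoring and routing are plain rules a person can read and audit.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Lead:
    industry: str
    project_type: str
    budget_signal: str  # "none" | "low" | "medium" | "high"
    urgency: str        # "low" | "normal" | "urgent"
    summary: str        # the one-paragraph summary the model drafted


# Hypothetical weights and routes; a real deployment would tune these per client.
BUDGET_POINTS = {"none": 0, "low": 10, "medium": 25, "high": 40}
URGENCY_POINTS = {"low": 0, "normal": 10, "urgent": 30}
ROUTES = {"website": "dana", "crm": "sam"}  # project type -> team member
DEFAULT_OWNER = "owner-inbox"


def parse_extraction(raw: str) -> Lead:
    """Turn the model's JSON answer into a Lead; a missing field raises KeyError."""
    data = json.loads(raw)
    return Lead(**{f.name: data[f.name] for f in fields(Lead)})


def score(lead: Lead) -> int:
    """Score 0-100: fit with a service on offer, plus budget and urgency signals."""
    fit = 30 if lead.project_type in ROUTES else 0
    points = fit + BUDGET_POINTS.get(lead.budget_signal, 0) + URGENCY_POINTS.get(lead.urgency, 0)
    return min(100, points)


def route(lead: Lead) -> str:
    return ROUTES.get(lead.project_type, DEFAULT_OWNER)


# Example model output for one form submission (invented for illustration).
raw = json.dumps({
    "industry": "dental",
    "project_type": "website",
    "budget_signal": "medium",
    "urgency": "urgent",
    "summary": "Two-location practice wants online booking before its spring launch.",
})
lead = parse_extraction(raw)
print(score(lead), route(lead))
```

Keeping the score in code rather than asking the model for a number makes a surprising score easy to explain: every point traces back to one extracted field.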

2 · Drafted replies

The agent doesn't send anything. It drafts. For every inbound message that fits a known pattern (quote request, scheduling question, post-job follow-up), the agent writes a reply in the brand voice, attaches relevant prior context, and places it in the team's inbox as a draft. The team approves it with one click or edits it first. Median reply time dropped from 9 hours to 35 minutes. Tone stayed consistent. Net writing time per team member dropped by about 60%.
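A sketch of that draft-only contract, assuming two model-backed functions: `classify`, which maps a message to a pattern name, and `write`, which produces the reply in the brand voice. Both are stubbed here. The pattern identifiers and the three-message context window are illustrative assumptions. The structural point is that the agent can only return a `Draft` with status `"draft"`, and it has no send path at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Illustrative identifiers for the three known patterns; anything else gets no draft.
KNOWN_PATTERNS = {"quote_request", "scheduling_question", "post_job_follow_up"}


@dataclass
class Draft:
    to: str
    body: str
    context: list[str] = field(default_factory=list)  # prior messages shown alongside
    status: str = "draft"  # the agent never changes this; a person approves or edits


def draft_reply(
    message: dict,
    classify: Callable[[str], str],
    write: Callable[[str, str], str],
    history: dict[str, list[str]],
    context_size: int = 3,
) -> Optional[Draft]:
    """Draft a reply for a known pattern, or return None and leave it to the team."""
    pattern = classify(message["body"])
    if pattern not in KNOWN_PATTERNS:
        return None
    return Draft(
        to=message["from"],
        body=write(pattern, message["body"]),
        context=history.get(message["from"], [])[-context_size:],
    )


# Hypothetical stubs standing in for the model calls.
def classify(text: str) -> str:
    return "scheduling_question" if "reschedule" in text.lower() else "other"


def write(pattern: str, text: str) -> str:
    return "Happy to move it. Does Thursday morning work for you?"


history = {"ana@example.com": ["Booked install for Tue", "Confirmed Tue 9am"]}
d = draft_reply({"from": "ana@example.com", "body": "Can we reschedule?"}, classify, write, history)
print(d)
```

Returning `None` for unrecognized messages keeps the agent inside its known patterns. Anything unusual stays in the inbox untouched for a person to handle.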

3 · Internal routing and tagging

The boring one, and the most valuable. The agent reads inbound tickets, tags each one by category and priority, and assigns it to the right owner. It posts an internal note when something looks unusual (an outage signal, an angry customer, a large opportunity). It never replies to the customer. It just helps the team see what to do next.
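One way to keep that agent strictly internal, sketched under the assumption that the model returns its labels as a dict. The code accepts only labels from fixed sets, sends anything unrecognized to a triage bucket, and turns alert flags into an internal note. The label sets, team names, `triage` fallback, and note wording are all hypothetical. The returned record has no field for a customer-facing reply.

```python
# Hypothetical label sets and team members for this sketch.
ALLOWED_CATEGORIES = {"billing", "bug", "how_to", "sales"}
ALLOWED_PRIORITIES = {"low", "normal", "high"}
ALERT_FLAGS = {"outage_signal", "angry_customer", "large_opportunity"}
TEAM = {"sam", "dana", "lee"}


def tag_ticket(ticket_id: str, labels: dict) -> dict:
    """Validate the model's labels and build an internal-only routing record."""
    category = labels.get("category")
    if category not in ALLOWED_CATEGORIES:
        category = "triage"  # unknown label: a person sorts it
    priority = labels.get("priority")
    if priority not in ALLOWED_PRIORITIES:
        priority = "normal"
    owner = labels.get("owner")
    if owner not in TEAM:
        owner = "triage"
    # Keep only recognized alert flags; invented flags are dropped.
    flags = sorted(ALERT_FLAGS.intersection(labels.get("flags", [])))
    note = f"Ticket {ticket_id} looks unusual: {', '.join(flags)}" if flags else None
    return {"ticket": ticket_id, "category": category, "priority": priority,
            "owner": owner, "internal_note": note}


print(tag_ticket("T-1042", {"category": "bug", "priority": "high", "owner": "sam",
                            "flags": ["outage_signal", "made_up_flag"]}))
```

Validating against closed sets means a model that invents a label cannot create a new queue or owner. At worst, it sends the ticket to triage.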

Three things they don't

1 · Anything irreversible without a human

We do not let agents send external email, charge cards, file forms, sign documents, or change billing. The model is good, but not good enough to be trusted with a wire transfer that has no undo button. Every external action passes through a person. The agent turns the human's decision from a 20-minute task into a one-click confirmation, but the human is still there.
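A minimal sketch of that rule as a gate in code. It assumes every action the agent wants to take is expressed as a `Proposal` and executed through a single function. The action identifiers mirror the list above but are hypothetical, as are `Proposal`, `confirm`, and `execute`. The handlers are stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical identifiers for actions the agent may propose but never complete alone.
IRREVERSIBLE = {"send_external_email", "charge_card", "file_form",
                "sign_document", "change_billing"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action reaches execution without a person."""


@dataclass
class Proposal:
    action: str
    params: dict
    summary: str                       # what the person sees on the confirm screen
    approved_by: Optional[str] = None  # set only by confirm()


def confirm(proposal: Proposal, approver: str) -> Proposal:
    """The one click: a named person takes responsibility for the action."""
    proposal.approved_by = approver
    return proposal


def execute(proposal: Proposal, handlers: dict[str, Callable[[dict], str]]) -> str:
    if proposal.action in IRREVERSIBLE and proposal.approved_by is None:
        raise ApprovalRequired(f"{proposal.action} needs a human: {proposal.summary}")
    return handlers[proposal.action](proposal.params)


handlers = {"send_external_email": lambda p: f"sent to {p['to']}"}
p = Proposal("send_external_email", {"to": "client@example.com"}, "Send the approved quote")
try:
    execute(p, handlers)
except ApprovalRequired as err:
    print("blocked:", err)
print(execute(confirm(p, "maria"), handlers))
```

The gate lives at execution time rather than in the prompt, so no wording the model produces can route around it.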

2 · Long-horizon planning across messy data

“Plan the year's marketing calendar based on last year's data” sounds great in a demo. In practice, real client data is incomplete, inconsistent, and contradictory. The agent confidently produces a plan that's 30% correct and 70% confabulated. We don't ship that. Long-horizon work belongs with humans, with the agent helping at narrower scopes (draft a brief, summarize the meeting, propose three options).

3 · Things they were never meant to do

Once an agent is reliable at a task, the temptation is to extend its scope: “If it can summarize leads, why can't it also…?” The answer is usually that scope creep is where agents go from useful to embarrassing. We keep each agent narrow on purpose. When we need a new thing done, we add a new agent.
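One way to enforce “add a new agent instead of widening an old one” in code is a dispatcher that gives each event type exactly one owner. This is a hypothetical sketch, and `AgentRegistry` and the event names are invented for illustration. Registering a second agent for the same event type fails. An event nobody owns is returned to a person rather than handed to the nearest agent.

```python
from typing import Callable, Optional

Agent = Callable[[dict], dict]


class AgentRegistry:
    """Hypothetical dispatcher: each event type maps to exactly one narrow agent."""

    def __init__(self) -> None:
        self._agents: dict[str, Agent] = {}

    def register(self, event_type: str, agent: Agent) -> None:
        if event_type in self._agents:
            raise ValueError(f"{event_type!r} is already handled by another agent")
        self._agents[event_type] = agent

    def dispatch(self, event_type: str, payload: dict) -> Optional[dict]:
        agent = self._agents.get(event_type)
        if agent is None:
            return None  # no agent owns this event, so it stays with a person
        return agent(payload)


registry = AgentRegistry()
registry.register("lead.submitted", lambda p: {"summary": p["text"][:40]})
print(registry.dispatch("lead.submitted", {"text": "Need a new booking site"}))
print(registry.dispatch("invoice.overdue", {"id": 17}))
```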

The shape of a useful AI deployment

Across the dozen or so production deployments we've done, the pattern is the same:

Pick one boring task. One inbox. One queue. One repetitive thing.
Define “done” in writing. List the expected outputs, the edge cases, and what “wrong” looks like.
Ship a draft-only version. The agent suggests; a human ships.
Measure for two weeks. Accuracy, time saved, edits required.
Tighten the prompt and tools. Most early errors come from one or two patterns.
Promote it to autonomous operation only when human edits drop to noise.
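The last step needs a concrete definition of “noise.” This is a minimal sketch that assumes each human review of a draft is logged with its day and whether the person edited it. The 14-day window matches the two-week measurement above. The 50-review minimum and the 5% edit-rate threshold are hypothetical numbers to tune per task. The sketch covers only the edit signal, so accuracy and time saved would be tracked alongside it.

```python
from dataclasses import dataclass


@dataclass
class Review:
    day: int      # days since the draft-only version shipped
    edited: bool  # did the person change the agent's output before shipping it?


def edit_rate(reviews: list[Review]) -> float:
    return sum(r.edited for r in reviews) / len(reviews) if reviews else 1.0


def ready_to_promote(reviews: list[Review], today: int, window_days: int = 14,
                     min_samples: int = 50, max_edit_rate: float = 0.05) -> bool:
    """True only after a full window of draft-only use with edits at noise level.

    The defaults for min_samples and max_edit_rate are hypothetical thresholds.
    """
    if not reviews or today - min(r.day for r in reviews) < window_days:
        return False  # the draft-only version has not run for a full window yet
    recent = [r for r in reviews if today - window_days < r.day <= today]
    if len(recent) < min_samples:
        return False  # too few reviews to separate a trend from noise
    return edit_rate(recent) <= max_edit_rate


# Five reviews a day for days 0-14, with edits only on day 0.
log = [Review(day=d, edited=d == 0) for d in range(15) for _ in range(5)]
print(round(edit_rate(log), 3), ready_to_promote(log, today=14))
```

The minimum sample count guards against promoting an agent on a quiet week where a handful of clean drafts look like a trend.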

The fastest path to a useful agent is one boring task done well. The fastest path to a useless agent is a demo of five tasks done occasionally.

The receipts

Across our active CatalystOS deployments, the typical team saves between four and eleven hours per week. The variance is high, and the savings come from a handful of mundane agents working on inboxes and queues. None of them are doing anything that would make a great demo video. All of them are doing things their teams used to do at 9 PM.

That's the honest receipt. AI is not (yet) replacing the team. It is, very reliably, removing the work the team hated most.
