Back to all posts

Two Surfaces, One Substrate

Madhu Murty avatar Madhu Murty
Two Surfaces, One Substrate
Two Surfaces, One Substrate

The work got done. How it got done is gone. The next infrastructure problem in AI agents isn't memory or orchestration — it's that almost nothing keeps a durable record of how the work actually happened, and you can't govern, debug, or improve what you can't reconstruct.

A two-million-dollar settlement posts twice.

The pipeline did everything it was told. Six agents — validate, screen, debit, convert, hand off, credit. The handoff timed out under load, the system retried, and because the posting step wasn't idempotent, the beneficiary got paid a second time. By the time anyone notices, the dashboard says one thing: complete. Every agent reports success. The money is gone, and the reconstructing exactly how the run unfolded now requires stitching together logs, traces, workflow history, and database records — assuming they still exist.

Now a different scene, same shape. A pull request, merged six months ago. The diff is two lines: batchSize: 500 became batchSize: 5. No comment. The engineer who made the change has moved teams. A new team owns the service, an incident hits, and someone opens the file to a fresh mystery. Why five? Was that a fix or a guess? Had anyone hit this before? The diff survived. The decision trail or the context didn't.

I've spent enough time in financial-services systems to recognize these two stories as the same story. In both, the decision persisted and the how evaporated. And in the agent era, that gap is about to get a lot more expensive.

One agent feels fine. The second one is the problem.

With a single agent on a single task, it's easy to believe you have this covered. You can read the transcript, watch it work, and see the answer. But that's the trap Part One named: a transcript is not a memory ( Your Agent has a Transcript, not Memory ) . It's a live readout that scrolls past, compacts, and resets — you were watching the work, not keeping a record of it. The gap was there with one agent all along. You just don't feel it until there's a second of anything.

A second agent, and now they have to coordinate — and coordination means one of them can quietly overwrite what the other did, or act on a stale view of the world, and you have no ordered account of who saw what when. A second person, and now the context that lived in one engineer's head (or one agent's session) has to travel — and it usually doesn't. A second point in time, and "future you," or the team that inherits the code, arrives with none of the reasoning that produced it.

The hard part was never one agent. It's the second one. And almost every organization deploying agents today is racing toward many agents — fanning out, handing off, running in production — while the substrate underneath them still assumes a single actor narrating its own story in real time.

The event is the first thing we throw away

Here's the uncomfortable part: we throw the most valuable thing away on purpose, by default.

An agent's reasoning is ephemeral by nature. It lives in a context window that scrolls, compacts, and resets. Agents now generate an enormous volume of work, and the decision trail behind that work is genuinely hard to reconstruct after the fact. What we keep is the output the agent produces — the merged code, the posted transaction, the final answer. What we discard is the decision trail.

For a long time that was acceptable. Computing was the bottleneck; humans were in the loop to remember the why. Soon this might not be true anymore, if it isn't already.Agents generate work at a scale no human can continuously supervise, and decisions accumulate faster than organizations can explain them. We are accumulating decisions at an incredible pace while keeping no durable account of how any of them were reached.

“A decision you can't reconstruct is a decision you have to take on faith.

The outcome survives. History doesn’t.

When teams notice the gap, they reach for tools they already have. Each one loses the thing that matters.

Last-write-wins state. Keep the current value, overwrite the rest. Fast, simple, and it erases the sequence — exactly the part you need when two agents disagree or a retry doubles a payment.

Checkpoint and restore. Snapshot the state periodically so you can roll back. But a snapshot is a moment in time like a photograph. It tells you where things stood, never how they got there. Flatten to a vector. Embed the session, store it for similarity search. Useful for "find me something like this," useless for "replay exactly what happened, in order." Semantics survive; causality doesn't.

Capture traces. Traces are invaluable for understanding behavior, but they were designed for observability, not as a system of record. They help explain what happened. They rarely serve as the authoritative source from which the system can be rebuilt.

All four share a tell: they treat the record of how the work happened as exhaust — something to sample, summarize, or discard — rather than as the asset.

The missing primitive is a record of how the work happened

Flip the assumption. Treat how the work happened as the durable thing, and the current state as a view you can always rebuild from it.

Concretely, that means an append-only, ordered, immutable record of events: in the exact sequence it occurred, that you can never quietly overwrite and can always replay. Current state becomes a projection over that.

We have trusted this system our whole life: the ledger

We already rely on a system built exactly this way, and we have since long before software existed. A ledger.

A ledger never stores the balance as the truth. It stores entries — a deposit, a withdrawal, a transfer — each one a fact about something that happened, in the order it happened. You don't erase a line you got wrong; you add a correcting one. The balance isn't written down anywhere as gospel. It's derived: you add the entries up. At the close of the day, or the month, you total what happened and the state falls out.

That's all an event is — a ledger entry for your system. A fact, recorded in sequence, that nobody quietly overwrites. State — the balance, the current status, the latest version of the file — is just what you get when you replay the entries. We've kept money this way since we started keeping track of money; double-entry bookkeeping is six centuries old. None of it is exotic, and none of it is new.

So what actually changed? Not the idea. The volume, and who's doing it. The entries are written now by agents, across systems no single person watches — and somewhere in the rush to ship, we started keeping only the balance and throwing the entries away. We stopped keeping the ledger.

That's the bet behind what we build at Kurrent: keep the ledger for everything your agents do. An ordered, immutable, replayable record of how the work happened, treated as the source of truth — with current state derived from it, not the other way around.

It's a category, because it shows up twice

Here's the part that convinced me this is a category and not a feature.

The exact same blind spot appears on two completely different surfaces.

On the production surface, it's agents running a workflow — the settlement that posted twice. The fix is an ordered, replayable record of the run, so a crash can be reconstructed cleanly, duplicates become detectable and preventable, and an auditor’s question takes minutes instead of weeks.

On the coding surface, its agents building the workflow — the PR with no “why”. The fix is a durable record of the session behind the work: what the agent tried, what the engineer rejected, why this approach won. So a reviewer sees the reasoning, not just the diff; a teammate inherits a session instead of a mystery; and the next agent to touch the code starts from the answer instead of a blank file. That's the surface we built Capacitor for.

Two surfaces. Same missing primitive. One substrate underneath both. When the same gap shows up in places that look this different — a payment pipeline and a codebase — you're not looking at a point solution. You're looking at a missing layer in the stack.

Why this is suddenly urgent

Three things changed at once.

Agents crossed from experiment into production. The question stopped being "can it do the task" and became "can you operate it" — and operating means reconstructing what it did when something goes wrong.

Regulatory oversight started to arrive. The first questions an auditor asks about an automated decision are who decided, on what basis, and can you show me. "The model did it" is not an answer. An immutable record is.

And the agents started talking to each other. The organizations furthest along aren't running one agent; they're running fleets that coordinate. Almost none of them are thinking about how agents communicate, hand off, and reconcile — which is precisely where the absence of an ordered record turns from a debugging annoyance into a correctness problem.

Strip it down and it's the oldest rule in systems engineering, wearing new clothes: you cannot govern, debug, improve, or trust what you cannot reconstruct. Agents didn't change that rule. They just made it apply to everything, all the time, faster than we can keep up by hand.

Human in the loop, or human on the hook?

There's a phrase everyone reaches for: human in the loop. Someone reviews the output before the decision goes out. It matters — but it's a statement about participation, about who was present, not about who owns the result.

Jason Forget, of Cockroach Labs, put a sharper frame on it that's stuck with me: human on the hook. His point — drawn from a court ruling he cited, where conversations with an AI chatbot weren't shielded by attorney-client privilege because the chatbot wasn't actually an attorney — is that using AI doesn't transfer responsibility. An agent can contribute to a decision; it does not own the consequences. The people and organizations deploying it do. Don't confuse participation with accountability.

Follow that one step further and you land right back on the record. If you're on the hook for what your agents do, you need to be able to show how they did it — to a regulator, to a customer, to the teammate who inherits the system, to yourself at 2 a.m. during an incident. Accountability you can't reconstruct isn't accountability; it's exposure. The ledger is what lets a human stand behind the outcome instead of merely being liable for it — answer the question, defend the call, or own the miss with the facts in hand. You cannot be on the hook for what you cannot reconstruct.

You can't reconstruct what you never recorded

The teams that get this right won't be the ones with the cleverest single agent. They'll be the ones who decided, early, that the record of how the work happened was worth keeping — ordered, immutable, replayable — on both the surface where agents run the work and the surface where they build it.

Everything else is hope with a dashboard.

In Part One, I argued that transcripts are not memory. This is the next layer down. Even with perfect memory, agents still need a durable record of how work moved through a system, across agents, and across time.

We're showing both surfaces live — production orchestration and the coding agents behind it, on one substrate — in Part Two of our series, Two Surfaces, One Substrate, on June 18. If the two stories at the top of this piece felt familiar, that session is for you. Register →