
I've been building AgentOps on KurrentDB — fifteen scenarios stress-testing the three layers production AI agents actually depend on: memory, signal, and state. The deeper I got into it, the more convinced I became that the AI infrastructure space is using the wrong word for the wrong layer.
"Memory" has become the most overloaded word in agent infrastructure. Most of what teams ship as agent memory is a session transcript piped into a vector database, with similarity search standing in for recall. It works till it doesnt - like when someone asks what the agent knew on Tuesday at 3 PM, and the room goes quiet.
That's not a durable memory. That's retrieval over a transcript with extra steps.
What memory actually is
AWS's AgentCore Memory has a clean definition: agent memory is information retained from prior interactions that can be recalled and used in future ones. That's right as far as it goes. We'd push it one step further — durable operational memory should be a retrievable record of what the agent actually did and saw, not just what an LLM later decided was important to remember. What many teams ship as memory is the extraction layered on top — a vector index of summaries, an embedding of a transcript — while the source interactions are often retained only temporarily, or stored separately from the memory layer itself. That's the architectural tradeoff nearly every AI team makes the first time: keep the summary, lose the proof.
What teams actually ship
Session ends. Transcript gets dumped into a vector DB — the choice of vector store doesn't matter. Next session, the agent does similarity search, stuffs the top-k into context, calls it memory.
Cross-session continuity becomes increasingly difficult to maintain as context grows. The agent recalls fragments, not the state it left things in. Engineers patch it by stuffing more transcripts into context; latency goes up, relevance goes down.
Audit fails. What did the agent know on October 14th at 2:47 PM? "Approximately these embeddings were probably retrieved" doesn't pass an audit review.
Vendor swap gets complicated. A new model may embed text differently, forcing memory stores to be re-indexed or re-embedded. Operational history becomes tied to the characteristics of a particular inference stack.
Raw events age out; only extractions persist. AgentCore Memory, at the time of writing, retains raw interactions in short-term memory for a configurable window — 90 days by default and up to 365 days. The long-term layer keeps only what the LLM extracted into SEMANTIC, SUMMARY, USER_PREFERENCES, or CUSTOM strategies — not the source interaction stream. Once the raw events age out, you can't reprocess or reconstruct from interactions that no longer exist. Not a bug; the design.
These four failure modes are why we built AgentOps the way we did: on a durable record, not an extraction layer.
The right model
Memory and retrieval are often discussed as if they were the same thing. They're different layers. Retrieval — vector indices, summaries, knowledge graphs, semantic memories — helps an agent recall relevant information. A durable record preserves what actually happened. Both are valuable. Problems appear when one is mistaken for the other.
The memory layer should be derived from the record, not stand in for it. Retrieval helps an agent remember; the record lets an organization trust what the agent remembers.
An agentic AI backbone is a durable, ordered record of every event that shaped the agent’s decision: every tool call, model output, state transition. Events are immutable, globally ordered, retained as long as auditors require, replayable.
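To make those properties concrete, here is a minimal in-memory sketch in Python. It is not the KurrentDB API: `Event`, `EventLog`, `append`, and `read_all` are hypothetical names chosen for illustration, and a real store would add durability, concurrency control, and retention policy on top.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Any, Iterator, Mapping


@dataclass(frozen=True)
class Event:
    """One immutable fact about what the agent did or saw (hypothetical shape)."""
    position: int            # global order, assigned by the log
    stream: str              # e.g. "agent-42"
    type: str                # e.g. "ToolCalled", "ModelResponded"
    data: Mapping[str, Any]  # read-only payload
    recorded_at: datetime


class EventLog:
    """Append-only log: events get a global position and are never updated."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, stream: str, type: str, data: dict[str, Any]) -> Event:
        event = Event(
            position=len(self._events),
            stream=stream,
            type=type,
            data=MappingProxyType(dict(data)),  # read-only copy of the payload
            recorded_at=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def read_all(self, from_position: int = 0) -> Iterator[Event]:
        """Replay events in global order, starting at from_position."""
        yield from self._events[from_position:]


log = EventLog()
log.append("agent-42", "ToolCalled", {"tool": "lookup_customer", "id": "c-1"})
log.append("agent-42", "ModelResponded", {"model": "model-v1", "text": "ok"})
log.append("agent-7", "StateChanged", {"from": "pending", "to": "approved"})

replayed = [(e.position, e.type) for e in log.read_all()]
```

The log only exposes `append` and `read_all`. There is no update or delete, which is the property the rest of the argument depends on.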
Retrieval is built on top. Short-term retrieval is a window over recent events; long-term retrieval is curated: extracted, indexed, queryable. AgentCore's SEMANTIC and SUMMARY strategies are extraction strategies. So is a vector index. So is a knowledge graph. Extractions are cheap and replaceable; the record is precious and singular. But an extraction can only be rebuilt while the source events still exist.
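The dependency direction can be sketched in a few lines of Python. This is an illustrative toy, not any product's API: the events are plain dicts, a keyword index stands in for a vector index, and `build_index`, `extract_v1`, and `extract_v2` are hypothetical names. The point is only that an extraction can be thrown away and rebuilt with a different strategy as long as the source events exist.

```python
from collections import defaultdict
from typing import Callable

# The record: ordered source events (hypothetical shape).
RECORD = [
    {"position": 0, "type": "UserMessage", "text": "Wire 4M to ACME Ltd"},
    {"position": 1, "type": "ToolResult", "text": "Velocity anomaly on account"},
    {"position": 2, "type": "ModelOutput", "text": "Approve wire after review"},
]

Extractor = Callable[[dict], list[str]]


def extract_v1(event: dict) -> list[str]:
    """First strategy: index every lowercased word."""
    return event["text"].lower().split()


def extract_v2(event: dict) -> list[str]:
    """Replacement strategy: index only words longer than four characters."""
    return [w for w in event["text"].lower().split() if len(w) > 4]


def build_index(record: list[dict], extract: Extractor) -> dict[str, list[int]]:
    """Derive a term -> event positions index by replaying the record."""
    index: dict[str, list[int]] = defaultdict(list)
    for event in record:
        for term in extract(event):
            index[term].append(event["position"])
    return dict(index)


index_v1 = build_index(RECORD, extract_v1)
index_v2 = build_index(RECORD, extract_v2)  # rebuilt from the same source events

# Every hit points back to a position in the record, so lineage is preserved.
print(index_v1["wire"])   # [0, 2]
print("to" in index_v2)   # False
```

Swapping `extract_v1` for `extract_v2` plays the role of changing the embedding model: the index changes, the record does not.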
Where this fits alongside existing tools
Memory tools today answer different questions with different designs. Mem0 focuses on extracting and retaining the memories an agent should carry forward. Zep enriches recall through conversational history, semantic relationships, and knowledge graphs. Letta introduces a distinction between working memory and archival memory, helping agents manage context over long periods. LangGraph brings durability and resumability to workflow execution through checkpointed state.
These capabilities are valuable, but they all operate above a more fundamental layer: the record itself. Before a memory can be extracted, a graph can be built, or a workflow can be resumed, something must preserve what actually happened. An agentic AI backbone fills that role by capturing every prompt, tool call, state transition, policy decision, and outcome as an immutable stream of events. Memory systems can evolve. Retrieval strategies can change. Models can be replaced. The record remains. That is what makes replay, audit, reconstruction, and long-term operational continuity possible.
What this gives you
We built fifteen scenarios into AgentOps, organized across the three layers — Memory, Signal, State. Six of them show the difference most directly. Each becomes operationally expensive — or unreliable — on transcript-and-vector-DB stacks:
Stale data detection. Replay the events the agent saw against the current world state; prove it acted on invalidated data.
Agent crash recovery. Restart from the last durable event and reconstruct the state deterministically. No lost reasoning history.
Auditor queries. Point-in-time reconstruction returns what the agent knew at the timestamp asked, with lineage from input event to action.
Vendor swap. The operational state lives in the backbone. Change the LLM; the agent's history persists.
Multi-agent coordination. Global event ordering means agents operating on the same stream can reconstruct the same sequence of events and decisions — handoffs and fan-out from one ordered source instead of nondeterministic pub/sub.
Point-in-time reconstruction. What did the system look like at 14:23:07 on October 14th? The events answer — assuming policies, prompts, and model versions are also treated as events.
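Several of these scenarios reduce to the same operation: fold the ordered events up to a cutoff into a state. Below is a minimal Python sketch, assuming a hypothetical event shape and hypothetical event types (`PolicySet`, `ModelDeployed`, `FactObserved`). The year in the timestamps is made up for the example, and none of this is the KurrentDB API.

```python
from datetime import datetime, timezone


def ts(hour: int, minute: int, second: int = 0) -> datetime:
    # Illustrative date only; the year is an assumption for the example.
    return datetime(2025, 10, 14, hour, minute, second, tzinfo=timezone.utc)


# Ordered record. Policies and model versions are events too.
EVENTS = [
    {"at": ts(9, 0), "type": "PolicySet", "data": {"wire_limit": 1_000_000}},
    {"at": ts(9, 5), "type": "ModelDeployed", "data": {"version": "v1"}},
    {"at": ts(14, 20), "type": "FactObserved", "data": {"key": "risk", "value": "low"}},
    {"at": ts(14, 30), "type": "ModelDeployed", "data": {"version": "v2"}},
    {"at": ts(15, 0), "type": "FactObserved", "data": {"key": "risk", "value": "high"}},
]


def apply(state: dict, event: dict) -> dict:
    """Pure transition: return the state after one event."""
    new = {**state, "facts": dict(state["facts"])}
    if event["type"] == "PolicySet":
        new["policy"] = event["data"]
    elif event["type"] == "ModelDeployed":
        new["model"] = event["data"]["version"]
    elif event["type"] == "FactObserved":
        new["facts"][event["data"]["key"]] = event["data"]["value"]
    return new


def state_at(events: list[dict], cutoff: datetime) -> dict:
    """Replay events recorded at or before the cutoff, in order."""
    state = {"policy": None, "model": None, "facts": {}}
    for event in events:
        if event["at"] > cutoff:
            break
        state = apply(state, event)
    return state


snapshot = state_at(EVENTS, ts(14, 23, 7))
print(snapshot["model"], snapshot["facts"])  # v1 {'risk': 'low'}
```

Because `apply` is pure and the events are ordered, replaying the same prefix always yields the same state. Crash recovery is the same fold run up to the last durable event, and two agents reading the same stream arrive at the same result.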
The dominant pattern today is extraction-based memory — a layer that summarizes the agent's history into vectors and stores it elsewhere, often without preserving the underlying record. The store optimizes for retrieval; it doesn't preserve truth. Lineage from a stored memory back to the originating event is a citation, not deterministic replay. That's not a flaw in any one product — it's the consequence of treating extraction as a substitute for the record. When extraction replaces the record instead of deriving from it, you lose the only thing that lets you trust what the agent remembers.
Deployment strategies
When I talk to platform teams about this, they fall into one of four adoption patterns:
Sidecar. Run the agentic AI backbone next to the existing agent stack as the durable record. Current memory tooling keeps working but now has an ordered, replayable feed. Lowest disruption, fastest path to a defensible audit story.
Replace. Swap the extraction-based memory layer for native retrieval off the backbone. Native vector and full-text search (FTS) make this viable without a separate index store. Higher migration cost, lower long-term operational surface.
Under a managed memory service. Place the agentic AI backbone behind a service like AgentCore Memory. The managed service continues doing what it does best — extracting facts, summaries, preferences, and semantic context — while the agentic AI backbone preserves the underlying operational record. The two layers complement each other rather than compete.
Greenfield. New agent platforms start with the agentic AI backbone from day one and build retrieval natively. Cleanest architecture; only available at the start.
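Of the four, the sidecar pattern is the easiest to picture in code. The Python sketch below is a rough illustration under stated assumptions: `MemoryTool`, `ExistingVectorMemory`, and `RecordingSidecar` are hypothetical names, and a plain list stands in for the durable store. The existing memory layer is untouched; the wrapper writes each interaction to the ordered record before forwarding it.

```python
from typing import Any, Protocol


class MemoryTool(Protocol):
    """Whatever the stack uses today (hypothetical interface)."""

    def remember(self, text: str) -> None: ...


class ExistingVectorMemory:
    """Stand-in for the current extraction-based memory layer."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, text: str) -> None:
        self.items.append(text)


class RecordingSidecar:
    """Writes each interaction to the ordered record, then forwards it unchanged."""

    def __init__(self, memory: MemoryTool) -> None:
        self._memory = memory
        self.record: list[dict[str, Any]] = []  # stand-in for the durable log

    def remember(self, text: str) -> None:
        # Record first, so the durable history never lags the derived memory.
        self.record.append(
            {"position": len(self.record), "type": "Remembered", "text": text}
        )
        self._memory.remember(text)


existing = ExistingVectorMemory()
agent_memory = RecordingSidecar(existing)
agent_memory.remember("customer prefers email")
agent_memory.remember("wire flagged for velocity")
```

The agent keeps calling `remember` exactly as before, which is why this path is the least disruptive of the four.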
The three operational actions
From the agent's perspective:
Record — the durable operational history beneath what memory systems expose as recall, preferences, summaries, and context. Unlike extracted memories, the record preserves the underlying events themselves.
Signal — what Microsoft calls Connected Agents and Google calls ADK multi-agent, with global event ordering instead of unordered handoffs.
Reason — replayable, projectable, point-in-time over the durable record, with plans, handoffs, and resumption preserved end-to-end. What workflow engines keep hidden inside their SDK runtime, Kurrent keeps observable to any subscriber. Distinct from what the LLM does inside its forward pass — the LLM reasons inside the model; the system reasons over the record.
The $4 million question
Picture a fraud agent at your firm approving a $4 million wire transfer. Two other agents weighed in first — one flagged a velocity anomaly and wanted to block; the other cleared it after correlating against the customer's recent activity. Your agent makes the call. The money moves.
The next morning, the regulator calls: walk me through exactly what the agent saw, why it decided, which model version was in production, what the other agents said, what policy was in force.
Can you answer in minutes — or does it take weeks?
The answer was decided long before the wire moved, by one architectural choice: whether your agents' memory is the durable record of what actually happened, or a summary of it. Get it right and the rest composes — signal across agents, state across handoffs, audit across years. Get it wrong and every layer is shipping summaries of summaries.
Extraction keeps the summary. The record keeps the proof. Start with the record.
This is where the webinar starts. The $4 million question opens Part 1 of our AI agent infrastructure series, live June 3 — seven scenes against a real KurrentDB cluster: a fraud decision reconstructed after the fact, two agents in conflict with both positions preserved, an agent resuming from the exact event after a mid-workflow crash, two model versions replayed against each other to measure drift. Memory, Signal, State — proven on stage, not asserted on a slide. Register Here
