Cosmos DB for Event Sourcing? The Cost of Building Your Own Event Store

Tony Young

When your cloud provider recommends building an event store from general-purpose components, it’s worth understanding what that actually means for your team.


If you’re evaluating event sourcing for a mission-critical system on Azure, there’s a good chance someone has suggested you build it entirely from Azure-native components: Cosmos DB as the event store, Azure Functions for projections, Service Bus for messaging, Azure SQL for read models, and Redis Cache for acceleration.

On paper, it sounds appealing. Familiar SDKs. Consumption-based pricing. Everything inside your Azure tenant. One vendor to manage.

But there’s an important question hiding behind that architecture diagram: who builds the event-sourcing engine?

The Architecture You’re Actually Being Sold

Event sourcing isn’t just “storing events in a database.” It’s a set of interconnected capabilities that work together: append-only streams with version tracking, optimistic concurrency at the stream level, global event ordering, subscriptions (both persistent and catch-up), projections, replay, and temporal queries. These aren’t nice-to-haves—they’re the mechanical foundation that makes event sourcing work.

When someone proposes Cosmos DB as an event store, what they’re really proposing is that your team designs, builds, tests, and maintains all of that infrastructure. Cosmos DB is a powerful general-purpose document database, but it has no native concept of event streams, no stream-level concurrency control, no global ordering guarantee, no built-in subscriptions, no projection engine, and no replay mechanism.

Each of those capabilities becomes a custom development project:

Stream abstraction. Cosmos DB stores JSON documents. Your team must design a partition key strategy that models event streams, enforce append-only semantics, and implement stream versioning—all in application code.

Optimistic concurrency. Cosmos DB provides ETags on individual documents, but event sourcing requires concurrency control at the stream level. Your team writes the version-check logic inside Azure Functions, handling conflict detection and retry behavior.
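To make the gap concrete, here is a minimal sketch of that stream-level version check in Python. The in-memory dictionary stands in for Cosmos DB; all names are illustrative, not a real SDK.

```python
import threading

class WrongExpectedVersion(Exception):
    """Raised when the stream has moved past the version the writer saw."""

class InMemoryEventStore:
    """Stand-in for Cosmos DB: one list of events per stream, guarded by a lock."""

    def __init__(self):
        self._streams = {}
        self._lock = threading.Lock()

    def append(self, stream_id, events, expected_version):
        """Append only if the stream is still at expected_version.

        -1 means 'stream must not exist yet'. This is the stream-level
        concurrency check that Cosmos DB's per-document ETags do not give you.
        """
        with self._lock:
            stream = self._streams.setdefault(stream_id, [])
            current = len(stream) - 1
            if current != expected_version:
                raise WrongExpectedVersion(
                    f"expected version {expected_version}, stream is at {current}")
            stream.extend(events)
            return len(stream) - 1  # the new stream version

store = InMemoryEventStore()
v = store.append("order-42", [{"type": "OrderPlaced"}], expected_version=-1)

conflicted = False
try:
    # a second writer holding a stale version must be rejected
    store.append("order-42", [{"type": "OrderCancelled"}], expected_version=-1)
except WrongExpectedVersion:
    conflicted = True
```

In a real implementation this check also has to survive concurrent writers on different machines, which is where the retry and conflict-resolution code accumulates.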

Global ordering. Event-sourced systems often need to replay all events in order—for audits, for rebuilding state, for populating new read models. Cosmos DB distributes data across partitions with no global ordering guarantee. Reconstructing a total order across partition feeds is a non-trivial distributed systems problem.
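Assuming you could first manufacture a globally monotonic sequence number, which Cosmos DB does not provide across partitions and which is the genuinely hard part this sketch elides, the merge step itself might look like:

```python
import heapq

def merge_partition_feeds(feeds):
    """k-way merge of per-partition event feeds into one ordered log.

    Each feed is assumed already sorted by a globally monotonic 'seq'
    field. Producing that field across Cosmos DB partitions (e.g. via a
    sequencer service) is the hard distributed-systems problem; this
    function only shows the easy last step.
    """
    return list(heapq.merge(*feeds, key=lambda e: e["seq"]))

# two per-partition feeds, each locally ordered by the manufactured seq
feed_a = [{"seq": 1, "type": "OrderPlaced"},
          {"seq": 4, "type": "OrderShipped"}]
feed_b = [{"seq": 2, "type": "PaymentAuthorized"},
          {"seq": 3, "type": "FulfillmentStarted"}]
log = merge_partition_feeds([feed_a, feed_b])
```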

Projections. Every read model requires an Azure Function triggered by Change Feed, complete with lease management, checkpoint tracking, error handling, and its own deployment pipeline. KurrentDB provides a built-in projection engine, and as of v26, a SQL Sink that automatically materializes read models in PostgreSQL and SQL Server—through configuration, not code.

Subscriptions and replay. Both persistent subscriptions (server-side, with competing consumers) and catch-up subscriptions (client-side, for replay and projection rebuilds) must be implemented from scratch on top of Change Feed’s lease-based processing model.

Idempotency. In a purpose-built event store, idempotent writes are handled at the database level. In the Azure-native approach, every Azure Function must implement its own idempotency logic to prevent duplicate event processing.
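A sketch of the logic every one of those Functions ends up carrying. The in-process set here stands in for the durable deduplication store a real handler would need, which is exactly why this is harder than it looks.

```python
def make_idempotent(handler, seen=None):
    """Wrap an event handler so each event id is processed at most once.

    'seen' stands in for durable dedup state (e.g. a store keyed by event
    id); an in-process set like this does not survive restarts.
    """
    seen = set() if seen is None else seen

    def wrapped(event):
        if event["id"] in seen:
            return False       # duplicate delivery: skip
        handler(event)
        seen.add(event["id"])  # mark only after the handler succeeds
        return True
    return wrapped

processed = []
handle = make_idempotent(processed.append)
handle({"id": "e1"})
handle({"id": "e1"})  # redelivery from the Change Feed: ignored
handle({"id": "e2"})
```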

This isn’t “operational simplicity.” This is building a bespoke event-sourcing framework.

The Services You’re Actually Operating

The Azure-native event-sourcing architecture doesn’t replace operational complexity—it distributes it. Instead of one system purpose-built for the job, your team operates and monitors:

  • Cosmos DB — partition strategy, Request Unit capacity planning, Change Feed configuration
  • Azure Functions — one or more per projection, plus integration handlers, each with scaling, cold start, and error behaviors
  • Service Bus — topics, subscriptions, dead-letter queues, shared access policies
  • Azure SQL — the read model that Cosmos DB can’t serve directly
  • Redis Cache — because the read path needs acceleration
  • Azure Monitor / Application Insights — observability fragmented across all of the above

Each service is a credential to rotate, a failure mode to handle, a scaling knob to tune, and a monitoring dashboard to watch. And when something goes wrong at 2 AM, your on-call engineer needs to understand how all six services interact to trace the problem.

What a Purpose-Built Event Store Gives You

KurrentDB was designed specifically for event sourcing. That means the capabilities listed above aren’t things you build—they’re things the database provides:

Native event streams with append-only semantics, stream-level optimistic concurrency, and version tracking are core data model primitives—not patterns you implement on top of a document store.

The $all stream provides a single, globally ordered log of every event in the system. Full-system replays, audits, and cross-aggregate queries are a single read operation.

Built-in projections let you create derived views of your event data directly within the database. As of v26, the SQL Sink connector automatically synchronizes read models to PostgreSQL and SQL Server—no custom Azure Functions pipeline required.

Persistent subscriptions provide a server-side subscription model where the database tracks position and distributes events across competing consumers—ideal for workload distribution across multiple processing nodes. Catch-up subscriptions are a separate mechanism designed for replay: a client reads from a specific position in a stream and then seamlessly transitions to receiving live events once it has caught up. Rebuilding a read model or populating a new reporting tool means starting a catch-up subscription from the beginning of the stream.
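The catch-up mechanics can be sketched in a few lines of Python. A real implementation also has to subscribe to the live feed atomically with the switchover, which this toy version glosses over; it only shows the replay-then-drain shape.

```python
def run_catch_up(historical, live_buffer, checkpoint, handle):
    """Replay history after 'checkpoint', then drain the live buffer,
    skipping any overlap so no event is handled twice."""
    last = checkpoint
    for event in historical:
        if event["position"] > last:
            handle(event)
            last = event["position"]
    for event in live_buffer:  # events that arrived during the replay
        if event["position"] > last:
            handle(event)
            last = event["position"]
    return last

seen = []
history = [{"position": p} for p in (1, 2, 3)]
live = [{"position": p} for p in (3, 4)]  # position 3 overlaps the replay
last = run_catch_up(history, live, checkpoint=1, handle=seen.append)
```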

Time-travel queries are native. Read any stream at any version to reconstruct the state of an entity at any point in its history. For systems that need audit-ready compliance or the ability to reconstruct historical state, this is a foundational capability—not a feature you hope to build eventually.
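Conceptually, reading a stream at a version is a fold over a prefix of its events. A sketch with an illustrative order aggregate:

```python
def state_at(events, version, apply, initial):
    """Rebuild an entity's state as of a given stream version by folding
    only the events up to and including that version (0-based)."""
    state = initial
    for i, event in enumerate(events):
        if i > version:
            break
        state = apply(state, event)
    return state

def apply_order(state, event):
    # illustrative: each event carries the new status
    return {**state, "status": event["status"]}

stream = [{"status": "placed"}, {"status": "paid"}, {"status": "shipped"}]
earlier = state_at(stream, version=1, apply=apply_order, initial={})
current = state_at(stream, version=len(stream) - 1, apply=apply_order, initial={})
```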

Multi-stream atomic appends, introduced in v25.1, enable consistency across multiple event streams in a single operation—eliminating the need for saga patterns and process managers in scenarios where related entities must be updated together. Cosmos DB’s transactional batches are scoped to a single partition key.

Secondary and custom indexes (v25.1 and v26) deliver flexible query patterns with significant performance improvements while preserving event-sourcing integrity. The embedded Web UI includes a SQL query interface for ad-hoc exploration.

A comprehensive connector ecosystem ships with the database: Kafka (bidirectional), SQL Sink, MongoDB, RabbitMQ, Pulsar, Elasticsearch, and HTTP—all configuration-driven. For Azure Event Hubs specifically, KurrentDB’s Kafka connector integrates directly since Event Hubs exposes a Kafka-compatible endpoint.

“But What About Azure-Native Monitoring?”

This is one of the most common objections, and it’s based on a misunderstanding. KurrentDB doesn’t require a separate monitoring stack. It integrates directly into Azure’s own monitoring services:

  1. KurrentDB exposes a Prometheus metrics endpoint covering server health, stream throughput, subscription lag, and other production metrics.
  2. Azure Monitor managed service for Prometheus scrapes and stores these metrics in an Azure Monitor workspace—with full PromQL support, SLA guarantees, and 18-month retention.
  3. Azure Managed Grafana connects to Azure Managed Prometheus as a data source. Kurrent provides ready-made Grafana dashboards for KurrentDB that can be imported directly by dashboard ID or JSON upload.

The result is a fully Azure-native observability pipeline: KurrentDB Prometheus → Azure Managed Prometheus → Azure Managed Grafana. This is the same monitoring toolchain that Microsoft recommends for monitoring their own Azure services. For teams using Datadog or Dynatrace, KurrentDB also supports OTLP export.

The Cost Conversation

The “consumption-based pricing is cheaper” argument deserves scrutiny. The Azure-native approach isn’t one service with consumption pricing—it’s six or more services, each with its own billing meter:

Cosmos DB Request Units can spike dramatically during event replays, reprocessing, or peak traffic periods—precisely when your system needs to perform. Azure Functions bill per invocation for every Change Feed trigger, every projection update, every Service Bus handler. Service Bus charges per message for every integration event. Azure SQL and Redis Cache have their own pricing tiers. And Azure Monitor charges for ingestion across all of it.

KurrentDB pricing is predictable regardless of throughput spikes.

It’s also worth noting that Kurrent Cloud can be procured through an Azure Private Offer, which allows the purchase to count toward your organization’s Microsoft Azure Consumption Commitment (MACC). If the argument for Azure-native is “use your pre-allocated cloud spend,” that argument applies equally to Kurrent Cloud.

The Day 2 Question

Architecture decisions reveal their true cost on Day 2—when a new business requirement arrives.

With a purpose-built event store, adding a new read model means configuring a new projection or SQL Sink connector. The events are already there. The new view is populated by replaying history. No new services to deploy, no new Functions to write, no new monitoring to configure.

With the Azure-native approach, a new read model means writing a new Azure Function, configuring its Change Feed trigger with proper lease management, implementing checkpoint tracking and error handling, deploying it through the CI/CD pipeline, configuring monitoring and alerting, and managing its scaling behavior independently of every other Function. If it needs to process historical events, you must implement a custom replay mechanism.

This pattern repeats for every new projection, every new integration, every new reporting requirement. The custom infrastructure code you built on Day 1 becomes the maintenance burden you carry on Day 200, Day 500, and Day 1,000.

Why This Matters for AI

There’s a larger strategic question that most event-sourcing architecture discussions overlook entirely: what happens when you want to bring AI into your operations?

AI systems—whether they’re predictive models, copilots, or autonomous agents—are only as good as the context they operate on. And context isn’t a single dimension. Reliable AI requires four distinct types of context that your data infrastructure either provides or doesn’t:

Temporal context is the ability to understand when things happened and how state evolved over time. A traditional database gives you a snapshot—the current state of an order, an account, a resource. But an AI agent optimizing operations doesn’t just need to know where things stand right now. It needs to understand patterns, trends, and how state has changed over weeks or months. Event sourcing captures this naturally: every state change is an event with a position in its stream and a position in the global log. KurrentDB provides this through global ordering in the $all stream and total ordering within fine-grained per-aggregate streams—giving you both the system-wide timeline and the precise history of every individual entity. With a CRUD database—or a DIY event store where replay is a custom-coded afterthought—temporal context is either unavailable or expensive to reconstruct.

Causal context answers the question why something happened. In an event-sourced system, the sequence of events tells a causal story: an order was placed, which triggered a fulfillment process, which triggered an allocation that encountered a constraint, which triggered a fallback workflow. Each event is causally linked to what came before it. KurrentDB makes this explicit through $causationId—a metadata field on every event that points back to the event that directly caused it, forming a traceable chain of cause and effect. This is exactly what AI systems need to reason about process failures, identify root causes, and make informed decisions. A current-state database collapses this causal chain into a single row: “status = pending.” The why is gone.
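As an illustration, tracing a causal chain is a simple backwards walk over that metadata field. The lookup structure here is hypothetical; only the $causationId field is as described above.

```python
def causal_chain(events_by_id, event_id):
    """Walk $causationId links from an event back to its root cause.

    'events_by_id' is a hypothetical id -> event lookup; each event's
    '$causationId' points at the event that directly caused it.
    """
    chain = []
    current = events_by_id.get(event_id)
    while current is not None:
        chain.append(current)
        current = events_by_id.get(current.get("$causationId"))
    return list(reversed(chain))  # root cause first

events = {
    "e1": {"id": "e1", "type": "OrderPlaced"},
    "e2": {"id": "e2", "type": "FulfillmentStarted", "$causationId": "e1"},
    "e3": {"id": "e3", "type": "AllocationFailed", "$causationId": "e2"},
}
chain = causal_chain(events, "e3")
```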

Relational context captures how entities interact with each other across the system. In an event store, streams represent individual aggregates, but the relationships between those aggregates—which processes span multiple entities, which events belong to the same business transaction, which upstream changes propagate downstream—are equally important. KurrentDB provides this through $correlationId, a metadata field that tags related events across different streams with a shared identifier, and the built-in $by_correlation projection that automatically groups all events sharing the same correlation ID into a single queryable stream. An AI agent reasoning about operational throughput can follow a correlation ID to see the complete cross-entity picture of a business process, not just isolated entity states.
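A rough Python equivalent of what that projection materializes for you; the stream names and payloads here are illustrative.

```python
from collections import defaultdict

def group_by_correlation(events):
    """Group events from many streams by shared $correlationId, roughly
    what the built-in $by_correlation projection produces as queryable
    streams."""
    groups = defaultdict(list)
    for event in events:
        cid = event.get("$correlationId")
        if cid is not None:
            groups[cid].append(event)
    return dict(groups)

events = [
    {"stream": "order-1", "type": "OrderPlaced", "$correlationId": "txn-9"},
    {"stream": "payment-7", "type": "PaymentAuthorized", "$correlationId": "txn-9"},
    {"stream": "order-2", "type": "OrderPlaced", "$correlationId": "txn-8"},
]
groups = group_by_correlation(events)
```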

Semantic context is the meaning embedded in the events themselves. Domain events like OrderPlaced, PaymentAuthorized, FulfillmentStarted, and ShipmentDelayed carry business meaning in their names, their payloads, and their position in the stream. They describe what actually happened in the language of the domain. This is fundamentally different from a generic database update where a row changed from one state to another with no inherent semantics. When AI systems retrieve context from an event store, they get domain-meaningful facts—not raw data mutations that require additional interpretation.

Here’s the practical implication: if you build your event store from general-purpose components, these four dimensions of context are theoretically present in the events you store, but accessing them becomes a custom engineering project for each dimension. Global ordering for temporal context? Custom aggregation across Cosmos DB partitions. Causal chains? Custom application logic to trace event sequences. Cross-entity relationships? Custom projections coded in Azure Functions. And if your DIY event store has gaps in replay fidelity, idempotency, or ordering guarantees, the AI systems built on top of it inherit those gaps as hallucination risks and reasoning errors.

A purpose-built event store makes all four dimensions of context accessible as first-class capabilities. Global ordering in the $all stream and total ordering within fine-grained streams provide temporal context. The $causationId metadata field traces causal chains between events. The $correlationId field and the $by_correlation projection reveal relational context across entity boundaries. And the event-native data model preserves semantic context by design. When new AI use cases emerge—anomaly detection, predictive maintenance, autonomous agents—the foundation is already there. No architectural changes required.

This isn’t a future concern. Organizations are building AI-powered operations today, and the ones that invested in event sourcing are discovering that their event stores are the most valuable data asset they own—not because they planned for AI, but because event sourcing’s natural structure happens to be exactly what AI systems need.

Making the Decision

If your team is evaluating event sourcing on Azure, the question isn’t whether Azure-native components can be assembled into an event store. With enough engineering effort, they can. The question is whether that’s the best use of your team’s time and expertise—and whether the result will be robust enough to serve as the contextual foundation for AI when that requirement inevitably arrives.

A purpose-built event-sourcing database provides the mechanical foundation so your developers can focus on what actually differentiates your system: the domain logic, the business workflows, the operational intelligence, and increasingly, the AI capabilities that depend on rich historical context. Building event-sourcing infrastructure from general-purpose components is engineering effort that produces zero business differentiation—and may leave you with a foundation that can’t support the AI workloads coming next.

The most resilient architecture isn’t the one that minimizes the number of vendors. It’s the one that minimizes the amount of custom infrastructure code your team must build and maintain.


KurrentDB is an event-native database purpose-built for event sourcing. Kurrent Cloud is available on AWS, Azure, and GCP, and can be procured via Azure Private Offer to count toward your MACC. To learn more, visit the documentation or get started with a free cluster.