Fine-tune your models with Skills

Lokhesh Ujhoodha

Using KurrentDB Event Sourcing to Train Smarter Financial AI Models


Preface

This is a prototype project built entirely with two Claude Code skills, the KurrentDB skill and the Hugging Face Model Trainer skill, and synthetic data.

The goal of this prototype is to demonstrate what’s possible when combining event sourcing with modern AI fine-tuning techniques. This is not production-ready code, but rather a working proof-of-concept that shows:

  1. How event-sourced data naturally fits sequence prediction tasks
  2. How small models can achieve dramatic improvements with domain-specific training
  3. How existing event stores can be leveraged as training data sources

We encourage you to use this as a starting point for your own experiments. The patterns shown here - generating training data from event streams, fine-tuning with LoRA, and evaluating process prediction accuracy - can be adapted to any domain where you’re capturing sequential business events.

Source Code: The complete prototype is available alongside this article.


How Skills Made This Easy

This entire prototype was built through a conversational workflow with Claude Code, leveraging two specialized skills. The KurrentDB skill provided correct client packages, connection string formats, and event sourcing patterns like stream naming conventions ({aggregate}-{id}). The Hugging Face Model Trainer skill supplied TRL training patterns, LoRA configuration best practices, and current API usage - including knowing that TRL’s tokenizer parameter had recently changed to processing_class. Together, these skills eliminated the typical research-debug-fix cycle, providing working code patterns on the first attempt.

The result was dramatic time savings. Event schema design took 5 minutes, the data generator 10 minutes, training script setup 5 minutes, and debugging just 5 minutes. Combined with 27 minutes of model training and 10 minutes for evaluation, the entire prototype was completed in approximately 1 hour - a task that would typically require days of documentation research, API debugging, and trial-and-error.


Introduction

Large Language Models have transformed what’s possible in AI, but training them for domain-specific tasks remains challenging. Generic models lack the specialized knowledge needed for complex business processes. In this article, we demonstrate how KurrentDB’s event sourcing architecture provides an ideal foundation for training AI models that understand business workflows.

We fine-tuned a small language model (360M parameters) to predict the next event in financial processes. When deployed as an agentic anomaly detector, the model achieved 100% recall - catching every process anomaly in our benchmark - with just 27 minutes of training on consumer hardware.

The Problem: AI Models Don’t Understand Business Processes

Consider a trade order in a financial system. A generic LLM has no understanding that:

  • OrderSubmitted should be followed by OrderValidated
  • After validation, orders get routed to exchanges
  • Settlements happen at T+2, two business days after execution
  • Risk alerts trigger mitigation workflows

Without domain knowledge, models can’t help with process automation, anomaly detection, or workflow optimization.

The Solution: Event Sourcing as Training Data

KurrentDB stores every state change as an immutable event in time-ordered streams. This architecture is perfectly suited for training AI models because:

  1. Events capture causality - Each event is caused by previous events
  2. Streams represent processes - A stream like trade-ABC123 contains the complete lifecycle
  3. Order is preserved - Events appear in the exact sequence they occurred
  4. Context is rich - Events include metadata about what happened and why

Our Approach

We used KurrentDB to store financial process events across five workflow types:

KurrentDB Streams:
  trade-{order_id}      -> OrderSubmitted, OrderValidated, OrderRouted, OrderFilled...
  payment-{payment_id}  -> PaymentInitiated, PaymentValidated, PaymentApproved...
  risk-{alert_id}       -> RiskLimitBreached, RiskAlertCreated, RiskMitigationStarted...
  compliance-{check_id} -> ComplianceCheckTriggered, ComplianceFlagRaised...
  account-{app_id}      -> AccountApplicationSubmitted, AccountKYCStarted...

Each stream captures a complete process lifecycle - exactly the sequential understanding we want the model to learn.
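As a minimal sketch of the naming convention above, the helpers below build and split stream names of the form {category}-{id}. The function names are hypothetical and not part of any KurrentDB client; they only illustrate the convention.

```python
def stream_name(category: str, entity_id: str) -> str:
    """Build a stream name following the {category}-{id} convention."""
    if not category or "-" in category:
        raise ValueError("category must be non-empty and contain no '-'")
    return f"{category}-{entity_id}"


def split_stream_name(name: str) -> tuple[str, str]:
    """Split on the first '-' so that ids containing dashes stay intact."""
    category, _, entity_id = name.partition("-")
    return category, entity_id


trade_stream = stream_name("trade", "ABC123")
payment_parts = split_stream_name("payment-PAY-123")
```

Splitting on the first dash only keeps ids such as PAY-123 whole, which matters when the category is later used to pick the process type.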

Training Data Generation

From KurrentDB streams, we generated training examples in a simple format:

Input: “Given these events: OrderSubmitted, OrderValidated, OrderRouted - What comes next?”

Output: “OrderFilled”

This creates a supervised learning task where the model learns to predict the next event given the process history.

Sample Training Examples

Process: Trade Lifecycle
Input:  [OrderSubmitted]
Target: OrderValidated

Process: Payment (with approval)
Input:  [PaymentInitiated, PaymentValidated, PaymentPendingApproval]
Target: PaymentApproved

Process: Risk Management
Input:  [RiskLimitBreached, RiskAlertCreated, RiskAlertAcknowledged]
Target: RiskMitigationStarted

We generated 1,487 training examples from 200 process sequences.
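The expansion from sequences to examples can be sketched as a sliding window over each stream's event types. This is an illustration under one assumption: that every position after the first becomes one example, which matches the samples above but is not confirmed as the prototype's exact rule.

```python
def next_event_pairs(event_types):
    """Yield (context, target) pairs for every position after the first."""
    for position in range(1, len(event_types)):
        yield event_types[:position], event_types[position]


trade = ["OrderSubmitted", "OrderValidated", "OrderRouted", "OrderFilled"]
pairs = list(next_event_pairs(trade))

for context, target in pairs:
    print(context, "->", target)
```

A sequence of n events yields n - 1 examples under this rule, so longer processes contribute more training data.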

Training the Model

We fine-tuned SmolLM2-360M-Instruct using LoRA (Low-Rank Adaptation) for just 3 epochs on 1,487 training examples.

Training Efficiency

Metric          Value
Training Time   27 minutes
GPU             RTX 4070 Laptop
Model Size      360M parameters
Method          LoRA (rank 16)
Final Loss      0.034

A small model, minimal compute - because the training data captured exactly what we wanted the model to learn.
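To give a rough sense of why LoRA keeps compute small, the sketch below counts the trainable parameters added by rank-16 adapters. For each adapted weight matrix, LoRA adds two low-rank factors with rank × (d_in + d_out) parameters in total. The projection shapes and layer count are assumptions chosen for illustration and should be checked against the actual model config.

```python
def lora_trainable_params(rank, shapes, num_layers):
    """Count LoRA parameters: rank * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * num_layers


# ASSUMED (d_in, d_out) shapes for the query and value projections,
# and an ASSUMED layer count; verify both against the model's config.
ASSUMED_SHAPES = [(960, 960), (960, 320)]
ASSUMED_LAYERS = 32

trainable = lora_trainable_params(16, ASSUMED_SHAPES, ASSUMED_LAYERS)
print(f"{trainable:,} trainable LoRA parameters")
```

Under these assumed shapes the adapters add about 1.6M trainable parameters, well under 1% of a 360M-parameter base model.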

Why Event Sourcing Works for AI Training

1. Natural Sequence Structure

Events in KurrentDB are inherently sequential. Unlike traditional databases where you’d need complex queries to reconstruct process flows, event streams are already in the exact format needed for sequence prediction tasks.

Stream: trade-ABC123
Position 0: OrderSubmitted
Position 1: OrderValidated
Position 2: OrderRouted
Position 3: OrderFilled
Position 4: TradeBooked
Position 5: TradeSettled

2. Rich Context Without Labeling

Each event carries meaningful context - timestamps, entity IDs, status fields, amounts. This metadata helps the model understand not just what happens, but why certain transitions occur.

{
  "event_type": "PaymentPendingApproval",
  "data": {
    "payment_id": "PAY-123",
    "amount": 500000.00,
    "approval_level": "dual",
    "required_approvers": 2
  }
}

When these payloads are included in the training input, the model can learn that large payments (like $500K) require dual approval - domain knowledge embedded in the event structure.

3. Process Variants Are Captured Naturally

Real processes have branches and exceptions. Event sourcing captures all variants:

  • Trade success: OrderSubmitted -> OrderValidated -> … -> TradeConfirmed
  • Trade rejection: OrderSubmitted -> OrderRejected
  • Trade cancellation: OrderSubmitted -> … -> OrderCancelled

The model learns both the happy path and exception handling.
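One way to see the branching that the streams contain is to count which event follows which. The sketch below builds first-order transition counts from a few sequences; it is a simple inspection aid, not the fine-tuned model, and the sample sequences reuse event names from this article.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count, for each event type, which event types directly follow it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


happy_path = ["OrderSubmitted", "OrderValidated", "OrderRouted",
              "OrderFilled", "TradeBooked", "TradeSettled"]
rejection = ["OrderSubmitted", "OrderRejected"]

counts = transition_counts([happy_path, happy_path, rejection])
print(dict(counts["OrderSubmitted"]))
```

Here OrderSubmitted has two observed successors, so both the common path and the exception show up in the data without any manual labeling.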

4. Temporal Relationships Are Preserved

Event sourcing maintains exact ordering and timestamps. The model learns temporal patterns:

  • Settlement happens 2 days after execution
  • Compliance reviews have SLA deadlines
  • Risk mitigation follows acknowledgment
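The temporal patterns above can be made explicit by computing the gap between consecutive events. This is a small sketch that assumes each event carries an ISO 8601 timestamp field; the field name and the sample timestamps are made up for illustration.

```python
from datetime import datetime


def gaps_between_events(events):
    """Return (from_type, to_type, timedelta) for each pair of consecutive events."""
    gaps = []
    for prev, curr in zip(events, events[1:]):
        started = datetime.fromisoformat(prev["timestamp"])
        ended = datetime.fromisoformat(curr["timestamp"])
        gaps.append((prev["event_type"], curr["event_type"], ended - started))
    return gaps


# Made-up sample data, not taken from the prototype
events = [
    {"event_type": "OrderFilled", "timestamp": "2025-03-03T14:30:00"},
    {"event_type": "TradeBooked", "timestamp": "2025-03-03T14:31:00"},
    {"event_type": "TradeSettled", "timestamp": "2025-03-05T14:31:00"},
]
gaps = gaps_between_events(events)
for source, target, delta in gaps:
    print(f"{source} -> {target}: {delta}")
```

Gaps like these could be added to the prompt or checked against SLA thresholds, depending on what the model is expected to learn.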

Practical Applications

A model trained on process events enables several powerful capabilities:

1. Process Automation

  • Auto-suggest next steps in workflow UIs
  • Pre-populate forms based on expected events
  • Trigger automated actions for predictable transitions

2. Anomaly Detection

  • Flag unexpected events for investigation
  • Detect process violations in real time
  • Identify mixed or corrupted event streams

3. Compliance Monitoring

  • Ensure required steps aren’t skipped
  • Detect when approval workflows are bypassed
  • Alert on missing compliance events

Agentic Anomaly Detection: A Practical Benchmark

To demonstrate practical utility beyond prediction accuracy, we built an agentic process orchestrator - an AI agent that monitors event streams in real time, predicts expected events, and flags anomalies when actual events don’t match predictions.

How It Works

Event Stream: trade-ABC123
  [1] OrderSubmitted     -> Agent predicts: OrderValidated
  [2] OrderValidated     -> Agent predicts: OrderRouted (matches!)
  [3] PaymentFailed      -> Agent: ANOMALY! Expected OrderRouted, got PaymentFailed

The agent monitors each transition, comparing its prediction against the actual event. When they differ significantly, it flags an anomaly with an explanation and recommended action.
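The predict-then-compare loop can be sketched as follows. The predictor is passed in as a function so that any model can be plugged in; the lookup table used here is a hypothetical stand-in for the fine-tuned model, and the exact-match comparison is an assumption about how a mismatch is decided.

```python
def monitor(events, predict_next):
    """Compare each actual event with the prediction made from its history."""
    findings = []
    for position in range(1, len(events)):
        expected = predict_next(events[:position])
        actual = events[position]
        # Skip positions where the predictor has no opinion (None)
        if expected is not None and expected != actual:
            findings.append(
                {"position": position, "expected": expected, "actual": actual}
            )
    return findings


# Hypothetical stand-in predictor keyed on the most recent event
HAPPY_PATH = {
    "OrderSubmitted": "OrderValidated",
    "OrderValidated": "OrderRouted",
    "OrderRouted": "OrderFilled",
}


def table_predictor(history):
    return HAPPY_PATH.get(history[-1])


findings = monitor(
    ["OrderSubmitted", "OrderValidated", "PaymentFailed"], table_predictor
)
print(findings)
```

Keeping the predictor behind a plain function makes it straightforward to swap the table for a call to the trained model without touching the monitoring logic.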

Benchmark Design

We tested the orchestrator with two types of sequences:

  1. Normal sequences (50): Valid process flows the model should recognize
  2. Anomalous sequences (50): Sequences with intentional errors:
    • Wrong event type injected
    • Steps skipped in the workflow
    • Events out of expected order

Results: The Agent as a Watchdog

Metric      Value   Interpretation
Recall      100%    Caught every anomaly
Precision   58.8%   Some false alarms
F1 Score    74.1%   Good overall balance

The confusion matrix tells the story:

                    Predicted
                 Normal  Anomaly
Actual  Normal     15       35
        Anomaly     0       50

  • True Positives: 50 - All 50 anomalies were detected
  • False Negatives: 0 - No anomalies slipped through
  • False Positives: 35 - 35 of the 50 normal sequences were flagged (acceptable for a watchdog)
  • True Negatives: 15 - 15 normal sequences passed without a flag
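The reported metrics follow directly from the confusion matrix. The short sketch below recomputes them from the four counts, and also derives the false positive rate, which the table does not list.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and false positive rate from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": false_positive_rate,
    }


metrics = detection_metrics(tp=50, fp=35, fn=0, tn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts, precision is 50 / 85, recall is 50 / 50, and the false positive rate is 35 / 50, which is the figure to watch when tuning the agent to raise fewer false alarms.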

Why 100% Recall Matters

For a monitoring system, missing an anomaly is worse than a false alarm. A payment routed to the wrong account, a compliance step skipped, or an unauthorized trade execution - these are costly mistakes that must be caught.

The agent achieved perfect recall: every injected anomaly was detected. The 35 false positives represent cases where the model was uncertain about valid but less common transitions - these can be reviewed quickly and provide learning opportunities.

Example Detection

Sequence: OrderSubmitted -> OrderValidated -> PaymentFailed -> OrderFilled

Agent Analysis:
  After OrderValidated: saw 'PaymentFailed'
  Expected: 'OrderRouted'

  [ANOMALY DETECTED]
  Reason: Expected trade event 'OrderRouted', but received payment
          event 'PaymentFailed'. Events from different process types
          are mixed.

  Action: Investigate why PaymentFailed appeared in this stream.
          Check for event routing errors.

The agent not only detects the anomaly but explains why it’s unexpected and suggests investigation steps - turning raw predictions into actionable insights.

Running the example

You can find the source code in the examples folder of the test_scenario_huggingface branch (https://github.com/kurrent-io/coding-agent-skills/tree/test_scenario_huggingface). Alternatively, you can build your own version with a few prompts using the Kurrent and Hugging Face skills.

Step 1: Set Up KurrentDB

docker run -d --name kurrentdb \
  -p 2113:2113 \
  docker.kurrent.io/kurrent-latest/kurrentdb:latest \
  --insecure --run-projections=All

Step 2: Define Your Events

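from dataclasses import dataclass

# One dataclass per event type; the fields become the event payload
@dataclass
class OrderSubmitted:
    order_id: str
    account_id: str
    symbol: str
    quantity: float
    timestamp: str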
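A dataclass like this can be wrapped into the event_type plus data shape shown earlier in the article. The sketch below uses a hypothetical to_event helper and made-up field values; it does not call any KurrentDB client API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OrderSubmitted:
    order_id: str
    account_id: str
    symbol: str
    quantity: float
    timestamp: str


def to_event(payload):
    """Wrap a dataclass instance as {"event_type": ..., "data": {...}}."""
    return {"event_type": type(payload).__name__, "data": asdict(payload)}


# Made-up sample values for illustration
event = to_event(
    OrderSubmitted("ORD-1", "ACC-9", "ACME", 100.0, "2025-03-03T14:30:00Z")
)
print(json.dumps(event, indent=2))
```

Deriving the event type from the class name keeps the type string and the payload definition in one place.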

Step 3: Generate Training Data

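# Placeholder system prompt; replace it with the prompt used in your own setup
SYSTEM_PROMPT = "You predict the next event in a business process."

def create_training_example(events, position):
    # Context is every event type before `position`; the target is the event at `position`
    context = [e["event_type"] for e in events[:position]]
    target = events[position]["event_type"]

    # Chat-format example: system prompt, user question, expected answer
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Events: {context}\nWhat's next?"},
            {"role": "assistant", "content": target}
        ]
    }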
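To turn a whole stream into a training file, the examples can be serialized as JSON Lines, one chat example per line. This is a sketch under assumptions: the system prompt is a placeholder, the stream_to_jsonl helper is hypothetical, and starting at position 1 is one possible choice.

```python
import json

# Placeholder prompt, not the one used in the prototype
SYSTEM_PROMPT = "You predict the next event in a business process."


def stream_to_jsonl(events):
    """Serialize one chat-format example per line for each position after the first."""
    lines = []
    for position in range(1, len(events)):
        context = [e["event_type"] for e in events[:position]]
        target = events[position]["event_type"]
        example = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Events: {context}\nWhat's next?"},
                {"role": "assistant", "content": target},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)


events = [
    {"event_type": name}
    for name in ("PaymentInitiated", "PaymentValidated", "PaymentApproved")
]
jsonl = stream_to_jsonl(events)
print(jsonl)
```

Because json.dumps escapes the newline inside the user message, each example stays on a single line and the file can be read back line by line.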

Step 4: Fine-Tune with LoRA

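from trl import SFTTrainer, SFTConfig
from peft import LoraConfig

# `model` is the base model (SmolLM2-360M-Instruct in this experiment) and
# `dataset` holds the chat-format examples from Step 3; both are defined elsewhere.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # Rank-16 adapters on the attention query and value projections
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"]),
    args=SFTConfig(num_train_epochs=3, learning_rate=2e-4),
)
trainer.train()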

Conclusion

Event sourcing isn’t just an architecture pattern - it’s a training data goldmine. KurrentDB’s immutable, ordered event streams provide exactly the sequential structure that AI models need to learn business processes.

Our experiment demonstrated:

  • 100% anomaly recall - the agentic watchdog caught every process anomaly
  • 27 minutes of training on consumer hardware
  • 360M parameter model - no massive infrastructure needed
  • 1,487 training examples generated automatically from event streams

The key insight: your event store already contains the training data. Every process that flows through KurrentDB creates examples of “given this history, this happens next.” And when combined with an agentic architecture, that predictive capability becomes real-time monitoring that, in our benchmark, caught every injected anomaly.

For organizations already using event sourcing, fine-tuning AI models on process events is a natural next step. The events you’re capturing for audit, replay, and debugging are also the foundation for intelligent automation and proactive anomaly detection.


Resources

About This Experiment

  • Model: SmolLM2-360M-Instruct
  • Training Method: SFT with LoRA
  • Event Types: 30+ financial process events
  • Process Types: Trade, Payment, Risk, Compliance, Account
  • Training Examples: 1,487
  • Hardware: NVIDIA RTX 4070 Laptop GPU