Fine-tune your models with Skills

Lokhesh Ujhoodha

Mon Dec 15 2025


Using KurrentDB Event Sourcing to Train Smarter Financial AI Models

Preface

This is a prototype project built entirely using two Claude Code skills and synthetic data:

KurrentDB Skills - Provides KurrentDB (EventStoreDB) client code patterns for efficient and accurate code generation.
Hugging Face Model Trainer Skill - Provides guidance for fine-tuning language models using TRL on Hugging Face infrastructure.

The goal of this prototype is to demonstrate what’s possible when combining event sourcing with modern AI fine-tuning techniques. This is not production-ready code, but rather a working proof-of-concept that shows:

How event-sourced data naturally fits sequence prediction tasks
How small models can achieve dramatic improvements with domain-specific training
How existing event stores can be leveraged as training data sources

We encourage you to use this as a starting point for your own experiments. The patterns shown here - generating training data from event streams, fine-tuning with LoRA, and evaluating process prediction accuracy - can be adapted to any domain where you’re capturing sequential business events.

Source Code: The complete prototype is available alongside this article.

How Skills Made This Easy

This entire prototype was built through a conversational workflow with Claude Code, leveraging two specialized skills. The KurrentDB skill provided correct client packages, connection string formats, and event sourcing patterns like stream naming conventions ({aggregate}-{id}). The Hugging Face Model Trainer skill supplied TRL training patterns, LoRA configuration best practices, and current API usage - including knowing that TRL’s tokenizer parameter had recently changed to processing_class. Together, these skills reduced the typical research-debug-fix cycle to a few minutes of debugging, because most of the generated code patterns worked on the first attempt.

The result was dramatic time savings. Event schema design took 5 minutes, the data generator 10 minutes, training script setup 5 minutes, and debugging just 5 minutes. Combined with 27 minutes of model training and 10 minutes for evaluation, the entire prototype was completed in approximately 1 hour - a task that would typically require days of documentation research, API debugging, and trial-and-error.

Introduction

Large Language Models have transformed what’s possible in AI, but training them for domain-specific tasks remains challenging. Generic models lack the specialized knowledge needed for complex business processes. In this article, we demonstrate how KurrentDB’s event sourcing architecture provides an ideal foundation for training AI models that understand business workflows.

We fine-tuned a small language model (360M parameters) to predict the next event in financial processes. When deployed as an agentic anomaly detector, the model achieved 100% recall (at 58.8% precision) - catching every injected anomaly in our benchmark - with just 27 minutes of training on consumer hardware.

The Problem: AI Models Don’t Understand Business Processes

Consider a trade order in a financial system. A generic LLM has no understanding that:

OrderSubmitted should be followed by OrderValidated
After validation, orders get routed to exchanges
Settlement happens at T+2, two business days after execution
Risk alerts trigger mitigation workflows

Without domain knowledge, models can’t help with process automation, anomaly detection, or workflow optimization.

The Solution: Event Sourcing as Training Data

KurrentDB stores every state change as an immutable event in time-ordered streams. This architecture is perfectly suited for training AI models because:

Events capture causality - Each event is caused by previous events
Streams represent processes - A stream like trade-ABC123 contains the complete lifecycle
Order is preserved - Events appear in the exact sequence they occurred
Context is rich - Events include metadata about what happened and why

Our Approach

We used KurrentDB to store financial process events across five workflow types:

KurrentDB Streams:
  trade-{order_id}      -> OrderSubmitted, OrderValidated, OrderRouted, OrderFilled...
  payment-{payment_id}  -> PaymentInitiated, PaymentValidated, PaymentApproved...
  risk-{alert_id}       -> RiskLimitBreached, RiskAlertCreated, RiskMitigationStarted...
  compliance-{check_id} -> ComplianceCheckTriggered, ComplianceFlagRaised...
  account-{app_id}      -> AccountApplicationSubmitted, AccountKYCStarted...

Each stream captures a complete process lifecycle - exactly the sequential understanding we want the model to learn.
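The naming pattern above can be captured in two small helpers. This is a minimal sketch, not code from the prototype: the function names stream_name and split_stream_name are hypothetical, and reading the category back by splitting at the first hyphen is an assumption that holds only when the category itself contains no hyphen.

```python
def stream_name(category: str, entity_id: str) -> str:
    """Build a stream name such as 'trade-ABC123' from a category and an id."""
    if "-" in category:
        raise ValueError("category must not contain '-'")
    return f"{category}-{entity_id}"


def split_stream_name(name: str) -> tuple[str, str]:
    """Split a stream name into (category, id) at the first hyphen."""
    category, _, entity_id = name.partition("-")
    return category, entity_id


print(stream_name("trade", "ABC123"))
print(split_stream_name("payment-PAY-123"))
```

Keeping the category free of hyphens means ids such as PAY-123 can still contain them without making the split ambiguous.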

Training Data Generation

From KurrentDB streams, we generated training examples in a simple format:

Input: “Given these events: OrderSubmitted, OrderValidated, OrderRouted - What comes next?”

Output: “OrderFilled”

This creates a supervised learning task where the model learns to predict the next event given the process history.

Sample Training Examples

Process: Trade Lifecycle
Input:  [OrderSubmitted]
Target: OrderValidated

Process: Payment (with approval)
Input:  [PaymentInitiated, PaymentValidated, PaymentPendingApproval]
Target: PaymentApproved

Process: Risk Management
Input:  [RiskLimitBreached, RiskAlertCreated, RiskAlertAcknowledged]
Target: RiskMitigationStarted

We generated 1,487 training examples from 200 process sequences.
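One way to turn a single process sequence into several examples is to treat every prefix as an input and the event that follows it as the target. The sketch below illustrates that idea with a hypothetical helper, next_event_examples; it is not necessarily how the prototype's generator is written.

```python
def next_event_examples(sequence: list[str]) -> list[tuple[list[str], str]]:
    """Return (history, next_event) pairs for every position after the first event."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


trade = ["OrderSubmitted", "OrderValidated", "OrderRouted", "OrderFilled"]
for history, target in next_event_examples(trade):
    print(history, "->", target)
```

With this scheme a sequence of n events yields n - 1 examples, which is why a few hundred sequences are enough to produce well over a thousand training rows.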

Training the Model

We fine-tuned SmolLM2-360M-Instruct using LoRA (Low-Rank Adaptation) for just 3 epochs on 1,487 training examples.

Training Efficiency

Metric	Value
Training Time	27 minutes
GPU	RTX 4070 Laptop
Model Size	360M parameters
Method	LoRA (rank 16)
Final Loss	0.034

A small model, minimal compute - because the training data captured exactly what we wanted the model to learn.
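Part of the reason the compute stays small is that LoRA trains only two low-rank factors per adapted weight matrix. The arithmetic can be sketched as follows; the layer width used here is a hypothetical round number for illustration and is not read from the SmolLM2 configuration.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) weight matrix.

    LoRA learns two factors: A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Hypothetical square projection, chosen only to make the ratio easy to read.
HIDDEN = 1024
full = HIDDEN * HIDDEN
adapter = lora_param_count(HIDDEN, HIDDEN, rank=16)
print(f"adapter={adapter} full={full} ratio={adapter / full}")
```

At rank 16, the adapter for such a matrix has about 3% as many parameters as the matrix itself, and only the adapted projections receive any trainable weights at all.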

Why Event Sourcing Works for AI Training

1. Natural Sequence Structure

Events in KurrentDB are inherently sequential. Unlike traditional databases where you’d need complex queries to reconstruct process flows, event streams are already in the exact format needed for sequence prediction tasks.

Stream: trade-ABC123
Position 0: OrderSubmitted
Position 1: OrderValidated
Position 2: OrderRouted
Position 3: OrderFilled
Position 4: TradeBooked
Position 5: TradeSettled

2. Rich Context Without Labeling

Each event carries meaningful context - timestamps, entity IDs, status fields, amounts. This metadata helps the model understand not just what happens, but why certain transitions occur.

{
  "event_type": "PaymentPendingApproval",
  "data": {
    "payment_id": "PAY-123",
    "amount": 500000.00,
    "approval_level": "dual",
    "required_approvers": 2
  }
}

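When these fields are included in the training text, the model can learn that large payments (like $500K) require dual approval - domain knowledge embedded in the event structure. The examples in this prototype use event types only, so payload fields are a natural next extension.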

3. Process Variants Are Captured Naturally

Real processes have branches and exceptions. Event sourcing captures all variants:

Trade success: OrderSubmitted -> OrderValidated -> … -> TradeConfirmed
Trade rejection: OrderSubmitted -> OrderRejected
Trade cancellation: OrderSubmitted -> … -> OrderCancelled

The model learns both the happy path and exception handling.
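A quick way to see the branching in recorded sequences is to collect, for each event type, the set of events observed directly after it. The sketch below uses two variants taken from this article (the settled trade and the rejection); the successors helper is hypothetical, and the elided middle steps of the other paths are deliberately left out.

```python
from collections import defaultdict

variants = [
    ["OrderSubmitted", "OrderValidated", "OrderRouted",
     "OrderFilled", "TradeBooked", "TradeSettled"],
    ["OrderSubmitted", "OrderRejected"],
]


def successors(sequences):
    """Map each event type to the set of event types seen immediately after it."""
    table = defaultdict(set)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            table[current].add(following)
    return table


table = successors(variants)
print(sorted(table["OrderSubmitted"]))  # ['OrderRejected', 'OrderValidated']
```

Events with more than one successor are the branch points where a next-event model has to rely on the rest of the history, or on payload fields, to choose between outcomes.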

4. Temporal Relationships Are Preserved

Event sourcing maintains exact ordering and timestamps. The model learns temporal patterns:

Settlement happens 2 days after execution
Compliance reviews have SLA deadlines
Risk mitigation follows acknowledgment
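Timing patterns like these only become visible to a model if the gaps between events are computed and included in the training text. The following sketch shows one way to derive them with the standard library; the timestamps are made up, and the timestamp field name is an assumption modelled on the event classes used later in this article.

```python
from datetime import datetime

# Hypothetical events with ISO 8601 timestamps.
events = [
    {"event_type": "OrderFilled", "timestamp": "2025-01-06T14:30:00+00:00"},
    {"event_type": "TradeBooked", "timestamp": "2025-01-06T14:31:00+00:00"},
    {"event_type": "TradeSettled", "timestamp": "2025-01-08T09:00:00+00:00"},
]


def gaps(events):
    """Yield (previous_type, next_type, timedelta) for consecutive events."""
    for prev, nxt in zip(events, events[1:]):
        start = datetime.fromisoformat(prev["timestamp"])
        end = datetime.fromisoformat(nxt["timestamp"])
        yield prev["event_type"], nxt["event_type"], end - start


for prev_type, next_type, delta in gaps(events):
    print(f"{prev_type} -> {next_type}: {delta}")
```

A gap could then be bucketed (for example "same day" versus "two days later") and appended to each event name in the prompt.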

Practical Applications

A model trained on process events enables several powerful capabilities:

1. Process Automation

Auto-suggest next steps in workflow UIs
Pre-populate forms based on expected events
Trigger automated actions for predictable transitions

2. Anomaly Detection

Flag unexpected events for investigation
Detect process violations in real time
Identify mixed or corrupted event streams

3. Compliance Monitoring

Ensure required steps aren’t skipped
Detect when approval workflows are bypassed
Alert on missing compliance events

Agentic Anomaly Detection: A Practical Benchmark

To demonstrate practical utility beyond prediction accuracy, we built an agentic process orchestrator - an AI agent that monitors event streams in real time, predicts expected events, and flags anomalies when actual events don’t match predictions.

How It Works

Event Stream: trade-ABC123
  [1] OrderSubmitted     -> Agent predicts: OrderValidated
  [2] OrderValidated     -> Agent predicts: OrderRouted (matches!)
  [3] PaymentFailed      -> Agent: ANOMALY! Expected OrderRouted, got PaymentFailed

The agent monitors each transition, comparing its prediction against the actual event. When they differ significantly, it flags an anomaly with an explanation and recommended action.
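The monitoring loop itself is small. The sketch below is an illustration under stated assumptions rather than the prototype's orchestrator: a lookup table (EXPECTED_NEXT) stands in for the fine-tuned model, the comparison is a plain equality check on event type names, and the function names are hypothetical.

```python
from typing import Callable, Optional

# Stand-in for the fine-tuned model: a lookup keyed on the most recent event.
EXPECTED_NEXT = {
    "OrderSubmitted": "OrderValidated",
    "OrderValidated": "OrderRouted",
    "OrderRouted": "OrderFilled",
}


def predict_next(history: list[str]) -> Optional[str]:
    """Return the expected next event, or None when there is no expectation."""
    return EXPECTED_NEXT.get(history[-1])


def monitor(events: list[str],
            predict: Callable[[list[str]], Optional[str]]) -> list[str]:
    """Compare each event with the prediction made from the events before it."""
    alerts = []
    for i in range(1, len(events)):
        expected = predict(events[:i])
        actual = events[i]
        if expected is not None and expected != actual:
            # Report 1-based positions to match the stream listing above.
            alerts.append(f"ANOMALY at event {i + 1}: expected {expected}, got {actual}")
    return alerts


print(monitor(["OrderSubmitted", "OrderValidated", "PaymentFailed"], predict_next))
```

Passing the predictor in as a function keeps the loop unchanged when the lookup table is replaced by a call to the fine-tuned model.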

Benchmark Design

We tested the orchestrator with two types of sequences:

Normal sequences (50): Valid process flows the model should recognize
Anomalous sequences (50): Sequences with intentional errors:
- Wrong event type injected
- Steps skipped in the workflow
- Events out of expected order
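The three error types can be produced from a valid sequence with simple list operations. The helpers below are a hypothetical sketch of such a generator; the function names are invented here, and "injected" is interpreted as replacing an existing event, which is an assumption about the benchmark.

```python
import random


def inject_wrong_event(seq, foreign_event, rng):
    """Replace one event (never the first) with an event from another process."""
    out = list(seq)
    out[rng.randrange(1, len(out))] = foreign_event
    return out


def skip_step(seq, rng):
    """Drop one interior event so a step is missing (needs at least 3 events)."""
    out = list(seq)
    del out[rng.randrange(1, len(out) - 1)]
    return out


def swap_adjacent(seq, rng):
    """Swap two neighbouring events so they appear out of order."""
    out = list(seq)
    i = rng.randrange(0, len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


rng = random.Random(42)  # fixed seed so the benchmark is reproducible
trade = ["OrderSubmitted", "OrderValidated", "OrderRouted",
         "OrderFilled", "TradeBooked", "TradeSettled"]
print(inject_wrong_event(trade, "PaymentFailed", rng))
print(skip_step(trade, rng))
print(swap_adjacent(trade, rng))
```

Dropping only interior events matters: removing the last event would leave a valid but unfinished sequence rather than an anomaly.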

Results: The Agent as a Watchdog

Metric	Value	Interpretation
Recall	100%	All 50 anomalies caught
Precision	58.8%	50 of 85 alerts were real anomalies
F1 Score	74.1%	Harmonic mean of precision and recall

The confusion matrix tells the story:

                    Predicted
                 Normal  Anomaly
Actual  Normal     15       35
        Anomaly     0       50

True Positives: 50 - All 50 anomalies were detected
False Negatives: 0 - No anomalies slipped through
False Positives: 35 - 35 of the 50 normal sequences were flagged (a trade-off we accept for a watchdog)
True Negatives: 15 - 15 normal sequences passed without an alert
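The reported metrics follow directly from the confusion matrix above, as this short check shows. The variable names are ours; the four counts are taken from the matrix.

```python
# Counts from the confusion matrix: rows are actual, columns are predicted.
tp, fp, fn, tn = 50, 35, 0, 15

precision = tp / (tp + fp)            # share of alerts that were real anomalies
recall = tp / (tp + fn)               # share of anomalies that raised an alert
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of normal sequences that were flagged

print(f"precision={precision:.1%} recall={recall:.1%} "
      f"f1={f1:.1%} fpr={false_positive_rate:.1%}")
# precision=58.8% recall=100.0% f1=74.1% fpr=70.0%
```

The false positive rate is the number to watch when tuning the agent, since it determines how many alerts a reviewer has to dismiss.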

Why 100% Recall Matters

For a monitoring system, missing an anomaly is worse than a false alarm. A payment routed to the wrong account, a compliance step skipped, or an unauthorized trade execution - these are costly mistakes that must be caught.

The agent achieved perfect recall: every injected anomaly was detected. The cost is precision: 35 of the 50 normal sequences (70%) were also flagged, in cases where the model was uncertain about valid but less common transitions. Each of these can be dismissed on review, and the reviewed cases are candidates for additional training examples.

Example Detection

Sequence: OrderSubmitted -> OrderValidated -> PaymentFailed -> OrderFilled

Agent Analysis:
  After OrderValidated: saw 'PaymentFailed'
  Expected: 'OrderRouted'

  [ANOMALY DETECTED]
  Reason: Expected trade event 'OrderRouted', but received payment
          event 'PaymentFailed'. Events from different process types
          are mixed.

  Action: Investigate why PaymentFailed appeared in this stream.
          Check for event routing errors.

The agent not only detects the anomaly but explains why it’s unexpected and suggests investigation steps - turning raw predictions into actionable insights.

Running the example

You can find the source code in the examples folder of the test_scenario_huggingface branch at https://github.com/kurrent-io/coding-agent-skills/tree/test_scenario_huggingface. Alternatively, you can create your own version with a few prompts using the KurrentDB Skills and the Hugging Face Model Trainer Skill.

Step 1: Set Up KurrentDB

docker run -d --name kurrentdb \
  -p 2113:2113 \
  docker.kurrent.io/kurrent-latest/kurrentdb:latest \
  --insecure --run-projections=All

Step 2: Define Your Events

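from dataclasses import dataclass

@dataclass
class OrderSubmitted:
    """Payload of the first event in a trade stream."""
    order_id: str
    account_id: str
    symbol: str
    quantity: float
    timestamp: str  # ISO 8601 string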
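Before an event class like this is written to a stream, it needs an event type name and a serializable payload. The sketch below shows one possible envelope that mirrors the JSON shape used earlier in this article; the to_event helper and the sample values are hypothetical, and the actual append call depends on the KurrentDB client you use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OrderSubmitted:
    order_id: str
    account_id: str
    symbol: str
    quantity: float
    timestamp: str


def to_event(obj) -> dict:
    """Wrap a dataclass instance as {'event_type': <class name>, 'data': {...}}."""
    return {"event_type": type(obj).__name__, "data": asdict(obj)}


event = to_event(
    OrderSubmitted("ORD-1", "ACC-9", "ACME", 100.0, "2025-01-06T14:30:00+00:00")
)
print(json.dumps(event, indent=2))
```

Using the class name as the event type keeps the names in the stream identical to the labels the model is trained to predict.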

Step 3: Generate Training Data

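# Placeholder wording; the prototype's actual system prompt is not shown in this article.
SYSTEM_PROMPT = "You predict the next event in a financial process."

def create_training_example(events, position):
    """Build one chat-format example: events before `position` -> event at `position`."""
    # Use position >= 1 so the history contains at least one event.
    context = [e["event_type"] for e in events[:position]]
    target = events[position]["event_type"]

    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Events: {context}\nWhat's next?"},
            {"role": "assistant", "content": target}
        ]
    }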

Step 4: Fine-Tune with LoRA

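from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

# Hub id of the base model; SFTTrainer accepts a model name and loads it.
model = "HuggingFaceTB/SmolLM2-360M-Instruct"
# Assumption: the Step 3 examples were saved as JSON Lines; the file name is a placeholder.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    # Rank-16 LoRA adapters on the attention query and value projections
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"]),
    args=SFTConfig(num_train_epochs=3, learning_rate=2e-4)
)
trainer.train()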

Conclusion

Event sourcing isn’t just an architecture pattern - it’s a training data goldmine. KurrentDB’s immutable, ordered event streams provide exactly the sequential structure that AI models need to learn business processes.

Our experiment demonstrated:

100% anomaly recall - the agentic watchdog caught all 50 injected anomalies, at 58.8% precision
27 minutes of training on consumer hardware
360M parameter model - no massive infrastructure needed
1,487 training examples generated automatically from event streams

The key insight: your event store already contains the training data. Every process that flows through KurrentDB creates examples of “given this history, this happens next.” And when combined with an agentic architecture, that predictive capability becomes real-time monitoring that, in our benchmark, caught every injected process anomaly.

For organizations already using event sourcing, fine-tuning AI models on process events is a natural next step. The events you’re capturing for audit, replay, and debugging are also the foundation for intelligent automation and proactive anomaly detection.

Resources

About This Experiment

Model: SmolLM2-360M-Instruct
Training Method: SFT with LoRA
Event Types: 30+ financial process events
Process Types: Trade, Payment, Risk, Compliance, Account
Training Examples: 1,487
Hardware: NVIDIA RTX 4070 Laptop GPU
