Fine-tune your models with Skills

Using KurrentDB Event Sourcing to Train Smarter Financial AI Models
Preface
This is a prototype project built entirely using two Claude Code skills and synthetic data:
- KurrentDB Skills - Provides KurrentDB (formerly EventStoreDB) client code patterns for efficient and accurate code generation.
- Hugging Face Model Trainer Skill - Guidance for fine-tuning language models using TRL on Hugging Face infrastructure.
The goal of this prototype is to demonstrate what’s possible when combining event sourcing with modern AI fine-tuning techniques. This is not production-ready code, but rather a working proof-of-concept that shows:
- How event-sourced data naturally fits sequence prediction tasks
- How small models can achieve dramatic improvements with domain-specific training
- How existing event stores can be leveraged as training data sources
We encourage you to use this as a starting point for your own experiments. The patterns shown here - generating training data from event streams, fine-tuning with LoRA, and evaluating process prediction accuracy - can be adapted to any domain where you’re capturing sequential business events.
Source Code: The complete prototype is available alongside this article.
How Skills Made This Easy
This entire prototype was built through a conversational workflow with Claude Code, leveraging two specialized skills. The KurrentDB skill provided correct client packages, connection string formats, and event sourcing patterns like stream naming conventions ({aggregate}-{id}). The Hugging Face Model Trainer skill supplied TRL training patterns, LoRA configuration best practices, and current API usage - including knowing that TRL’s tokenizer parameter had recently changed to processing_class. Together, these skills eliminated the typical research-debug-fix cycle, providing working code patterns on the first attempt.
The result was dramatic time savings. Event schema design took 5 minutes, the data generator 10 minutes, training script setup 5 minutes, and debugging just 5 minutes. Combined with 27 minutes of model training and 10 minutes for evaluation, the entire prototype was completed in approximately 1 hour - a task that would typically require days of documentation research, API debugging, and trial-and-error.
Introduction
Large Language Models have transformed what’s possible in AI, but training them for domain-specific tasks remains challenging. Generic models lack the specialized knowledge needed for complex business processes. In this article, we demonstrate how KurrentDB’s event sourcing architecture provides an ideal foundation for training AI models that understand business workflows.
We fine-tuned a small language model (360M parameters) to predict the next event in financial processes. When deployed as an agentic anomaly detector, the model achieved 100% recall - catching every injected anomaly in our benchmark, at 58.8% precision - with just 27 minutes of training on consumer hardware.
The Problem: AI Models Don’t Understand Business Processes
Consider a trade order in a financial system. A generic LLM has no understanding that:
- OrderSubmitted should be followed by OrderValidated
- After validation, orders get routed to exchanges
- Settlements happen T+2 after execution
- Risk alerts trigger mitigation workflows
Without domain knowledge, models can’t help with process automation, anomaly detection, or workflow optimization.
The Solution: Event Sourcing as Training Data
KurrentDB stores every state change as an immutable event in time-ordered streams. This architecture is perfectly suited for training AI models because:
- Events capture causality - Each event is caused by previous events
- Streams represent processes - A stream like trade-ABC123 contains the complete lifecycle
- Order is preserved - Events appear in the exact sequence they occurred
- Context is rich - Events include metadata about what happened and why
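Because order is preserved, turning a stream into a training sequence takes a single forward read. Here is a minimal sketch, assuming the `kurrentdbclient` Python package. As we understand it, its client exposes `get_stream()`, which returns recorded events with `type`, `data` and `stream_position` attributes; check the client docs for your version. The function takes the client as a parameter so the parsing logic can be exercised without a running node. `read_process` is a hypothetical name.

```python
import json
from typing import Any


def read_process(client: Any, stream_name: str) -> list[dict[str, Any]]:
    """Read one process stream and return its events as plain dicts.

    `client` is assumed to expose get_stream(stream_name), returning recorded
    events with .type, .data (JSON bytes) and .stream_position attributes.
    Forward reads come back in stream-position order, so no sorting is done.
    """
    return [
        {
            "position": e.stream_position,
            "event_type": e.type,
            "data": json.loads(e.data) if e.data else {},
        }
        for e in client.get_stream(stream_name)
    ]


# Usage against a local node (assumed kurrentdbclient API; check the client docs):
# from kurrentdbclient import KurrentDBClient
# client = KurrentDBClient(uri="kurrentdb://localhost:2113?tls=false")
# print([e["event_type"] for e in read_process(client, "trade-ABC123")])
```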
Our Approach
We used KurrentDB to store financial process events across five workflow types:
KurrentDB Streams:
```text
trade-{order_id} -> OrderSubmitted, OrderValidated, OrderRouted, OrderFilled...
payment-{payment_id} -> PaymentInitiated, PaymentValidated, PaymentApproved...
risk-{alert_id} -> RiskLimitBreached, RiskAlertCreated, RiskMitigationStarted...
compliance-{check_id} -> ComplianceCheckTriggered, ComplianceFlagRaised...
account-{app_id} -> AccountApplicationSubmitted, AccountKYCStarted...
```
Each stream captures a complete process lifecycle - exactly the sequential understanding we want the model to learn.
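The `{aggregate}-{id}` naming convention means the process type can be recovered from the stream name alone. The sketch below assumes stream names follow exactly that pattern. `parse_stream_name` and `stream_name` are hypothetical helpers, not part of any KurrentDB client. They split on the first hyphen only. As far as we know, this matches how KurrentDB's system `$by_category` projection derives category streams by default, for example linking `trade-ABC123` into `$ce-trade`.

```python
from typing import NamedTuple


class StreamName(NamedTuple):
    category: str   # process type, e.g. "trade"
    entity_id: str  # aggregate id, e.g. "ABC123"


# The five process types used in this prototype
PROCESS_TYPES = {"trade", "payment", "risk", "compliance", "account"}


def parse_stream_name(stream: str) -> StreamName:
    """Split a '{aggregate}-{id}' stream name on the FIRST hyphen only,
    so ids that themselves contain hyphens (e.g. 'PAY-123') stay intact."""
    category, sep, entity_id = stream.partition("-")
    if not sep or not entity_id:
        raise ValueError(f"not an '{{aggregate}}-{{id}}' stream name: {stream!r}")
    if category not in PROCESS_TYPES:
        raise ValueError(f"unknown process type {category!r} in {stream!r}")
    return StreamName(category, entity_id)


def stream_name(category: str, entity_id: str) -> str:
    """Inverse of parse_stream_name."""
    return f"{category}-{entity_id}"


print(parse_stream_name("payment-PAY-123"))
```

Splitting on the first hyphen rather than the last keeps business identifiers like `PAY-123` whole.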
Training Data Generation
From KurrentDB streams, we generated training examples in a simple format:
Input: “Given these events: OrderSubmitted, OrderValidated, OrderRouted - What comes next?”
Output: “OrderFilled”
This creates a supervised learning task where the model learns to predict the next event given the process history.
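Every position after the first in a stream yields one example, so a process with n events produces n - 1 history/target pairs. Below is a minimal sketch of that expansion. It assumes each sequence is a list of event-type strings. `expand_sequence` is a hypothetical name, and the prompt text simply mirrors the Input/Output format shown above.

```python
def expand_sequence(event_types: list[str]) -> list[dict[str, str]]:
    """Turn one process history into next-event prediction examples.

    Position k (k >= 1) becomes an example whose input is events[:k]
    and whose target is events[k].
    """
    examples = []
    for k in range(1, len(event_types)):
        history = ", ".join(event_types[:k])
        examples.append({
            "input": f"Given these events: {history} - What comes next?",
            "output": event_types[k],
        })
    return examples


trade = ["OrderSubmitted", "OrderValidated", "OrderRouted", "OrderFilled"]
for ex in expand_sequence(trade):
    print(ex["input"], "->", ex["output"])
```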
Sample Training Examples
```text
Process: Trade Lifecycle
Input: [OrderSubmitted]
Target: OrderValidated

Process: Payment (with approval)
Input: [PaymentInitiated, PaymentValidated, PaymentPendingApproval]
Target: PaymentApproved

Process: Risk Management
Input: [RiskLimitBreached, RiskAlertCreated, RiskAlertAcknowledged]
Target: RiskMitigationStarted
```
We generated 1,487 training examples from 200 process sequences.
Training the Model
We fine-tuned SmolLM2-360M-Instruct using LoRA (Low-Rank Adaptation) for just 3 epochs on 1,487 training examples.
Training Efficiency
| Metric | Value |
|---|---|
| Training Time | 27 minutes |
| GPU | RTX 4070 Laptop |
| Model Size | 360M parameters |
| Method | LoRA (rank 16) |
| Final Loss | 0.034 |
A small model and minimal compute were enough, because the training data captured exactly what we wanted the model to learn.
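LoRA keeps the base weights frozen and learns a low-rank update for selected projection matrices. A d_in × d_out layer gets two small matrices of shapes r × d_in and d_out × r, which adds r(d_in + d_out) trainable parameters. The back-of-the-envelope sketch below assumes rank 16 on `q_proj` and `v_proj`, the modules used in Step 4. It also assumes SmolLM2-360M dimensions as we understand them from its published config: hidden size 960 and 32 layers, with 5 key/value heads of size 64, so `v_proj` outputs 320. Verify these against the model's `config.json`. `lora_trainable_params` is a hypothetical helper.

```python
def lora_trainable_params(rank: int, layer_shapes: dict[str, tuple[int, int]], num_layers: int) -> int:
    """Count LoRA parameters: each adapted (d_in -> d_out) linear layer
    gets A with shape (rank, d_in) and B with shape (d_out, rank)."""
    per_block = sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_block * num_layers


# Assumed SmolLM2-360M shapes (check config.json): hidden 960, KV projection 5 heads * 64 = 320
shapes = {"q_proj": (960, 960), "v_proj": (960, 320)}
n = lora_trainable_params(rank=16, layer_shapes=shapes, num_layers=32)
print(f"{n:,} trainable parameters, {n / 360e6:.2%} of 360M")
```

Under those assumed shapes this works out to 1,638,400 trainable parameters, roughly half a percent of the model. A parameter count that small helps explain why training fits on a laptop GPU.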
Why Event Sourcing Works for AI Training
1. Natural Sequence Structure
Events in KurrentDB are inherently sequential. Unlike traditional databases where you’d need complex queries to reconstruct process flows, event streams are already in the exact format needed for sequence prediction tasks.
```text
Stream: trade-ABC123
Position 0: OrderSubmitted
Position 1: OrderValidated
Position 2: OrderRouted
Position 3: OrderFilled
Position 4: TradeBooked
Position 5: TradeSettled
```
2. Rich Context Without Labeling
Each event carries meaningful context - timestamps, entity IDs, status fields, amounts. This metadata helps the model understand not just what happens, but why certain transitions occur.
```json
{
  "event_type": "PaymentPendingApproval",
  "data": {
    "payment_id": "PAY-123",
    "amount": 500000.00,
    "approval_level": "dual",
    "required_approvers": 2
  }
}
```
When fields like these are included in the training input, the model can learn that large payments (like $500K) require dual approval - domain knowledge embedded in the event structure.
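The training prompt in Step 3 lists event types only. If you want payload fields such as `amount` or `approval_level` to shape predictions, they need to appear in the prompt. Here is a minimal sketch of one way to do that. It assumes a hypothetical allow-list of payload fields per event type. `CONTEXT_FIELDS` and `describe_event` are illustrative names, not part of the prototype.

```python
from typing import Any

# Hypothetical allow-list: which payload fields are useful context per event type
CONTEXT_FIELDS: dict[str, tuple[str, ...]] = {
    "PaymentPendingApproval": ("amount", "approval_level", "required_approvers"),
}


def describe_event(event: dict[str, Any]) -> str:
    """Render an event as 'Type' or 'Type(field=value, ...)' for use in a prompt."""
    fields = CONTEXT_FIELDS.get(event["event_type"], ())
    data = event.get("data", {})
    parts = [f"{name}={data[name]}" for name in fields if name in data]
    return f"{event['event_type']}({', '.join(parts)})" if parts else event["event_type"]


event = {
    "event_type": "PaymentPendingApproval",
    "data": {"payment_id": "PAY-123", "amount": 500000.00,
             "approval_level": "dual", "required_approvers": 2},
}
print(describe_event(event))
```

Using an allow-list rather than dumping the whole payload keeps prompts short. It also keeps identifiers like `payment_id`, which carry no process signal, out of the training text.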
3. Process Variants Are Captured Naturally
Real processes have branches and exceptions. Event sourcing captures all variants:
- Trade success: OrderSubmitted -> OrderValidated -> … -> TradeConfirmed
- Trade rejection: OrderSubmitted -> OrderRejected
- Trade cancellation: OrderSubmitted -> … -> OrderCancelled
The model learns both the happy path and exception handling.
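The prototype's training data is synthetic, so variants have to be generated deliberately. One way to sketch this is a random walk over a transition table that picks among allowed next events until it reaches a terminal event. The table below is a hypothetical, simplified trade process with made-up weights, not the prototype's actual schema. `generate_sequence` is likewise a hypothetical name.

```python
import random

# Hypothetical, simplified trade process: event -> [(next_event, weight), ...]
TRADE_TRANSITIONS: dict[str, list[tuple[str, float]]] = {
    "OrderSubmitted": [("OrderValidated", 0.85), ("OrderRejected", 0.15)],
    "OrderValidated": [("OrderRouted", 0.9), ("OrderCancelled", 0.1)],
    "OrderRouted": [("OrderFilled", 1.0)],
    "OrderFilled": [("TradeBooked", 1.0)],
    "TradeBooked": [("TradeSettled", 1.0)],
}  # events with no entry (OrderRejected, OrderCancelled, TradeSettled) are terminal


def generate_sequence(transitions: dict[str, list[tuple[str, float]]],
                      start: str, rng: random.Random, max_len: int = 50) -> list[str]:
    """Random walk from `start` until a terminal event (or max_len as a safety cap)."""
    sequence = [start]
    while sequence[-1] in transitions and len(sequence) < max_len:
        options = transitions[sequence[-1]]
        next_events = [event for event, _ in options]
        weights = [weight for _, weight in options]
        sequence.append(rng.choices(next_events, weights=weights, k=1)[0])
    return sequence


rng = random.Random(42)  # seeded so the generated dataset is reproducible
for _ in range(3):
    print(" -> ".join(generate_sequence(TRADE_TRANSITIONS, "OrderSubmitted", rng)))
```

The weights control how often exception paths appear. Keeping them non-trivial ensures the model sees rejections and cancellations, not just the happy path.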
4. Temporal Relationships Are Preserved
Event sourcing maintains exact ordering and timestamps, so the stored events carry temporal patterns a model can learn from:
- Settlement happens 2 days after execution
- Compliance reviews have SLA deadlines
- Risk mitigation follows acknowledgment
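Temporal rules like T+2 settlement can also be written down explicitly, which helps both for generating realistic timestamps and for checking them. The minimal sketch below assumes "T+2" means two business days after the trade date and counts only weekends as non-business days. Real settlement calendars also skip exchange holidays, which this ignores. `settlement_date` is a hypothetical helper.

```python
from datetime import date, timedelta


def settlement_date(trade_date: date, business_days: int = 2) -> date:
    """Add business days to a trade date, skipping Saturdays and Sundays.

    Holidays are deliberately ignored in this sketch.
    """
    current = trade_date
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


print(settlement_date(date(2024, 3, 7)))  # Thursday trade settles Monday: 2024-03-11
```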
Practical Applications
A model trained on process events enables several powerful capabilities:
1. Process Automation
- Auto-suggest next steps in workflow UIs
- Pre-populate forms based on expected events
- Trigger automated actions for predictable transitions
2. Anomaly Detection
- Flag unexpected events for investigation
- Detect process violations in real-time
- Identify mixed or corrupted event streams
3. Compliance Monitoring
- Ensure required steps aren’t skipped
- Detect when approval workflows are bypassed
- Alert on missing compliance events
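For "required steps aren't skipped", a deterministic check can run alongside the model's predictions. Here is a minimal sketch. It assumes a hypothetical list of mandatory events per process that must appear in order, with other events allowed in between. `missing_required_steps` and `REQUIRED_PAYMENT` are illustrative names, not taken from the prototype.

```python
def missing_required_steps(events: list[str], required: list[str]) -> list[str]:
    """Return required events not found, in order, within `events`.

    Each required step must appear after the previously matched one;
    unrelated events in between are allowed.
    """
    missing = []
    position = 0
    for step in required:
        try:
            position = events.index(step, position) + 1
        except ValueError:
            missing.append(step)
    return missing


# Hypothetical mandatory path for an approved payment
REQUIRED_PAYMENT = ["PaymentInitiated", "PaymentValidated", "PaymentApproved"]

print(missing_required_steps(["PaymentInitiated", "PaymentApproved"], REQUIRED_PAYMENT))
```

Because matching proceeds left to right, a step that only occurs out of order (for example, validation after approval) is reported as missing too.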
Agentic Anomaly Detection: A Practical Benchmark
To demonstrate practical utility beyond prediction accuracy, we built an agentic process orchestrator - an AI agent that monitors event streams in real-time, predicts expected events, and flags anomalies when actual events don’t match predictions.
How It Works
```text
Event Stream: trade-ABC123
[1] OrderSubmitted -> Agent predicts: OrderValidated
[2] OrderValidated -> Agent predicts: OrderRouted (matches!)
[3] PaymentFailed -> Agent: ANOMALY! Expected OrderRouted, got PaymentFailed
```
The agent monitors each transition, comparing its prediction against the actual event. When they differ significantly, it flags an anomaly with an explanation and recommended action.
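The core monitoring loop is small: predict from the history so far, compare with the event that actually arrived, and record any mismatch. The minimal sketch below assumes `predict_next` is any callable that maps a list of event types to a predicted event type. In the prototype that would be the fine-tuned model; here a hypothetical lookup table stands in for it. The sketch also treats every mismatch as an anomaly, the simplest possible rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Anomaly:
    index: int     # position of the unexpected event in the stream
    expected: str
    actual: str


def monitor(events: list[str], predict_next: Callable[[list[str]], str]) -> list[Anomaly]:
    """Compare each event (after the first) with the prediction from its history."""
    anomalies = []
    for i in range(1, len(events)):
        expected = predict_next(events[:i])
        if events[i] != expected:
            anomalies.append(Anomaly(i, expected, events[i]))
    return anomalies


# Stand-in predictor for illustration only: looks up the last event in a table
NEXT = {"OrderSubmitted": "OrderValidated", "OrderValidated": "OrderRouted",
        "OrderRouted": "OrderFilled"}


def predict_from_table(history: list[str]) -> str:
    return NEXT.get(history[-1], "<end>")


stream = ["OrderSubmitted", "OrderValidated", "PaymentFailed"]
for a in monitor(stream, predict_from_table):
    print(f"ANOMALY at {a.index}: expected {a.expected}, got {a.actual}")
```

Under a strict mismatch rule, one wrong prediction on a valid but uncommon branch would flag an otherwise normal sequence. That is the trade-off between recall and false alarms.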
Benchmark Design
We tested the orchestrator with two types of sequences:
- Normal sequences (50): Valid process flows the model should recognize
- Anomalous sequences (50): Sequences with intentional errors:
- Wrong event type injected
- Steps skipped in the workflow
- Events out of expected order
Results: The Agent as a Watchdog
| Metric | Value | Interpretation |
|---|---|---|
| Recall | 100% | Caught every anomaly |
| Precision | 58.8% | 35 of 85 alerts were false alarms |
| F1 Score | 74.1% | Good overall balance |
The confusion matrix tells the story:
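| | Predicted Normal | Predicted Anomaly |
|---|---|---|
| Actual Normal | 15 | 35 |
| Actual Anomaly | 0 | 50 |

- True Positives: 50 - All 50 anomalies were detected
- False Negatives: 0 - No anomalies slipped through
- False Positives: 35 - 35 of the 50 normal sequences were flagged (acceptable for a watchdog)
- True Negatives: 15 - The remaining 15 normal sequences passed without an alert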
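The headline metrics follow directly from the confusion matrix, treating "anomaly" as the positive class. Here is a quick check of the arithmetic; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1 and accuracy, with 'anomaly' as the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


m = classification_metrics(tp=50, fp=35, fn=0, tn=15)
print({k: f"{v:.1%}" for k, v in m.items()})
# precision 58.8%, recall 100.0%, f1 74.1%, accuracy 65.0%
```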
Why 100% Recall Matters
For a monitoring system, missing an anomaly is worse than a false alarm. A payment routed to the wrong account, a compliance step skipped, or an unauthorized trade execution - these are costly mistakes that must be caught.
The agent achieved perfect recall: every injected anomaly was detected. The 35 false positives represent cases where the model was uncertain about valid but less common transitions - these can be reviewed quickly and provide learning opportunities.
Example Detection
```text
Sequence: OrderSubmitted -> OrderValidated -> PaymentFailed -> OrderFilled
Agent Analysis:
After OrderValidated: saw 'PaymentFailed'
Expected: 'OrderRouted'
[ANOMALY DETECTED]
Reason: Expected trade event 'OrderRouted', but received payment
event 'PaymentFailed'. Events from different process types
are mixed.
Action: Investigate why PaymentFailed appeared in this stream.
Check for event routing errors.
```
The agent not only detects the anomaly but explains why it's unexpected and suggests investigation steps - turning raw predictions into actionable insights.
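The explanation in this example depends on noticing that the unexpected event belongs to a different process family. Here is a minimal sketch of that rule. It assumes events can be mapped to a process type by name prefix, using a hypothetical mapping built from the event names in this article. The prototype's actual explanation logic is not shown here, and `process_type` and `explain` are illustrative names.

```python
# Hypothetical prefix -> process type mapping, based on the event names in this article
PREFIXES = {
    "Order": "trade", "Trade": "trade",
    "Payment": "payment",
    "Risk": "risk",
    "Compliance": "compliance",
    "Account": "account",
}


def process_type(event_type: str) -> str:
    """Classify an event by its name prefix; 'unknown' if nothing matches."""
    for prefix, process in PREFIXES.items():
        if event_type.startswith(prefix):
            return process
    return "unknown"


def explain(expected: str, actual: str) -> str:
    """Produce a short reason string for a mismatched transition."""
    exp_type, act_type = process_type(expected), process_type(actual)
    if exp_type != act_type:
        return (f"Expected {exp_type} event '{expected}', but received {act_type} "
                f"event '{actual}'. Events from different process types are mixed.")
    return f"Expected '{expected}', but received '{actual}' within the same {exp_type} process."


print(explain("OrderRouted", "PaymentFailed"))
```

A cross-process mismatch points to a routing or stream-assignment problem. A same-process mismatch more likely means a skipped or reordered step, which calls for a different follow-up.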
Running the example
You can find the source code in the examples folder of the test_scenario_huggingface branch at https://github.com/kurrent-io/coding-agent-skills/tree/test_scenario_huggingface. Alternatively, you can build your own with a few prompts using the KurrentDB and Hugging Face skills.
Step 1: Set Up KurrentDB
```shell
docker run -d --name kurrentdb \
  -p 2113:2113 \
  docker.kurrent.io/kurrent-latest/kurrentdb:latest \
  --insecure --run-projections=All
```
Step 2: Define Your Events
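```python
from dataclasses import dataclass


@dataclass
class OrderSubmitted:
    """Payload of the first event in the trade lifecycle."""
    order_id: str
    account_id: str
    symbol: str
    quantity: float
    timestamp: str  # kept as a string in this prototype
```
Step 3: Generate Training Data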
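```python
# Placeholder: the prototype's actual system prompt is not shown in this article
SYSTEM_PROMPT = "You predict the next event in a financial business process."


def create_training_example(events, position):
    """Build one chat-formatted example: the events before `position` are the
    context, and the event at `position` is the answer the model should give."""
    context = [e["event_type"] for e in events[:position]]
    target = events[position]["event_type"]
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Events: {context}\nWhat's next?"},
            {"role": "assistant", "content": target},
        ]
    }
```
Step 4: Fine-Tune with LoRA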
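```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Chat-formatted examples from Step 3, saved as JSON Lines (file name is illustrative)
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-360M-Instruct",  # SFTTrainer can load a model from its Hub id
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="smollm2-events-lora", num_train_epochs=3, learning_rate=2e-4),
)
trainer.train()
```
Conclusion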
Event sourcing isn’t just an architecture pattern - it’s a training data goldmine. KurrentDB’s immutable, ordered event streams provide exactly the sequential structure that AI models need to learn business processes.
Our experiment demonstrated:
- 100% anomaly recall - the agentic watchdog caught all 50 injected anomalies in our benchmark (at 58.8% precision)
- 27 minutes of training on consumer hardware
- 360M parameter model - no massive infrastructure needed
- 1,487 training examples generated automatically from event streams
The key insight: your event store already contains the training data. Every process that flows through KurrentDB creates examples of “given this history, this happens next.” And when combined with an agentic architecture, that predictive capability becomes real-time monitoring that caught every injected anomaly in our benchmark.
For organizations already using event sourcing, fine-tuning AI models on process events is a natural next step. The events you’re capturing for audit, replay, and debugging are also the foundation for intelligent automation and proactive anomaly detection.
Resources
About This Experiment
- Model: SmolLM2-360M-Instruct
- Training Method: SFT with LoRA
- Event Types: 30+ financial process events
- Process Types: Trade, Payment, Risk, Compliance, Account
- Training Examples: 1,487
- Hardware: NVIDIA RTX 4070 Laptop GPU
