Building a Transaction Intelligence System: From MT950 Bank Statements to Automated Reconciliation
Why We Built It
Most AI demos focus on chatbots, copilots, or AI agents.
However, one of the largest automation opportunities inside enterprises is much less glamorous:
Financial reconciliation.
Every day, finance teams receive thousands of transactions from bank statements.
A transaction may look like this:
PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157
For a human accountant, the meaning is obvious.
For a machine, it's just text.
The challenge is transforming transaction narratives into structured business knowledge.
This article explains how I built a Transaction Intelligence System that converts raw MT950 bank statements into machine-readable entities that can be automatically reconciled against invoices, contracts, and customer records.
The Real Problem
Many people assume payment gateways solve reconciliation.
They don't.
Payment gateways solve payment collection.
Enterprise reconciliation requires answering different questions:
- Which customer made the payment?
- Which invoice is being settled?
- Which contract governs the transaction?
- Is this a partial payment?
- Is the payment amount correct?
Those answers don't exist in the payment itself.
They exist in business context.
System Architecture
The architecture consists of multiple layers:
MT950 Statement
↓
Canonical Transformation
↓
Named Entity Recognition
↓
Entity Resolution
↓
Reconciliation Engine
↓
Automation API
Each layer solves a specific problem.
Step 1: Synthetic Enterprise Dataset Generation
One of the biggest challenges was obtaining training data.
Real enterprise financial data is typically unavailable due to privacy restrictions.
Instead, I generated synthetic datasets containing:
Customer Master
{
"customer_id": "CUS-00002",
"legal_name": "ALPHABRIDGE SOLUTIONS"
}
Contract Master
{
"contract_id": "CNT-2024-587",
"customer_id": "CUS-00002"
}
Invoice Master
{
"invoice_number": "MFG-INV-000157",
"contract_id": "CNT-2024-587"
}
MT950 Statements
PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157
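Because every statement narrative is generated from known master records, the correct entities and their links are known by construction. This created a complete ground-truth environment for training and evaluation.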
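A minimal sketch of how such a generator could look. The ID formats follow the examples above; the function name `make_dataset`, the `truth` field, and every payment phrase other than "PART PMT" are assumptions made for illustration, not the original generator.

```python
import random

# Hypothetical phrase list; only "PART PMT" appears in the article.
PAYMENT_PHRASES = ["PART PMT", "FULL PMT", "PAYMENT"]


def make_dataset(company_names, seed=0):
    """Build linked master records and narratives whose labels are known by construction."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    data = {"customers": [], "contracts": [], "invoices": [], "statements": []}
    for i, name in enumerate(company_names, start=1):
        customer_id = f"CUS-{i:05d}"
        contract_id = f"CNT-2024-{100 + i}"
        invoice_number = f"MFG-INV-{i:06d}"
        phrase = rng.choice(PAYMENT_PHRASES)
        data["customers"].append({"customer_id": customer_id, "legal_name": name})
        data["contracts"].append({"contract_id": contract_id, "customer_id": customer_id})
        data["invoices"].append({"invoice_number": invoice_number, "contract_id": contract_id})
        data["statements"].append({
            "narrative": f"{phrase} {name} {invoice_number}",
            # The expected entities are recorded at generation time.
            "truth": {"COMPANY": name, "INVOICE": invoice_number, "PAYMENT_TYPE": phrase},
        })
    return data
```

Since each narrative is assembled from the same records that populate the master tables, evaluation can compare model output against `truth` without any manual labeling.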
Step 2: Canonical Transformation
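Raw MT950 files are difficult to work with: field :61: packs the value date, debit/credit mark, amount (written with a decimal comma), transaction type code, and reference into a single positional string.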
A transaction:
:61:240226C3979,85NTRFNONREF
:86:PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157
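is transformed into a canonical structure (the :61: line itself carries no currency; in SWIFT statements the currency appears in balance fields such as :60F:):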
{
"transaction_id": "...",
"currency": "EUR",
"amount": 3979.85,
"narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}
This becomes the standardized input for downstream processing.
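The transformation can be sketched with a regular expression over the :61: line. This is a simplified parser under stated assumptions: the pattern covers the common subfields (value date, optional entry date, debit/credit mark, optional funds code, amount, type code, reference) but not every variant banks emit, the currency is passed in by the caller, and `transaction_id` is left out because the article does not say how it is derived. The name `to_canonical` and the extra output keys are hypothetical.

```python
import re
from datetime import datetime
from decimal import Decimal

# Simplified :61: layout; real statements may need a more tolerant pattern.
FIELD_61 = re.compile(
    r"^:61:(?P<value_date>\d{6})"   # YYMMDD
    r"(?P<entry_date>\d{4})?"       # optional MMDD
    r"(?P<dc_mark>R?[CD])"          # C, D, RC, or RD
    r"(?P<funds_code>[A-Z])?"       # optional single letter
    r"(?P<amount>\d+,\d*)"          # decimal comma
    r"(?P<type_code>[A-Z][A-Z0-9]{3})"
    r"(?P<reference>.*)$"
)


def to_canonical(line_61, line_86, currency):
    """Turn one :61:/:86: pair into a flat dictionary."""
    match = FIELD_61.match(line_61.strip())
    if match is None:
        raise ValueError(f"unrecognized :61: line: {line_61!r}")
    narrative = line_86.strip()
    if narrative.startswith(":86:"):
        narrative = narrative[len(":86:"):]
    return {
        "value_date": datetime.strptime(match["value_date"], "%y%m%d").date().isoformat(),
        "dc_mark": match["dc_mark"],
        "currency": currency,
        # Decimal avoids float rounding on monetary values.
        "amount": Decimal(match["amount"].replace(",", ".")),
        "type_code": match["type_code"],
        "reference": match["reference"],
        "narrative": narrative.strip(),
    }
```

The sketch keeps the amount as a `Decimal` rather than a float; converting to a JSON number can happen at serialization time if the downstream consumer expects one.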
Step 3: Taxonomy Design
Before training a model, we must define what matters.
The taxonomy includes:
COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE
Example:
PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157
becomes:
{
"COMPANY": "ALPHABRIDGE SOLUTIONS",
"INVOICE": "MFG-INV-000157",
"PAYMENT_TYPE": "PART PMT"
}
This taxonomy becomes the language of the system.
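One way to keep the taxonomy in a single place is an enum from which every other label list is derived. The sketch below is an assumption about structure, not the original code: `EntityType`, `bio_tags`, `LABEL2ID`, and `ID2LABEL` are hypothetical names, and the BIO tag set anticipates the conversion used in Step 6.

```python
from enum import Enum


class EntityType(str, Enum):
    """The five entity types defined in the taxonomy."""
    COMPANY = "COMPANY"
    INVOICE = "INVOICE"
    CONTRACT = "CONTRACT"
    PURCHASE_ORDER = "PURCHASE_ORDER"
    PAYMENT_TYPE = "PAYMENT_TYPE"


def bio_tags():
    """Derive the BIO tag list: 'O' plus a B- and I- tag per entity type."""
    tags = ["O"]
    for entity in EntityType:
        tags.append(f"B-{entity.value}")
        tags.append(f"I-{entity.value}")
    return tags


# Mappings of the kind a token-classification model needs.
LABEL2ID = {tag: index for index, tag in enumerate(bio_tags())}
ID2LABEL = {index: tag for tag, index in LABEL2ID.items()}
```

Deriving the tag list from the enum means that adding an entity type changes the prelabeler, the annotation labels, and the model head together.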
Step 4: Automated Prelabeling
Manual annotation does not scale.
Instead, I built a prelabel engine using:
- Regular expressions
- Master data lookups
- Heuristic rules
Example:
invoice_pattern = r"[A-Z]{3}-INV-\d+"
This automatically generates initial annotations before human review.
The result:
- Faster annotation
- Higher consistency
- Reduced labeling cost
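The three ingredients above can be combined into a small prelabeler that emits character spans. This is a sketch under assumptions: the contract pattern is inferred from the single example ID `CNT-2024-587`, the payment phrase list is hypothetical, and the `text`/`label` record layout is the JSONL shape commonly used for Doccano sequence labeling imports, which should be checked against the Doccano version in use.

```python
import json
import re

INVOICE_PATTERN = re.compile(r"[A-Z]{3}-INV-\d+")
# Assumed from the example ID "CNT-2024-587"; adjust to the real numbering scheme.
CONTRACT_PATTERN = re.compile(r"CNT-\d{4}-\d+")


def prelabel(narrative, company_names, payment_phrases=("PART PMT",)):
    """Return a record with [start, end, label] spans found by rules and lookups."""
    spans = []
    for match in INVOICE_PATTERN.finditer(narrative):
        spans.append([match.start(), match.end(), "INVOICE"])
    for match in CONTRACT_PATTERN.finditer(narrative):
        spans.append([match.start(), match.end(), "CONTRACT"])
    # Master data lookup: literal search for each known legal name.
    for name in company_names:
        for match in re.finditer(re.escape(name), narrative):
            spans.append([match.start(), match.end(), "COMPANY"])
    # Heuristic rule: known payment phrases.
    for phrase in payment_phrases:
        for match in re.finditer(re.escape(phrase), narrative):
            spans.append([match.start(), match.end(), "PAYMENT_TYPE"])
    spans.sort()
    return {"text": narrative, "label": spans}


def to_jsonl(records):
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)
```

The prelabeler only proposes spans; anything it misses or gets wrong is left for the human review step.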
Step 5: Doccano Annotation
Prelabeled data is imported into Doccano.
Human reviewers validate:
- Company names
- Invoice references
- Contract identifiers
- Purchase orders
- Payment types
This creates the ground truth required for model training.
Step 6: Fine-Tuning a Financial NER Model
The training pipeline:
Doccano
↓
BIO Conversion
↓
IndoBERT
↓
Fine-Tuning
Target entities:
COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE
The objective is not generic NER.
The objective is enterprise transaction understanding.
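The BIO conversion step can be sketched as follows, assuming annotations arrive as `[start, end, label]` character spans and that narratives are split on whitespace. The function name `spans_to_bio` is hypothetical. A BERT-style tokenizer splits words into subwords, so a real pipeline would need an extra alignment step that this sketch leaves out.

```python
import re


def spans_to_bio(text, spans):
    """Convert character spans into one BIO tag per whitespace-delimited token."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # A token is tagged only if it lies entirely inside a span.
            if match.start() >= start and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags
```

For the running example, the two words of the payment phrase and the two words of the company name each form a B-/I- pair, and the invoice number is a single B- token.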
Step 7: Entity Resolution
Entity extraction alone is not enough.
For example:
ALPHABRIDGE
must resolve to:
{
"customer_id": "CUS-00002",
"legal_name": "ALPHABRIDGE SOLUTIONS"
}
The resolution engine uses:
- Exact Matching: ALPHABRIDGE SOLUTIONS
- Alias Matching: ALPHABRIDGE LTD
- Fuzzy Matching: ALPHA BRIDGE
- Embedding Similarity: for more difficult cases
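A sketch of the first three strategies as a cascade, using only the standard library. Everything here is an illustrative assumption rather than the original engine: the alias table shape, the 0.85 threshold, and the prefix-based scoring (comparing the mention against leading words of the legal name with spaces removed, so that "ALPHA BRIDGE" and "ALPHABRIDGE" both score well against "ALPHABRIDGE SOLUTIONS"). Embedding similarity is omitted because it depends on a model choice the article does not specify.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase and collapse whitespace."""
    return " ".join(name.upper().split())


def prefix_similarity(mention, legal_name):
    """Best ratio between the mention and any leading-word prefix of the legal name."""
    target = "".join(mention.upper().split())
    words = legal_name.upper().split()
    return max(
        (SequenceMatcher(None, target, "".join(words[:n])).ratio()
         for n in range(1, len(words) + 1)),
        default=0.0,
    )


def resolve_company(mention, customers, aliases=None, threshold=0.85):
    """Return (customer, method), trying exact, then alias, then fuzzy matching."""
    key = normalize(mention)
    for customer in customers:
        if normalize(customer["legal_name"]) == key:
            return customer, "exact"
    alias_id = (aliases or {}).get(key)  # aliases maps normalized alias -> customer_id
    if alias_id is not None:
        for customer in customers:
            if customer["customer_id"] == alias_id:
                return customer, "alias"
    scored = [(prefix_similarity(key, c["legal_name"]), c) for c in customers]
    if scored:
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            return best, "fuzzy"
    return None, "unresolved"
```

Returning the method alongside the customer lets downstream steps treat a fuzzy match with more caution than an exact one, for example by routing it to review.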
Step 8: Reconciliation Engine
Once entities are resolved:
{
"customer_id": "CUS-00002",
"invoice_number": "MFG-INV-000157"
}
the reconciliation engine validates:
- Customer ownership
- Contract relationships
- Invoice existence
- Amount consistency
Possible outcomes:
AUTO_RECONCILED
PARTIAL_MATCH
OVERPAYMENT
UNDERPAYMENT
REVIEW_REQUIRED
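How the checks map to outcomes is not spelled out above, so the following is one possible reading, not the original rules. It assumes a hypothetical `amount_due` field on the invoice record, an `is_partial` flag derived from the PAYMENT_TYPE entity, and that any failed ownership or existence check goes to review.

```python
from decimal import Decimal


def reconcile(resolved, paid_amount, invoices, contracts, is_partial=False):
    """Return one of the five outcome codes for a resolved transaction.

    invoices maps invoice_number -> {"contract_id", "amount_due"} (amount_due is assumed);
    contracts maps contract_id -> {"customer_id"}.
    """
    # Invoice existence
    invoice = invoices.get(resolved.get("invoice_number"))
    if invoice is None:
        return "REVIEW_REQUIRED"
    # Contract relationship and customer ownership
    contract = contracts.get(invoice["contract_id"])
    if contract is None or contract["customer_id"] != resolved.get("customer_id"):
        return "REVIEW_REQUIRED"
    # Amount consistency
    amount_due = Decimal(invoice["amount_due"])
    if paid_amount == amount_due:
        return "AUTO_RECONCILED"
    if paid_amount > amount_due:
        return "OVERPAYMENT"
    # A shortfall is expected only when the narrative declares a partial payment.
    return "PARTIAL_MATCH" if is_partial else "UNDERPAYMENT"
```

The ownership check runs before any amount comparison, so a payment that names the wrong customer is never auto-reconciled just because the amount happens to match.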
Step 9: API Layer
The final system exposes endpoints such as:
POST /reconcile/text
Input:
{
"narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}
Output:
{
"customer_id": "CUS-00002",
"invoice_number": "MFG-INV-000157",
"status": "AUTO_RECONCILED"
}
This allows integration with:
- ERP systems
- Accounting platforms
- Finance operations workflows
- AI agents
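The request handling behind such an endpoint can be sketched independently of any web framework. In this sketch the handler name `handle_reconcile_text`, the injected `pipeline` callable, the error response shape, and the fallback to REVIEW_REQUIRED are all assumptions; the success response mirrors the output shown above.

```python
import json


def handle_reconcile_text(raw_body, pipeline):
    """Validate a POST /reconcile/text body and return (status_code, response_dict).

    pipeline is any callable that takes a narrative string and returns a dict
    with customer_id, invoice_number, and status (extraction, resolution, and
    reconciliation are assumed to live behind it).
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    narrative = payload.get("narrative") if isinstance(payload, dict) else None
    if not isinstance(narrative, str) or not narrative.strip():
        return 400, {"error": "'narrative' must be a non-empty string"}
    result = pipeline(narrative.strip())
    return 200, {
        "customer_id": result.get("customer_id"),
        "invoice_number": result.get("invoice_number"),
        # Default to manual review if the pipeline gives no status.
        "status": result.get("status", "REVIEW_REQUIRED"),
    }
```

Passing the pipeline in as a parameter keeps the handler testable with a stub and lets the same function sit behind whichever web framework the deployment uses.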
Lessons Learned
Building the model was not the hardest part.
The hardest parts were:
Data Quality
Poor data produces poor automation.
Taxonomy Design
The model only understands the concepts you define.
Canonical Data
Without canonical structures, downstream automation becomes fragile.
Entity Resolution
Extraction without resolution has limited business value.
Final Thoughts
Most enterprise automation projects focus on AI models.
In my experience, the real challenge is business understanding.
The architecture that matters most is:
Raw Data
↓
Canonical Data
↓
Taxonomy
↓
NER
↓
Resolution
↓
Decision Intelligence
↓
Automation
AI is only one layer in the stack.
The organizations that succeed with enterprise AI will be the ones that invest in data foundations, business taxonomies, and transaction intelligence before they invest in autonomous agents.
If you're building AI for enterprise operations, start with understanding before automation.













