Building a Transaction Intelligence System: From MT950 Bank Statements to Automated Reconciliation

Why I Built It

Most AI demos focus on chatbots, copilots, or AI agents.

However, one of the largest automation opportunities inside enterprises is much less glamorous:

Financial reconciliation.

Every day, finance teams receive bank statements containing thousands of transactions.

A transaction may look like this:

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

For a human accountant, the meaning is obvious.

For a machine, it's just text.

The challenge is transforming transaction narratives into structured business knowledge.

This article explains how I built a Transaction Intelligence System that converts raw MT950 bank statements into machine-readable entities that can be automatically reconciled against invoices, contracts, and customer records.

The Real Problem

Many people assume payment gateways solve reconciliation.

They don't.

Payment gateways solve payment collection.

Enterprise reconciliation requires answering different questions:

Which customer made the payment?
Which invoice is being settled?
Which contract governs the transaction?
Is this a partial payment?
Is the payment amount correct?

Those answers don't exist in the payment itself.

They exist in business context.

System Architecture

The architecture is a pipeline that takes a raw MT950 statement through five processing layers:

MT950 Statement
       ↓
Canonical Transformation
       ↓
Named Entity Recognition
       ↓
Entity Resolution
       ↓
Reconciliation Engine
       ↓
Automation API

Each layer solves a specific problem.

Step 1: Synthetic Enterprise Dataset Generation

One of the biggest challenges was obtaining training data.

Real enterprise financial data is typically unavailable due to privacy restrictions.

Instead, I generated synthetic datasets containing:

Customer Master

{
  "customer_id": "CUS-00002",
  "legal_name": "ALPHABRIDGE SOLUTIONS"
}

Contract Master

{
  "contract_id": "CNT-2024-587",
  "customer_id": "CUS-00002"
}

Invoice Master

{
  "invoice_number": "MFG-INV-000157",
  "contract_id": "CNT-2024-587"
}

MT950 Statements

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

This created a complete ground-truth environment for training and evaluation.
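A minimal sketch of how such linked records could be generated. Only ALPHABRIDGE SOLUTIONS, the ID formats, and "PART PMT" come from the examples above; the other company names, the "FULL PMT" label, and the function name `generate_dataset` are assumptions made for illustration.

```python
import random

# Hypothetical seed values: only ALPHABRIDGE SOLUTIONS and "PART PMT" appear
# in the article; the other names and "FULL PMT" are invented for illustration.
COMPANY_NAMES = ["ALPHABRIDGE SOLUTIONS", "NORTHLAKE TRADING", "BLUEPEAK LOGISTICS"]
PAYMENT_TYPES = ["PART PMT", "FULL PMT"]


def generate_dataset(seed=42):
    """Return linked master records plus narratives stored with their ground truth."""
    rng = random.Random(seed)  # seeded so every run yields the same dataset
    # Draw distinct invoice sequence numbers so invoice IDs never collide.
    sequence_numbers = rng.sample(range(1, 1_000_000), k=len(COMPANY_NAMES))
    data = {"customers": [], "contracts": [], "invoices": [], "statements": []}
    for index, name in enumerate(COMPANY_NAMES, start=1):
        customer_id = f"CUS-{index:05d}"
        contract_id = f"CNT-2024-{100 + index}"
        invoice_number = f"MFG-INV-{sequence_numbers[index - 1]:06d}"
        data["customers"].append({"customer_id": customer_id, "legal_name": name})
        data["contracts"].append({"contract_id": contract_id, "customer_id": customer_id})
        data["invoices"].append({"invoice_number": invoice_number, "contract_id": contract_id})
        payment_type = rng.choice(PAYMENT_TYPES)
        data["statements"].append({
            "narrative": f"{payment_type} {name} {invoice_number}",
            # Ground truth kept next to the text for later evaluation.
            "customer_id": customer_id,
            "invoice_number": invoice_number,
        })
    return data
```

Because every narrative is written from the master records, each one carries its correct customer and invoice, which is what makes the dataset usable as ground truth.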

Step 2: Canonical Transformation

Raw MT950 files are difficult to work with: each transaction is packed into tagged, position-dependent fields.

A transaction:

:61:240226C3979,85NTRFNONREF
:86:PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

is transformed into a canonical structure (the currency comes from the statement's balance lines, which are not shown here):

{
  "transaction_id": "...",
  "currency": "EUR",
  "amount": 3979.85,
  "narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}

This becomes the standardized input for downstream processing.
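The transformation might be sketched as follows. This is a simplified reading of the :61: line, not a full SWIFT parser, and it assumes the narrative arrives in an :86: line as in the example. The function name `to_canonical`, the extra `value_date` and `credit_debit_mark` keys, and the choice to pass `currency` and `transaction_id` in from the caller are assumptions. The amount is kept as a `Decimal` here to avoid floating-point rounding on money.

```python
import re
from datetime import datetime
from decimal import Decimal

# Simplified :61: layout (an assumption, not the full SWIFT definition):
# value date YYMMDD, optional entry date MMDD, debit/credit mark, optional
# funds code, amount with a decimal comma, 4-character type code, reference.
LINE_61 = re.compile(
    r"^:61:(?P<value_date>\d{6})(?P<entry_date>\d{4})?"
    r"(?P<mark>R?[CD])(?P<funds_code>[A-Z])?"
    r"(?P<amount>\d+,\d{0,2})"
    r"(?P<type_code>[A-Z][A-Z0-9]{3})(?P<reference>.*)$"
)


def to_canonical(line_61, line_86, currency, transaction_id):
    """Turn one :61:/:86: pair into a flat dictionary."""
    match = LINE_61.match(line_61.strip())
    if match is None:
        raise ValueError(f"unrecognised :61: line: {line_61!r}")
    line_86 = line_86.strip()
    if not line_86.startswith(":86:"):
        raise ValueError(f"unrecognised :86: line: {line_86!r}")
    value_date = datetime.strptime(match["value_date"], "%y%m%d").date()
    return {
        "transaction_id": transaction_id,  # supplied by the caller
        "currency": currency,              # taken from the balance line by the caller
        "amount": Decimal(match["amount"].replace(",", ".")),
        "value_date": value_date.isoformat(),
        "credit_debit_mark": match["mark"],
        "narrative": line_86[len(":86:"):].strip(),
    }
```

Applied to the two lines above, this yields an amount of `Decimal("3979.85")`, a value date of `2024-02-26`, and the narrative with its `:86:` tag removed.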

Step 3: Taxonomy Design

Before training a model, we must define what matters.

The taxonomy includes:

COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE

Example:

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

becomes:

{
  "COMPANY": "ALPHABRIDGE SOLUTIONS",
  "INVOICE": "MFG-INV-000157",
  "PAYMENT_TYPE": "PART PMT"
}

This taxonomy becomes the language of the system.
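One way to make the taxonomy enforceable in code is a small enumeration that every downstream component checks against. This is a sketch; the names `EntityType` and `validate_entities` are hypothetical.

```python
from enum import Enum


class EntityType(str, Enum):
    """The five entity types listed in the taxonomy."""
    COMPANY = "COMPANY"
    INVOICE = "INVOICE"
    CONTRACT = "CONTRACT"
    PURCHASE_ORDER = "PURCHASE_ORDER"
    PAYMENT_TYPE = "PAYMENT_TYPE"


def validate_entities(entities):
    """Reject any key outside the taxonomy; return the record keyed by EntityType."""
    allowed = {entity.value for entity in EntityType}
    unknown = sorted(set(entities) - allowed)
    if unknown:
        raise ValueError(f"labels outside the taxonomy: {unknown}")
    return {EntityType(key): value for key, value in entities.items()}
```

With a check like this, a label that was never defined fails loudly at the boundary instead of silently flowing into training data.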

Step 4: Automated Prelabeling

Manual annotation does not scale.

Instead, I built a prelabel engine using:

Regular expressions
Master data lookups
Heuristic rules

Example:

invoice_pattern = r"[A-Z]{3}-INV-\d+"

This automatically generates initial annotations before human review.

The result:

Faster annotation
Higher consistency
Reduced labeling cost
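A minimal sketch of a prelabel function combining the three techniques. The invoice pattern is the one shown above with word boundaries added; the customer list, the payment-type keyword list (including "FULL PMT"), and the function name `prelabel` are assumptions.

```python
import re

INVOICE_PATTERN = re.compile(r"\b[A-Z]{3}-INV-\d+\b")
# Hypothetical stand-ins for the customer master and a keyword heuristic.
CUSTOMER_NAMES = ("ALPHABRIDGE SOLUTIONS",)
PAYMENT_TYPE_PREFIXES = ("PART PMT", "FULL PMT")


def prelabel(text, customer_names=CUSTOMER_NAMES):
    """Return (start, end, label) character spans, sorted by position."""
    spans = []
    # Regular expression: invoice references.
    for match in INVOICE_PATTERN.finditer(text):
        spans.append((match.start(), match.end(), "INVOICE"))
    # Master data lookup: known legal names found verbatim in the text.
    for name in customer_names:
        start = text.find(name)
        if start != -1:
            spans.append((start, start + len(name), "COMPANY"))
    # Heuristic rule: a payment-type phrase at the start of the narrative.
    for phrase in PAYMENT_TYPE_PREFIXES:
        if text.startswith(phrase):
            spans.append((0, len(phrase), "PAYMENT_TYPE"))
            break
    return sorted(spans)
```

For the running example this produces three spans covering "PART PMT", "ALPHABRIDGE SOLUTIONS", and "MFG-INV-000157", which a reviewer then only needs to confirm or correct.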

Step 5: Doccano Annotation

Prelabeled data is imported into Doccano.

Human reviewers validate:

Company names
Invoice references
Contract identifiers
Purchase orders
Payment types

This creates the ground truth required for model training.
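To get prelabels into Doccano, the spans can be serialized as JSON Lines. The sketch below assumes the sequence-labeling import layout of one object per line with a `text` field and a `label` list of `[start, end, label]` triples. The field name has differed between Doccano versions, so check it against the version you run. The function name `to_doccano_jsonl` is hypothetical.

```python
import json


def to_doccano_jsonl(records, label_key="label"):
    """Serialize (text, spans) pairs as one JSON object per line.

    `label_key` is configurable because the expected field name is an
    assumption here and may differ between Doccano versions.
    """
    lines = []
    for text, spans in records:
        row = {"text": text, label_key: [[start, end, label] for start, end, label in spans]}
        lines.append(json.dumps(row))
    return "\n".join(lines)
```

Character offsets are end-exclusive here, matching Python slicing, so `text[start:end]` returns exactly the labeled phrase.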

Step 6: Fine-Tuning a Financial NER Model

The training pipeline:

Doccano
    ↓
BIO Conversion
    ↓
IndoBERT
    ↓
Fine-Tuning

Target entities:

COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE

The objective is not generic NER.

The objective is enterprise transaction understanding.
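The BIO conversion step can be sketched in plain Python. This version assumes whitespace tokenization and character spans like those exported from the annotation tool. Aligning the tags to a BERT-style subword tokenizer is a further step not shown here, and the function name `spans_to_bio` is hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into per-token BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):  # whitespace tokenization
        tag = "O"  # default: token lies outside every entity
        for start, end, label in spans:
            if match.start() >= start and match.end() <= end:
                # The first token of a span gets B-, the following ones I-.
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags
```

For the running example, the five tokens receive the tags B-PAYMENT_TYPE, I-PAYMENT_TYPE, B-COMPANY, I-COMPANY, and B-INVOICE.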

Step 7: Entity Resolution

Entity extraction alone is not enough.

For example:

ALPHABRIDGE

must resolve to:

{
  "customer_id": "CUS-00002",
  "legal_name": "ALPHABRIDGE SOLUTIONS"
}

The resolution engine uses:

Exact Matching

ALPHABRIDGE SOLUTIONS

Alias Matching

ALPHABRIDGE LTD

Fuzzy Matching

ALPHA BRIDGE

Embedding Similarity

Reserved for more difficult cases that the string-based methods above cannot match.
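The first three strategies can be sketched as a cascade that tries the cheapest method first. This sketch uses the standard library's `difflib.SequenceMatcher` for the fuzzy step, which is a stand-in and not necessarily the matcher used in the real system. The alias table, the 0.85 threshold, and the `method` field are assumptions, and the embedding step is left out.

```python
from difflib import SequenceMatcher

CUSTOMERS = {"CUS-00002": "ALPHABRIDGE SOLUTIONS"}
# Hypothetical alias table; in practice this would be curated master data.
ALIASES = {"ALPHABRIDGE": "CUS-00002", "ALPHABRIDGE LTD": "CUS-00002"}


def normalize(name):
    """Uppercase and collapse runs of whitespace."""
    return " ".join(name.upper().split())


def _result(customer_id, method):
    return {"customer_id": customer_id, "legal_name": CUSTOMERS[customer_id], "method": method}


def resolve_company(mention, threshold=0.85):
    """Resolve a mention to a customer record, or return None if nothing fits."""
    key = normalize(mention)
    # 1. Exact match on the legal name.
    for customer_id, legal_name in CUSTOMERS.items():
        if key == legal_name:
            return _result(customer_id, "exact")
    # 2. Alias match against known alternative names.
    if key in ALIASES:
        return _result(ALIASES[key], "alias")
    # 3. Fuzzy match against legal names and aliases alike.
    candidates = {name: cid for cid, name in CUSTOMERS.items()}
    candidates.update(ALIASES)
    best = max(candidates, key=lambda name: SequenceMatcher(None, key, name).ratio())
    if SequenceMatcher(None, key, best).ratio() >= threshold:
        return _result(candidates[best], "fuzzy")
    return None  # candidate for embedding similarity or manual review
```

Returning the method alongside the match lets the reconciliation step treat a fuzzy match with more caution than an exact one.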

Step 8: Reconciliation Engine

Once entities are resolved:

{
  "customer_id": "CUS-00002",
  "invoice_number": "MFG-INV-000157"
}

the reconciliation engine validates:

Customer ownership
Contract relationships
Invoice existence
Amount consistency

Possible outcomes:

AUTO_RECONCILED
PARTIAL_MATCH
OVERPAYMENT
UNDERPAYMENT
REVIEW_REQUIRED
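The checks and outcomes above could be wired together roughly as follows. The article does not define each outcome precisely, so the mapping here is an assumption: PARTIAL_MATCH is used for a shortfall that the narrative declares as a partial payment, and UNDERPAYMENT for an undeclared shortfall. The `amount_due` field and its value are hypothetical as well.

```python
from decimal import Decimal

# In-memory stand-ins for the master data; amount_due is a hypothetical field.
CONTRACTS = {"CNT-2024-587": {"customer_id": "CUS-00002"}}
INVOICES = {
    "MFG-INV-000157": {"contract_id": "CNT-2024-587", "amount_due": Decimal("10000.00")},
}


def reconcile(customer_id, invoice_number, paid, is_partial=False):
    """Return one of the five outcome codes for a resolved transaction."""
    # Invoice existence.
    invoice = INVOICES.get(invoice_number)
    if invoice is None:
        return "REVIEW_REQUIRED"
    # Contract relationship and customer ownership.
    contract = CONTRACTS.get(invoice["contract_id"])
    if contract is None or contract["customer_id"] != customer_id:
        return "REVIEW_REQUIRED"
    # Amount consistency (assumed semantics, see the note above).
    due = invoice["amount_due"]
    if paid == due:
        return "AUTO_RECONCILED"
    if paid > due:
        return "OVERPAYMENT"
    return "PARTIAL_MATCH" if is_partial else "UNDERPAYMENT"
```

Under these assumptions, a payment of 3979.85 against the invoice returns PARTIAL_MATCH when the narrative carried "PART PMT" and UNDERPAYMENT when it did not.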

Step 9: API Layer

The final system exposes endpoints such as:

POST /reconcile/text

Input:

{
  "narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}

Output:

{
  "customer_id": "CUS-00002",
  "invoice_number": "MFG-INV-000157",
  "status": "AUTO_RECONCILED"
}

This allows integration with:

ERP systems
Accounting platforms
Finance operations workflows
AI agents
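A framework-free sketch of what the handler behind POST /reconcile/text might do. A regular expression and a name lookup stand in for the NER model and the resolution engine. Because the request carries no amount, only the customer, contract, and invoice chain is checked. The function name `handle_reconcile_text` and the in-memory tables are assumptions.

```python
import re

# Tiny in-memory stand-ins for the master data from Step 1.
CUSTOMERS = {"CUS-00002": "ALPHABRIDGE SOLUTIONS"}  # customer_id -> legal_name
CONTRACTS = {"CNT-2024-587": "CUS-00002"}           # contract_id -> customer_id
INVOICES = {"MFG-INV-000157": "CNT-2024-587"}       # invoice_number -> contract_id
INVOICE_PATTERN = re.compile(r"[A-Z]{3}-INV-\d+")


def handle_reconcile_text(payload):
    """Map a request body to a response body with the article's three fields."""
    narrative = payload.get("narrative", "")
    # Extraction (stand-in for the NER model).
    match = INVOICE_PATTERN.search(narrative)
    invoice_number = match.group() if match else None
    # Resolution (stand-in for the resolution engine).
    customer_id = next(
        (cid for cid, name in CUSTOMERS.items() if name in narrative), None
    )
    # Reconciliation: the invoice's contract must belong to the customer.
    owner = CONTRACTS.get(INVOICES.get(invoice_number))
    consistent = customer_id is not None and owner == customer_id
    return {
        "customer_id": customer_id,
        "invoice_number": invoice_number,
        "status": "AUTO_RECONCILED" if consistent else "REVIEW_REQUIRED",
    }
```

Keeping the handler a plain function from dictionary to dictionary means it can be mounted on whichever web framework the surrounding platform already uses.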

Lessons Learned

Building the model was not the hardest part.

The hardest parts were:

Data Quality

Poor data produces poor automation.

Taxonomy Design

The model only understands the concepts you define.

Canonical Data

Without canonical structures, downstream automation becomes fragile.

Entity Resolution

Extraction without resolution has limited business value.

Final Thoughts

Most enterprise automation projects focus on AI models.

In my experience, the real challenge is business understanding.

The architecture that matters most is:

Raw Data
↓
Canonical Data
↓
Taxonomy
↓
NER
↓
Resolution
↓
Decision Intelligence
↓
Automation

AI is only one layer in the stack.

The organizations that succeed with enterprise AI will be the ones that invest in data foundations, business taxonomies, and transaction intelligence before they invest in autonomous agents.

If you're building AI for enterprise operations, start with understanding before automation.
