Building NaijaShield: Behavioral Fraud Detection on Nigerian Mobile Money Rails

NIBSS reported ₦9.5 billion in mobile money fraud losses in 2023. That number sits against a backdrop of ₦100 trillion in total mobile money transaction volume for the same period — making fraud roughly 0.0095% of throughput by value, but concentrated in a pattern that standard rule-based systems consistently miss. When you build fraud detection for Nigerian rails, you are not solving a generic fintech problem. You are solving a problem shaped by infrastructure constraints, sparse labeled datasets, and behavioral patterns that differ structurally from the Western transaction graphs most ML fraud systems were trained on.

NaijaShield is the behavioral fraud detection layer we built at BAMG Studio to address this gap. This article walks through the architecture decisions, the dataset problem, and the results from our pilot deployment.

The Problem

Mobile money fraud on Nigerian rails does not look like card-not-present fraud in the US or APP fraud in the UK. The attack surface is different. Most Nigerian mobile money users operate through USSD sessions (short-lived, menu-driven, stateless from the network's perspective) or thin-client apps on low-memory Android devices running inconsistent OS versions. Session lengths are short. Connectivity is intermittent. And because ground-truth fraud labels are sparse — banks and fintechs rarely share confirmed fraud cases with researchers — training a supervised classifier from scratch produces a system that flags too broadly or not at all.

Rule-based engines (velocity checks, amount thresholds, known blacklists) caught roughly 40% of fraud cases in our test environment. Tightening the rules beyond that point pushed false-positive rates to unacceptable levels and froze legitimate customer accounts. On a rail where a blocked transfer might mean a trader in Kano cannot pay a supplier in Lagos before market close, false positives have real economic cost.

The question we set out to answer: can behavioral biometrics, engineered carefully for the Nigerian mobile context, provide a lower-latency, more precise fraud signal?

Architecture

NaijaShield's backend runs on Python with FastAPI handling the inference API layer. The choice of FastAPI over Flask was straightforward — async request handling matters when you are scoring transactions at the edge where P99 latency is a hard constraint.

Client App / USSD Gateway
        │
        ▼
  FastAPI Inference API  ←──→  Redis (Session State)
        │
        ▼
  Feature Engineering Pipeline
        │
        ▼
  XGBoost Anomaly Scorer
        │
        ▼
  Decision Engine (flag / challenge / allow)

Session State — Redis: Each user session accumulates behavioral features in a Redis hash with a TTL tied to the session window. This gives us sub-millisecond read/write on the feature vector and clean session isolation without database round-trips during scoring.
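
The session-hash pattern can be sketched as follows. This is a minimal illustration, not NaijaShield's actual schema: the `session:{id}` key format, the 300-second TTL, and the `InMemoryHashStore` class are all assumptions. The stand-in store exists only so the example runs without a Redis server; it mimics the `hset`, `expire`, and `hgetall` call shapes of redis-py.

```python
import time
from typing import Dict

SESSION_TTL_SECONDS = 300  # assumed session window, not a figure from the pilot


class InMemoryHashStore:
    """Tiny stand-in for a Redis client: named hashes with per-key expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}
        self._expires_at: Dict[str, float] = {}

    def _evict_if_expired(self, name: str) -> None:
        deadline = self._expires_at.get(name)
        if deadline is not None and time.monotonic() >= deadline:
            self._data.pop(name, None)
            self._expires_at.pop(name, None)

    def hset(self, name: str, mapping: Dict[str, str]) -> int:
        self._evict_if_expired(name)
        bucket = self._data.setdefault(name, {})
        added = sum(1 for field in mapping if field not in bucket)
        bucket.update(mapping)
        return added

    def expire(self, name: str, seconds: float) -> bool:
        if name not in self._data:
            return False
        self._expires_at[name] = time.monotonic() + seconds
        return True

    def hgetall(self, name: str) -> Dict[str, str]:
        self._evict_if_expired(name)
        return dict(self._data.get(name, {}))


def record_session_features(client, session_id: str, features: Dict[str, float],
                            ttl: float = SESSION_TTL_SECONDS) -> None:
    """Merge new feature values into the session hash and refresh its TTL."""
    key = f"session:{session_id}"  # hypothetical key layout
    client.hset(key, mapping={k: str(v) for k, v in features.items()})
    client.expire(key, ttl)


def load_session_features(client, session_id: str) -> Dict[str, float]:
    """Read the accumulated feature vector; empty if the session has expired."""
    raw = client.hgetall(f"session:{session_id}")
    return {k: float(v) for k, v in raw.items()}
```

With a real redis-py client, creating it with `decode_responses=True` makes `hgetall` return `str` fields, so the same two helper functions apply unchanged.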

Behavioral Biometrics Signals:

Typing cadence: inter-keystroke timing variance across PIN entry and amount fields
Tap pressure: normalized tap force during touchscreen interactions (where available via device sensor APIs)
Session velocity: rate of screen transitions per second, time-on-field distributions

These signals are not foolproof in isolation. Their value is in combination — a session where tap pressure suddenly drops to zero (physical device swap), typing cadence variance collapses (bot or scripted input), and transaction amount sits at the 95th percentile of the user's historical distribution is a materially different risk profile than any single signal would suggest.

Scoring — XGBoost: We use XGBoost rather than a deep model for two reasons. First, inference latency on an EC2 t3.medium instance with XGBoost is under 8ms per transaction — well within the window where a challenge can be injected without degrading UX. Second, feature importance is interpretable, which matters for explaining decisions to compliance teams and, eventually, regulators.
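
The last stage in the diagram turns the scorer's output into flag, challenge, or allow. A minimal sketch of that mapping is below; it assumes the score is a risk probability in [0, 1], and the two cut-offs are hypothetical placeholders rather than values from the pilot.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    FLAG = "flag"


# Hypothetical cut-offs; real values would be tuned against the
# false-positive budget and are not taken from the pilot.
CHALLENGE_THRESHOLD = 0.60
FLAG_THRESHOLD = 0.90


def decide(risk_score: float,
           challenge_at: float = CHALLENGE_THRESHOLD,
           flag_at: float = FLAG_THRESHOLD) -> Action:
    """Map a model risk score in [0, 1] to one of three outcomes."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"risk_score must be in [0, 1], got {risk_score}")
    if challenge_at > flag_at:
        raise ValueError("challenge_at must not exceed flag_at")
    if risk_score >= flag_at:
        return Action.FLAG
    if risk_score >= challenge_at:
        return Action.CHALLENGE
    return Action.ALLOW
```

The middle band is what makes a low-latency scorer useful: a borderline session gets a step-up challenge instead of an outright block.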

Deployment — AWS: The inference API runs on EC2 with Lambda edge functions handling pre-scoring enrichment (device fingerprint lookup, IP geolocation against known VPN exit nodes). The Lambda layer keeps cold-path enrichment off the hot path, keeping median scoring latency under 35ms end-to-end.

Key Technical Challenge: The Labeled Data Problem

The hardest problem in this project was not model architecture. It was data.

Nigerian mobile money operators do not publish confirmed fraud datasets. The few labeled examples available come from internal bank reports, and they are noisy — chargebacks are not always fraud, and many fraud cases are never reported. When we began, our labeled fraud set contained fewer than 800 confirmed examples across 14 months of transaction logs. Training a supervised binary classifier on that produces a model that memorizes noise.

Our solution: semi-supervised learning with a small labeled seed, augmented via SMOTE (Synthetic Minority Oversampling Technique).

The pipeline works in two stages:

Stage 1 — Isolation Forest pre-filtering: An unsupervised Isolation Forest runs over the full unlabeled transaction corpus to surface anomalous sessions. Anomalous sessions are not automatically labeled fraud — they go into a review queue. This gives us a prioritized candidate set for human labeling without requiring exhaustive review.

Stage 2 — SMOTE augmentation on the confirmed set: Once the labeled seed reaches sufficient quality, SMOTE generates synthetic minority-class examples by interpolating in feature space between known fraud instances. This expands the effective training set while preserving the distributional structure of confirmed fraud.
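
The interpolation idea behind Stage 2 can be illustrated in plain Python. This is a simplified sketch of the core step (pick a minority sample, pick one of its nearest minority neighbors, place a new point somewhere on the segment between them), not the imbalanced-learn implementation; the function name and defaults are illustrative. In practice the library's `SMOTE` class with `fit_resample(X, y)` would be the usual route.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interpolate_minority(samples: List[Vector], n_synthetic: int,
                         k_neighbors: int = 3, seed: int = 42) -> List[List[float]]:
    """Create synthetic points on segments between a sample and a near neighbor."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k_neighbors, len(samples) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(samples))
        base = samples[i]
        # The k nearest other minority samples to the chosen base point
        others = [s for j, s in enumerate(samples) if j != i]
        neighbors = sorted(others, key=lambda s: squared_distance(base, s))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Three confirmed-fraud feature vectors (made-up values) expanded to ten synthetic ones
fraud_seed = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
augmented = interpolate_minority(fraud_seed, n_synthetic=10)
```

Because every synthetic point lies between two real fraud examples, noisy labels in the seed propagate into the synthetic set, which is why the text stresses seed quality before augmentation.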

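from sklearn.ensemble import IsolationForest
from imblearn.over_sampling import SMOTE
import pandas as pd
import numpy as np

def engineer_velocity_features(df: pd.DataFrame, window_seconds: int = 300) -> pd.DataFrame:
    """Build per-transaction velocity features.

    Assumes `timestamp` is numeric epoch seconds, so differences compare
    directly against `window_seconds`.
    """
    # sort_values returns a copy, so the caller's frame is not mutated
    df = df.sort_values(['user_id', 'timestamp'])
    # Prior transactions by the same user inside the trailing window
    # (the -1 excludes the current transaction itself); raw=True passes
    # NumPy arrays to the lambda, which is faster than building a Series
    df['txn_count_window'] = (
        df.groupby('user_id')['timestamp']
        .transform(lambda x: x.expanding().apply(
            lambda ts: ((ts[-1] - ts) <= window_seconds).sum() - 1, raw=True
        ))
    )
    # Amount relative to the user's median over all rows in df;
    # the epsilon guards against a zero median
    user_median = df.groupby('user_id')['amount_ngn'].transform('median')
    df['amount_acceleration'] = df['amount_ngn'] / (user_median + 1e-6)
    # Seconds since the user's previous transaction; 999999 is a sentinel
    # for a user's first transaction
    df['time_since_last_txn'] = (
        df.groupby('user_id')['timestamp'].diff().fillna(999999)
    )
    # True when one session is seen on more than one device fingerprint
    session_device_counts = df.groupby('session_id')['device_fingerprint'].nunique()
    df['session_device_collision'] = df['session_id'].map(session_device_counts) > 1
    return df[['user_id', 'txn_count_window', 'amount_acceleration',
               'time_since_last_txn', 'session_device_collision']]

def train_anomaly_seed(feature_df: pd.DataFrame, contamination: float = 0.05):
    """Fit the unsupervised Isolation Forest used for Stage 1 pre-filtering."""
    clf = IsolationForest(
        n_estimators=200,
        contamination=contamination,
        random_state=42,
        n_jobs=-1,
    )
    # session_device_collision is engineered above but is not a model input here
    feature_cols = ['txn_count_window', 'amount_acceleration', 'time_since_last_txn']
    clf.fit(feature_df[feature_cols])
    return clf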
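
The listing stops once the forest is fitted. A possible next step, sketched here under assumptions, is turning per-session anomaly scores into the prioritized review queue described in Stage 1. The sketch assumes scores follow the convention of scikit-learn's `IsolationForest.score_samples` (lower means more abnormal); the `build_review_queue` name and the example scores are made up.

```python
import heapq
from typing import Iterable, List, Tuple


def build_review_queue(scored_sessions: Iterable[Tuple[str, float]],
                       queue_size: int) -> List[Tuple[str, float]]:
    """Return the queue_size most anomalous sessions, most anomalous first.

    Each item is (session_id, anomaly_score); a lower score is more abnormal.
    """
    return heapq.nsmallest(queue_size, scored_sessions, key=lambda item: item[1])


# In a real run the scores would come from the fitted model, for example:
#   scores = clf.score_samples(feature_df[feature_cols])
# The values below are illustrative only.
scored = [("s1", -0.41), ("s2", -0.72), ("s3", -0.38), ("s4", -0.65)]
queue = build_review_queue(scored, queue_size=2)
```

Capping the queue at what reviewers can actually label per cycle keeps human effort focused on the sessions most likely to yield new confirmed examples.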

Results

Pilot deployment ran across approximately 180,000 transactions over 6 weeks on a subset of a Nigerian fintech partner's transaction volume.

34% reduction in false-positive fraud flags vs. the baseline rule-based engine
Fraud catch rate held stable within 2 percentage points of the baseline
Median scoring latency: 31ms end-to-end
Model retrain cycle: weekly, triggered by drift metric on session velocity distributions
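
The drift metric behind the retrain trigger is not specified above. One common choice for comparing a reference distribution with a recent one is the Population Stability Index (PSI), sketched here purely as an assumption about how such a trigger might look. The bin edges, the 0.25 threshold (a widely quoted rule of thumb), and the function names are all illustrative.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling in each bin; edges are interior cut points."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Bin index = number of cut points at or below the value
        counts[sum(1 for e in edges if v >= e)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference: Sequence[float],
                               current: Sequence[float],
                               edges: Sequence[float],
                               floor: float = 1e-6) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    psi = 0.0
    for r, c in zip(bin_fractions(reference, edges), bin_fractions(current, edges)):
        r, c = max(r, floor), max(c, floor)  # avoid log(0) on empty bins
        psi += (c - r) * math.log(c / r)
    return psi


def should_retrain(reference: Sequence[float], current: Sequence[float],
                   edges: Sequence[float], threshold: float = 0.25) -> bool:
    """True when the session velocity distribution has drifted past the threshold."""
    return population_stability_index(reference, current, edges) > threshold
```

Here `reference` would hold session velocity values from the training window and `current` the values from the most recent week.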

The false-positive reduction is the number that matters commercially. Every incorrectly flagged transaction is a support ticket, a potential customer churn event, and — in markets where trust in digital finance is still being built — a reason for a user to revert to cash.

What's Next

Three areas on the roadmap:

USSD-native behavioral signals: Menu navigation timing and keypress sequence patterns carry behavioral signal that most fraud vendors ignore.
Federated model updates: Federated learning lets us improve the model without centralizing sensitive transaction data across fintechs.
Cross-rail feature sharing: Unifying behavioral profiles across NIP, USSD mobile money, and POS transactions closes an evasion path where fraudsters rotate between channels.

The labeled data problem does not go away. It gets managed. The system learns the specific behavioral grammar of Nigerian mobile money fraud — and that grammar is not the same one trained into any model built elsewhere.

Published by Peter Kolawole, Founder of BAMG Studio. Precision-built on the continent.
