Today we rolled out a major update to NextOCR, optimized specifically for printed Khmer books published before 1953.
This new model significantly improves OCR quality on historical Khmer documents, especially degraded scans, rare glyph forms, and old orthographic conventions.
As a quick demonstration, here is a page from a Khmer book printed before 1953.
Despite severe scan degradation, ink noise, and historical typography, the updated model preserves both the page structure and the readability of the text.
This raises an interesting question:
Why can a specialized OCR system outperform large multimodal models like Gemini?
Many people assume larger models should automatically win.
More data.
More GPUs.
More engineers.
But our experience suggests something different.
Bigger AI does not automatically mean better OCR
Google and other Big Tech companies do not lack data.
In fact, they likely possess far more document data than we do.
They also have world-class infrastructure and elite AI researchers.
Yet historical OCR is not solved by scale alone.
OCR for old Khmer books is an unusually specialized problem.
The challenge is not merely text recognition.
It includes:
- degraded printing quality
- ink bleed
- page skew
- broken glyph structures
- historical spelling variation
- dense, ambiguous layouts
These problems require more than general multimodal intelligence.
They require domain-specific engineering.
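To make "domain-specific engineering" concrete, here is a minimal sketch of one textbook preprocessing step for degraded scans: Otsu thresholding, which picks a cutoff separating ink from paper on a grayscale page. It is an illustration only, not NextOCR's actual method. The function names and the toy pixel values are assumptions made up for this example.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    `pixels` is a flat list of integer gray values in 0..255.
    Pixels <= t are treated as ink (dark), pixels > t as paper (light).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of gray values of those pixels
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet, nothing to split
        w_light = total - w_dark
        if w_light == 0:
            break               # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 (ink) if it is at or below t, else 0 (paper)."""
    return [1 if p <= t else 0 for p in pixels]


# Toy "scan": 100 dark ink pixels and 200 light, slightly uneven paper pixels.
page = [30] * 50 + [40] * 50 + [200] * 100 + [210] * 100
t = otsu_threshold(page)
ink_mask = binarize(page, t)
print(t, sum(ink_mask))  # the threshold falls between the two clusters
```

A global threshold like this is only a starting point. Real pre-1953 scans with uneven ink and stained paper typically need local or adaptive variants, and that kind of tuning is exactly the engineering work described above.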
Architecture matters
Our latest update improves performance through architectural refinements across the OCR pipeline.
The system now performs better at:
- historical document restoration
- layout-aware segmentation
- glyph-sensitive recognition
- Khmer language correction
Small architectural decisions create large downstream gains.
Sometimes fixing a single preprocessing stage raises final accuracy more than increasing model size does.
Parameter count alone does not determine OCR performance.
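One way to check a claim like this on your own pages is to measure character error rate (CER) for the same page under two pipeline variants. The sketch below uses plain Levenshtein edit distance over Unicode code points. The function names and the sample strings are illustrative assumptions, not part of NextOCR.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Ground truth: the Khmer word for "Kampuchea", 7 code points, written with
# escapes so the example does not depend on file encoding.
truth = "\u1780\u1798\u17d2\u1796\u17bb\u1787\u17b6"

# Hypothetical outputs from two pipeline variants on the same word.
variant_a = "\u1780\u1798\u17d2\u1796\u1787\u17b6"  # dropped the vowel sign U+17BB
variant_b = truth                                   # exact match

print(f"{cer(truth, variant_a):.3f}")  # 0.143
print(f"{cer(truth, variant_b):.3f}")  # 0.000
```

Because the comparison is per code point, a single missing vowel sign or subscript marker counts as one error. That makes CER sensitive to the faint marks and broken glyphs that historical Khmer scans tend to lose.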
Obsession is a competitive advantage
Architecture is only part of the story.
The deeper advantage is accumulated obsession.
Since 2013, OCR has been our continuous focus.
Over the years we have encountered thousands of edge cases.
A broken consonant.
A merged ligature.
A faint vowel mark.
A scan that appears unreadable.
Every failure taught us something.
Over time, those lessons became engineering intuition.
That intuition shaped the model.
Large organizations have extraordinary talent.
But they operate under roadmaps, priorities, quarterly goals, and resource allocation constraints.
A solo AI builder works differently.
Sometimes breakthroughs happen not during meetings, but at 2 AM.
A decoding failure keeps repeating in your head.
You revisit assumptions.
You rethink the algorithm.
You discover a better approach.
That kind of uninterrupted focus is difficult to scale across an organization.
Experience becomes part of the model
In niche AI domains, experience is not separate from the model.
Experience becomes part of the architecture.
It influences:
- what data you curate
- what errors you prioritize
- what tradeoffs you accept
- what signals you preserve
This is especially true for historical OCR.
The newest NextOCR release reflects more than a model update.
It reflects 13 years of iteration.
Big Tech has more compute.
But compute alone does not solve everything.
The scarcest resource in AI may not be compute.
It may be sustained obsession.
And in historical OCR, obsession compounds.
Try the latest model: https://nextocr.org