Following my mobile notification test a couple of weeks back, I got a stack of messages asking whether voicemail transcription quality varies as much between UK VoIP providers. Short answer: yes, and the gap is bigger than you would think.
The setup
I recorded 50 voicemail messages in varied conditions, then played them back through a physical speaker into each provider's inbound line. Each provider's transcription engine then processed the same audio and returned text.
Audio conditions covered:
| Condition | Count | Description |
|---|---|---|
| Quiet office (ideal) | 10 | Mic 30cm from speaker, no background noise |
| Open-plan office | 10 | Faint keyboard typing, occasional cough |
| Mobile outdoors | 10 | Traffic noise, wind, 3 bar signal |
| Mobile indoors weak signal | 10 | 1 bar signal, audio breaks |
| Regional accent mix | 10 | Glasgow, Geordie, West Country, Welsh, RP |
Scoring is word error rate (WER): the proportion of words transcribed incorrectly, or more precisely, substitutions, deletions and insertions divided by the number of words in the reference message. Lower is better. Anything above 15% is unreadable in practice.
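If you want to score transcripts yourself, here is a minimal WER sketch in Python (a hand-rolled word-level edit distance, stdlib only; the example pair is the Provider F quote from later in this post):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions to reach an empty hypothesis
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions from an empty reference
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("thanks for calling about the VoIP trial",
          "thank you for calling about Dave's poodle"))  # ~0.71 (5 errors / 7 words)
```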
The results
I ran this test last month against our own engine and five competitor offerings. Here is where things landed.
Quiet office (ideal conditions)
| Provider | WER |
|---|---|
| Provider A | 3.2% |
| Provider B | 4.1% |
| Provider C | 4.7% |
| Provider D | 6.9% |
| Provider E | 11.3% |
| Provider F | 14.6% |
Everything under 10% is usable. The last two would already be frustrating. Keep in mind this is the easy condition.
Open-plan office
WER worsened by 2-4 percentage points across the board. Providers A and B held under 7%; Provider F hit 18%.
Mobile outdoors
This is where things fell apart, and some of the results genuinely surprised me.
| Provider | WER (mobile outdoors) |
|---|---|
| Provider A | 8.4% |
| Provider B | 11.7% |
| Provider C | 19.2% |
| Provider D | 23.6% |
| Provider E | 31.0% |
| Provider F | 42.4% |
Provider F degenerated further in the 1-bar weak-signal condition, where its transcripts were frequently pure nonsense: 'Thank you for calling about Dave's poodle' was the literal output for a message that actually said 'thanks for calling about the VoIP trial'.
Regional accent test
The Scottish, Geordie and Welsh samples were where the engines really diverged. Across the 10 regional-accent recordings:
| Provider | WER (accent mix) |
|---|---|
| Provider A | 7.8% |
| Provider B | 9.2% |
| Provider C | 13.1% |
| Provider D | 17.4% |
| Provider E | 24.8% |
| Provider F | 29.3% |
The Glaswegian sample had Provider F transcribing 'aye, will you phone me back the morrow' as 'I Wilfred one may back them morrow'. This is not usable.
What is driving the differences
From what I can tell, providers fall into three categories of underlying transcription tech:
- Home-grown speech-to-text trained on US English (Provider F). These perform badly on UK accents, full stop.
- Generic cloud APIs like Google Speech-to-Text or Amazon Transcribe (Providers C, D, E). Decent on neutral accents, mediocre on regional ones; see the sketch after this list.
- UK-focused models fine-tuned on UK voicemail data (Providers A, B). Best results, especially on accents and noisy conditions.
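For context, here is roughly what wiring a voicemail through one of those generic cloud APIs looks like, using Google Cloud Speech-to-Text as the example (the filename and sample rate are placeholders, not details from my test):

```python
from google.cloud import speech  # pip install google-cloud-speech

def transcribe_voicemail(path: str) -> str:
    client = speech.SpeechClient()
    with open(path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,   # telephony audio is typically 8 kHz
        language_code="en-GB",    # locale hint; general-purpose model either way
    )
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)

print(transcribe_voicemail("voicemail_sample.wav"))
```

Even with language_code set to en-GB, these are general-purpose models, not ones tuned on UK voicemail audio, which is consistent with the decent-but-not-great accent numbers above.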
Our engine at DialPhone is in that third category, trained on anonymised UK voicemail samples collected with consent over two years. We placed second in this test. Provider A (whom I am not naming) placed first, and I respect them for it.
Why this matters more than people realise
Businesses do not just read voicemail transcripts. Most UK VoIP providers now ALSO run the transcript through classification to detect urgency, sentiment, or booking requests. A 40% WER destroys downstream logic. If the transcription says 'Dave's poodle', there is no way the urgency classifier recovers.
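To make that concrete, here is a toy keyword-based urgency check (entirely hypothetical, and far simpler than anything a real provider runs):

```python
URGENT_KEYWORDS = {"urgent", "asap", "emergency", "today", "trial"}

def is_urgent(transcript: str) -> bool:
    # Flag the message if any urgency keyword survives transcription.
    words = set(transcript.lower().split())
    return bool(words & URGENT_KEYWORDS)

print(is_urgent("thanks for calling about the VoIP trial"))    # True
print(is_urgent("thank you for calling about Dave's poodle"))  # False: the cue is gone
```

Garble the keyword and everything downstream of it fails silently.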
If you are evaluating VoIP providers and voicemail is important to your workflow, run five test recordings through each provider's trial, in varied conditions, and compute your own WER. You will likely be surprised, in both directions.
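You do not have to hand-roll the metric for that either; the open-source jiwer package computes WER directly (the reference/hypothesis pairs below are placeholders for your own recordings):

```python
import jiwer  # pip install jiwer

# Pairs of (what you actually said, what the provider transcribed).
pairs = [
    ("thanks for calling about the VoIP trial",
     "thank you for calling about Dave's poodle"),
    ("can you ring me back before five today",
     "can you ring me back before five today"),
]

references = [ref for ref, _ in pairs]
hypotheses = [hyp for _, hyp in pairs]
print(f"Overall WER: {jiwer.wer(references, hypotheses):.1%}")
```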
The UK VoIP market in 2026 has real quality differences on things like this. They are just rarely measured.