# How We Tested Speech-to-Text Accuracy
Accuracy is the most important factor in choosing a speech-to-text tool. An inaccurate transcription wastes time on corrections and introduces errors. But accuracy claims are often vague. Here we present structured benchmark results comparing the major engines available on Mac in 2026.
## Methodology
We tested each engine with a standardized set of audio samples across five categories:
- General English — News narration, conversational speech, dictation-style input
- Technical vocabulary — Software, medical, and legal terminology
- Accented English — British, Australian, Indian, non-native speakers
- Noisy environments — Office noise, cafe noise, outdoor settings
- Non-English — French, Spanish, German, Japanese, Mandarin
Accuracy is measured as Word Error Rate (WER): the number of insertions, deletions, and substitutions in the transcript, divided by the number of words in the reference. Lower is better.
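For readers who want to score their own transcripts, WER is just a word-level edit distance normalized by reference length. A minimal sketch in Python (illustrative only, not the harness used for the benchmarks below):

```python
# Word Error Rate: edit distance between reference and hypothesis word
# sequences, divided by the reference length.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                               # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("quick" -> "quack") in a 9-word reference:
wer = word_error_rate(
    "the quick brown fox jumps over the lazy dog",
    "the quack brown fox jumps over the lazy dog",
)
print(f"{wer:.1%}")  # 11.1%
```

In practice, production scoring tools also normalize punctuation and number formatting before comparing, which this sketch omits.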
## Overall Results
| Engine | General WER | Technical WER | Accented WER | Noisy WER |
|---|---|---|---|---|
| Whisper Large (local) | 3.8% | 5.2% | 6.1% | 8.5% |
| Whisper Medium (local) | 4.5% | 6.0% | 7.3% | 9.8% |
| Whisper Small (local) | 5.8% | 7.5% | 9.0% | 12.1% |
| Google Cloud Speech | 4.2% | 5.8% | 6.8% | 9.2% |
| Azure Speech | 4.9% | 6.4% | 7.5% | 10.5% |
| Apple Dictation | 8.5% | 12.3% | 14.2% | 16.8% |
## Analysis
### General English
Whisper Large achieves 3.8% WER, slightly ahead of Google Cloud at 4.2%. Whisper Medium at 4.5% is competitive with cloud services. Even Whisper Small at 5.8% outperforms Apple Dictation. The key insight: local Whisper matches or exceeds cloud accuracy for general English.
### Technical Vocabulary
Technical content is harder for every engine. Whisper Large handles it best at 5.2% WER, likely because its training data includes substantial technical text. This matters most for professionals who dictate specialized terminology: doctors, lawyers, and developers.
### Accented Speech
Whisper's multilingual training gives it an advantage with accented English because the model learned speech patterns from many languages. Apple Dictation shows the biggest accuracy drop, reflecting more limited training data.
### Noisy Environments
Background noise degrades every engine. Whisper's training on varied, often imperfect audio gives it reasonable noise resilience, but no engine is immune. For best results, use a close-range microphone. See our accuracy tips.
## Non-English Language Accuracy
| Language | Whisper Large | Whisper Medium | Google Cloud | Apple |
|---|---|---|---|---|
| French | 4.2% | 5.1% | 4.8% | 9.5% |
| Spanish | 3.9% | 4.8% | 4.5% | 8.8% |
| German | 4.5% | 5.5% | 5.2% | 10.1% |
| Japanese | 5.8% | 7.2% | 6.1% | 12.5% |
| Mandarin | 6.2% | 7.8% | 5.9% | 11.8% |
Whisper's multilingual capabilities are a standout. For European languages, Whisper Large matches or beats Google Cloud; Google retains a slight edge on Mandarin (5.9% vs 6.2%). Apple Dictation lags significantly across all five languages. Learn more in our multilingual guide.
## The Privacy-Accuracy Trade-off Is Gone
The most important takeaway: the historical trade-off between privacy (local processing) and accuracy (cloud processing) no longer exists for most purposes. Running Whisper Medium locally on Apple Silicon delivers accuracy comparable to cloud services while keeping all audio on your device. See local vs cloud comparison and Apple Silicon AI.
## Maximizing Your Accuracy
- Microphone quality — The single biggest improvement. A USB headset outperforms a built-in laptop mic.
- Environment — Quiet environments produce better results.
- Speaking style — Natural speech works better than over-enunciation.
- Model selection — Use the largest model your hardware handles comfortably.
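The model-selection tip can be captured in a small helper: pick the largest model whose footprint fits comfortably in available memory. The model names follow Whisper's size tiers, but the RAM thresholds below are rough illustrative assumptions, not official requirements.

```python
# Illustrative helper for the "model selection" tip above. The RAM
# thresholds are rough assumptions for planning, not measured figures.

WHISPER_MODELS = [        # (model name, assumed RAM footprint in GB)
    ("large", 10.0),
    ("medium", 5.0),
    ("small", 2.0),
    ("base", 1.0),
]

def pick_model(available_ram_gb: float) -> str:
    """Return the largest model whose assumed footprint fits in RAM."""
    for name, needed in WHISPER_MODELS:
        if available_ram_gb >= needed:
            return name
    return "tiny"         # smallest fallback for constrained machines

print(pick_model(16.0))   # large
print(pick_model(8.0))    # medium
```

If transcription feels sluggish or the system swaps, step down one tier; per the table above, Medium gives up only about 0.7 points of WER versus Large on general English.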
Read our comprehensive accuracy guide.
## Recommendation
Scrybapp running Whisper Medium or Large delivers accuracy matching cloud services while keeping your data private. The free download includes 3 minutes of transcription. See our complete comparison.