Technical · 10 min read

Whisper AI Models: Tiny vs Base vs Small vs Medium vs Large

Compare all Whisper AI model sizes for speech-to-text on Mac. Understand accuracy, speed, memory usage, and which model is best for your use case.

Scrybapp Team

Understanding Whisper AI Model Sizes

OpenAI's Whisper is the speech recognition model that powers the best local dictation tools on Mac, including Scrybapp. But Whisper is not a single model; it comes in five sizes: Tiny, Base, Small, Medium, and Large. Each size represents a different trade-off between accuracy, speed, and resource usage. Choosing the right model can significantly impact your voice typing experience.

This guide provides a detailed comparison of every Whisper model size, including real-world benchmarks on Apple Silicon Macs, so you can make an informed decision.

The Five Whisper Models at a Glance

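| Model | Parameters | Disk Size | RAM Usage | Speed | Relative Accuracy |
| --- | --- | --- | --- | --- | --- |
| Tiny | 39M | ~75 MB | ~150 MB | Fastest | Lowest |
| Base | 74M | ~140 MB | ~200 MB | Very fast | Moderate |
| Small | 244M | ~460 MB | ~500 MB | Fast | Good |
| Medium | 769M | ~1.5 GB | ~1.5 GB | Moderate | Very good |
| Large | 1550M | ~3 GB | ~3 GB | Slower | Best |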
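
The disk sizes in the table track the parameter counts closely. The sketch below is a rough estimate, assuming the weights are stored as 16-bit floats (2 bytes per parameter). That storage format is an assumption, not something measured for this article, and the function name is purely illustrative.

```python
# Parameter counts (in millions) from the comparison table.
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}

BYTES_PER_PARAM = 2  # assumption: 16-bit (half-precision) weights


def estimated_disk_mb(model: str, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate on-disk size in megabytes (1 MB = 1,000,000 bytes)."""
    # (millions of parameters) * (bytes per parameter) gives megabytes directly.
    return float(PARAMS_MILLIONS[model] * bytes_per_param)


if __name__ == "__main__":
    for name in PARAMS_MILLIONS:
        print(f"{name:<7} ~{estimated_disk_mb(name):,.0f} MB")
```

Under that assumption, the estimates land within roughly 10% of the disk sizes listed in the table.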

Tiny Model: Maximum Speed

Best For

The Tiny model is ideal for quick, casual dictation where speed matters more than perfect accuracy. Short messages in Slack, WhatsApp, and Discord are perfect use cases.

Performance on Apple Silicon

On M1 Macs, the Tiny model transcribes at roughly 10x real-time speed, meaning a 10-second clip processes in about 1 second. On M2 and M3 Macs, it is even faster. Resource usage is minimal at about 150 MB of RAM.

Accuracy Trade-offs

The Tiny model has the highest word error rate. It handles common English well but struggles with technical vocabulary, proper nouns, accented speech, and non-English languages. For casual messaging, this is usually acceptable.

Base Model: The Step Up

Best For

The Base model offers a noticeable accuracy improvement over Tiny with a modest speed penalty. It is a good fit for everyday communication that needs slightly better accuracy: emails, casual documents, and general-purpose dictation.

Performance

Processing speed is roughly 7-8x real-time on M1 Macs. RAM usage increases to about 200 MB. The improvement over Tiny is most noticeable in punctuation, capitalization, and uncommon words.

Small Model: The Sweet Spot

Best For

The Small model is our recommendation for most users. It provides genuinely good accuracy while remaining fast enough for real-time dictation on all Apple Silicon Macs. Professional emails in Gmail, documents in Google Docs or Word, and notes in Notion all work well.

Performance

On M1, the Small model processes at approximately 4-5x real-time, so a 30-second dictation transcribes in about 6-8 seconds. On M2 and M3 chips, processing is faster. The Apple Silicon Neural Engine accelerates the Small model particularly well.

Why It Is the Default Recommendation

The Small model hits the optimal point on the accuracy-speed curve. It handles technical vocabulary, proper nouns, and accented speech significantly better than Tiny and Base, while remaining fast enough that transcription feels nearly instantaneous.

Medium Model: Professional Grade

Best For

The Medium model is recommended for professional use cases where accuracy is critical: medical dictation, legal documentation, technical writing, and academic papers. It is also significantly better than the smaller models for non-English languages.

Performance

Processing speed is roughly 2-3x real-time on M1 Macs and faster on newer chips. For a typical 30-second dictation, expect about 10-15 seconds of processing on M1. RAM usage is approximately 1.5 GB.

Large Model: Maximum Accuracy

Best For

The Large model provides the best accuracy Whisper offers. It is most valuable in challenging scenarios: heavy accents, noisy environments, rare languages, and highly specialized terminology. Book authors who dictate for extended periods may prefer it.

Performance

Processing is approximately 1-2x real-time on M1. On M2, M3, and M4 Macs with improved Neural Engines, the Large model becomes more practical. RAM usage is approximately 3 GB.
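
The real-time figures quoted for each model translate into wait times by simple division: processing time is the clip length divided by the real-time factor. The sketch below applies that to the M1 ranges given in this article. The dictionary and function names are illustrative, and the numbers are the article's approximate figures rather than fresh measurements.

```python
# Real-time factors on M1 as (low, high), taken from the figures in this article.
# Tiny is quoted as a single value, so low == high.
M1_REALTIME_FACTOR = {
    "tiny": (10.0, 10.0),
    "base": (7.0, 8.0),
    "small": (4.0, 5.0),
    "medium": (2.0, 3.0),
    "large": (1.0, 2.0),
}


def processing_seconds(clip_seconds: float, model: str) -> tuple[float, float]:
    """Return (best_case, worst_case) processing time in seconds."""
    low, high = M1_REALTIME_FACTOR[model]
    # A higher real-time factor means faster processing, so it gives the best case.
    return clip_seconds / high, clip_seconds / low


if __name__ == "__main__":
    for name in M1_REALTIME_FACTOR:
        best, worst = processing_seconds(30, name)
        print(f"{name:<7} {best:4.1f}-{worst:4.1f} s for a 30 s clip")
```

For a 30-second dictation on M1, this gives 10-15 seconds for Medium and 15-30 seconds for Large, which is why Large can feel slow for rapid back-and-forth dictation.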

Hardware Considerations

On M1 Macs with 8 GB RAM, the Large model may cause memory pressure alongside other apps. On M2 Pro/Max or M3 Pro/Max with 16+ GB RAM, it runs comfortably.
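
A quick way to reason about memory pressure is to compare a model's approximate RAM usage against what is left after your other apps. The sketch below uses the RAM figures from the comparison table. The `other_usage_gb` input and the 1 GB `headroom_gb` default are hypothetical values chosen for illustration, not thresholds used by Scrybapp or macOS.

```python
from typing import Optional

# Approximate RAM usage per model in GB, from the comparison table.
MODEL_RAM_GB = {"tiny": 0.15, "base": 0.2, "small": 0.5, "medium": 1.5, "large": 3.0}


def fits_in_memory(model: str, total_ram_gb: float, other_usage_gb: float,
                   headroom_gb: float = 1.0) -> bool:
    """True if the model plus a safety margin fits in the RAM other apps leave free."""
    free_gb = total_ram_gb - other_usage_gb
    return MODEL_RAM_GB[model] + headroom_gb <= free_gb


def largest_model_that_fits(total_ram_gb: float, other_usage_gb: float,
                            headroom_gb: float = 1.0) -> Optional[str]:
    """Return the most memory-hungry model that still fits, or None if none do."""
    for model in sorted(MODEL_RAM_GB, key=MODEL_RAM_GB.get, reverse=True):
        if fits_in_memory(model, total_ram_gb, other_usage_gb, headroom_gb):
            return model
    return None


if __name__ == "__main__":
    # Hypothetical scenario: other apps are already using 6 GB.
    print(largest_model_that_fits(8, 6))
    print(largest_model_that_fits(16, 6))
```

With 6 GB already in use, this estimate leaves room for Small on an 8 GB machine and for Large on a 16 GB machine.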

Choosing the Right Model

| Use Case | Recommended | Why |
| --- | --- | --- |
| Slack/Discord messages | Tiny or Base | Speed matters, errors easily corrected |
| Email and documents | Small | Good accuracy with fast processing |
| Professional/legal/medical | Medium or Large | Accuracy is critical |
| Non-English languages | Medium or Large | Better multilingual recognition |
| Book dictation | Medium or Large | Sustained accuracy over long sessions |
| Coding documentation | Small or Medium | Technical vocabulary handling |
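
The recommendations above can be expressed as a simple lookup. This is a minimal sketch: the use-case keys and the `prefer_accuracy` flag are hypothetical labels invented for illustration, not settings that exist in Scrybapp.

```python
# Recommended models per use case, ordered from faster to more accurate.
# The keys are illustrative labels for the rows of the recommendation table.
RECOMMENDATIONS = {
    "chat_messages": ("tiny", "base"),
    "email_and_documents": ("small",),
    "professional": ("medium", "large"),
    "non_english": ("medium", "large"),
    "book_dictation": ("medium", "large"),
    "coding_documentation": ("small", "medium"),
}


def recommend(use_case: str, prefer_accuracy: bool = False) -> str:
    """Pick the faster option by default, or the more accurate one on request."""
    options = RECOMMENDATIONS[use_case]
    return options[-1] if prefer_accuracy else options[0]


if __name__ == "__main__":
    print(recommend("chat_messages"))
    print(recommend("professional", prefer_accuracy=True))
```

Where the table lists two models, the first is the faster choice and the second the more accurate one, so the flag simply selects between them.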

Switching Between Models

Scrybapp makes it easy to switch between Whisper models from the menu bar. Use Small for everyday communication and switch to Medium or Large for important documents. This flexibility means you do not have to commit to a single model.

Performance Across Mac Generations

M1 (2020-2021)

All five models run. Small is the best default, Medium is usable with a slight delay, and Large works but may feel slow for rapid-fire dictation.

M2 (2022-2023)

The improved Neural Engine makes Medium on M2 feel nearly as fast as Small does on M1. Large becomes practical for sustained use.

M3 and M4 (2023-2025)

Further Neural Engine improvements make even the Large model responsive. Medium is the natural default on these machines.

Get Started

Download Scrybapp with 3 minutes of free transcription and try different models. Start with Small, then experiment with Medium and Large. Read more about how Apple Silicon accelerates local AI and our tips for improving accuracy.
