Whisper AI Models: Tiny vs Base vs Small vs Medium vs Large

Understanding Whisper AI Model Sizes

OpenAI's Whisper is the speech recognition model that powers the best local dictation tools on Mac, including Scrybapp. But Whisper is not a single model: it comes in five sizes, Tiny, Base, Small, Medium, and Large. Each size represents a different trade-off between accuracy, speed, and resource usage, so the model you choose can significantly affect your voice typing experience.

This guide provides a detailed comparison of every Whisper model size, including real-world benchmarks on Apple Silicon Macs, so you can make an informed decision.

The Five Whisper Models at a Glance

Model	Parameters	Disk Size	RAM Usage	Speed	Accuracy
Tiny	39M	~75 MB	~150 MB	Fastest	Lowest
Base	74M	~140 MB	~200 MB	Very Fast	Moderate
Small	244M	~460 MB	~500 MB	Fast	Good
Medium	769M	~1.5 GB	~1.5 GB	Moderate	Very Good
Large	1550M	~3 GB	~3 GB	Slower	Best
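
As a quick sanity check on the table, dividing disk size by parameter count gives a little under two bytes per parameter for every model, which is roughly what 16-bit weights would produce. This is an inference from the figures above rather than something the table states, and the sizes are approximate (1.5 GB and 3 GB are treated here as 1500 MB and 3000 MB).

```python
# (parameters in millions, approximate disk size in MB) from the table above
MODELS = {
    "tiny": (39, 75),
    "base": (74, 140),
    "small": (244, 460),
    "medium": (769, 1500),
    "large": (1550, 3000),
}

for name, (params_m, disk_mb) in MODELS.items():
    # MB divided by millions of parameters gives bytes per parameter
    bytes_per_param = disk_mb / params_m
    print(f"{name:<7}{bytes_per_param:.2f} bytes/param")
```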

Tiny Model: Maximum Speed

Best For

The Tiny model is ideal for quick, casual dictation where speed matters more than perfect accuracy. Short messages in Slack, WhatsApp, and Discord are perfect use cases.

Performance on Apple Silicon

On M1 Macs, the Tiny model transcribes at roughly 10x real-time speed, meaning a 10-second clip processes in about 1 second. On M2 and M3 Macs, it is even faster. Resource usage is minimal at about 150 MB of RAM.
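
A real-time speed factor converts directly into waiting time: divide the audio length by the factor. The sketch below shows that arithmetic; the function name `processing_seconds` is illustrative, and the factor used is this guide's rough M1 figure, not a measured benchmark.

```python
def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate transcription time from a real-time speed factor.

    A factor of 10 means the model handles 10 seconds of audio
    per second of processing.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Tiny on M1, using this guide's rough 10x figure (an estimate, not a benchmark)
print(processing_seconds(10, 10))  # 1.0
```

The same function applies to the other models: plug in the speed factor quoted in each section to estimate how long a dictation of a given length will take.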

Accuracy Trade-offs

The Tiny model has the highest word error rate. It handles common English well but struggles with technical vocabulary, proper nouns, accented speech, and non-English languages. For casual messaging, this is usually acceptable.

Base Model: The Step Up

Best For

The Base model offers a noticeable accuracy improvement over Tiny with a modest speed penalty. It suits everyday communication that needs slightly better accuracy: emails, casual documents, and general-purpose dictation.

Performance

Processing speed is roughly 7-8x real-time on M1 Macs. RAM usage increases to about 200 MB. The improvement over Tiny is most noticeable in punctuation, capitalization, and uncommon words.

Small Model: The Sweet Spot

Best For

The Small model is our recommendation for most users. It provides genuinely good accuracy while remaining fast enough for real-time dictation on all Apple Silicon Macs. Professional emails in Gmail, documents in Google Docs or Word, and notes in Notion all work well.

Performance

On M1, the Small model processes at approximately 4-5x real-time, so a 30-second dictation transcribes in about 6-8 seconds. On M2 and M3 chips, processing is faster. The Apple Silicon Neural Engine accelerates the Small model particularly well.

Why It Is the Default Recommendation

The Small model hits the optimal point on the accuracy-speed curve. It handles technical vocabulary, proper nouns, and accented speech significantly better than Tiny and Base, while remaining fast enough that transcription feels nearly instantaneous.

Medium Model: Professional Grade

Best For

The Medium model is recommended for professional use cases where accuracy is critical: medical dictation, legal documentation, technical writing, and academic papers. It is also significantly better for non-English languages.

Performance

Processing speed is roughly 2-3x real-time on M1 Macs and faster on newer chips. For a typical 30-second dictation, expect about 10-15 seconds of processing on M1. RAM usage is approximately 1.5 GB.

Large Model: Maximum Accuracy

Best For

The Large model provides the best accuracy Whisper offers. It is most valuable in challenging scenarios: heavy accents, noisy environments, rare languages, and highly specialized terminology. Book authors who dictate for extended periods may prefer it.

Performance

Processing is approximately 1-2x real-time on M1. On M2, M3, and M4 Macs with improved Neural Engines, the Large model becomes more practical. RAM usage is approximately 3 GB.

Hardware Considerations

On M1 Macs with 8 GB RAM, the Large model may cause memory pressure alongside other apps. On M2 Pro/Max or M3 Pro/Max with 16+ GB RAM, it runs comfortably.
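
One way to reason about memory pressure is to compare each model's approximate RAM footprint with what is left after macOS and other apps take their share. The sketch below uses the RAM figures from this guide's comparison table; the amount reserved for everything else is an assumed placeholder rather than a measured value, and the helper name `models_that_fit` is illustrative.

```python
# Approximate RAM usage per model in MB, taken from this guide's comparison table.
MODEL_RAM_MB = {
    "tiny": 150,
    "base": 200,
    "small": 500,
    "medium": 1500,
    "large": 3000,
}


def models_that_fit(total_ram_gb: float, reserved_gb: float = 4.0) -> list[str]:
    """Return models whose approximate footprint fits in the remaining RAM.

    reserved_gb is an assumed allowance for macOS and other apps.
    """
    available_mb = (total_ram_gb - reserved_gb) * 1024
    return [name for name, ram in MODEL_RAM_MB.items() if ram <= available_mb]


print(models_that_fit(8))                 # 4 GB free: every model fits on paper
print(models_that_fit(8, reserved_gb=6))  # 2 GB free: Large no longer fits
```

How much to reserve depends on what else is running, so treat the result as a rough guide rather than a guarantee.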

Choosing the Right Model

Use Case	Recommended	Why
Slack/Discord messages	Tiny or Base	Speed matters, errors easily corrected
Email and documents	Small	Good accuracy with fast processing
Professional/legal/medical	Medium or Large	Accuracy is critical
Non-English languages	Medium or Large	Better multilingual recognition
Book dictation	Medium or Large	Sustained accuracy over long sessions
Coding documentation	Small or Medium	Technical vocabulary handling
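
The table above can also be expressed as a simple lookup, which may be handy if you script your own setup. This is only a sketch: the dictionary keys and the `recommend` function are illustrative names, not part of Scrybapp or Whisper, and the fallback to Small reflects this guide's default recommendation.

```python
# Recommendations copied from the table above; the keys are illustrative labels.
RECOMMENDED = {
    "chat": ("tiny", "base"),
    "email": ("small",),
    "professional": ("medium", "large"),
    "non_english": ("medium", "large"),
    "book": ("medium", "large"),
    "coding_docs": ("small", "medium"),
}


def recommend(use_case: str, default: str = "small") -> tuple[str, ...]:
    """Return recommended model sizes, falling back to this guide's default."""
    return RECOMMENDED.get(use_case, (default,))


print(recommend("coding_docs"))  # ('small', 'medium')
print(recommend("podcast"))      # ('small',) because the use case is not in the table
```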

Switching Between Models

Scrybapp makes it easy to switch between Whisper models from the menu bar. Use Small for everyday communication and switch to Medium or Large for important documents. This flexibility means you do not have to commit to a single model.

Performance Across Mac Generations

M1 (2020-2021)

All models run well. Small is the best default. Medium is usable with slight delay. Large works but may feel slow for rapid-fire dictation.

M2 (2022-2023)

The improved Neural Engine makes Medium on M2 feel nearly as fast as Small on M1. Large becomes practical for sustained use.

M3 and M4 (2023-2025)

Further Neural Engine improvements make even the Large model responsive. Medium is the natural default on these machines.

Get Started

Download Scrybapp, which comes with a $19 lifetime license and unlimited transcription, and try the different models. Start with Small, then experiment with Medium and Large. Read more about how Apple Silicon accelerates local AI and our tips for improving accuracy.
