What Speech Recognition Model Should I Use?
ShareSpeak offers three speech recognition engines. Here's how to choose the best one for your teleprompter setup.
Quick Recommendation
- Mac with Apple M1 or newer: Use Native Apple Speech Recognition
- Mac (Intel) or general use: Use Gemini or the largest Whisper model
- Windows: Use Gemini
Available Speech Recognition Models
ShareSpeak provides three speech-to-text engines, each with its own strengths. Your best choice depends on your hardware, the languages you speak, and how you pace your speech.
Native Apple Speech Recognition
If you have a Mac with an Apple M1 chip or newer (M2, M3, M4), this is the best option. It runs entirely on-device using Apple's Neural Engine and, of the three engines, delivers the fastest recognition, the best transcription quality, and the lowest CPU usage.
It supports true real-time speech recognition: the teleprompter follows your words as you speak, with no noticeable delay.
Important: Set the correct language locale in the ShareSpeak settings. Native Apple Speech Recognition does not auto-detect languages, so you need to specify which language you're speaking. Not all languages are supported.
Gemini (Google AI)
Gemini is a cloud-based model powered by Google AI. Its biggest advantage is automatic language detection: it recognizes whichever language you speak, so you don't need to configure a locale. It excels at regional languages and dialects (e.g., Sicilian, Basque, Catalan) that other engines may not support at all.
Like Apple Speech Recognition, Gemini supports real-time transcription, so you can speak continuously without pauses.
Note: Because Gemini is a cloud service, you may see occasional latency caused by Google's servers rather than by ShareSpeak itself. You also need a Google API key to use it.
Gemini is the recommended engine for Windows users, because Apple Speech Recognition is not available on Windows.
Whisper (OpenAI)
Whisper is an open-source speech recognition model by OpenAI that runs entirely on your machine. It comes in multiple sizes — from tiny to large. The smaller models are faster but work best with simple vocabulary, while the larger models handle specialized and technical words much better.
Model sizes:
- Tiny / Base — Fast, good for simple vocabulary and everyday language
- Small / Medium — Balanced speed and accuracy
- Large — Best accuracy, handles technical and specialized terms, higher CPU usage
Important: Whisper does not provide true real-time transcription. ShareSpeak uses silence detection to chunk your audio — it transcribes after detecting a pause in your speech. This means Whisper works best when you speak in measured sentences with natural pauses (e.g., speak for 5–15 seconds, pause briefly, then continue).
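To make the pause-based behavior concrete, here is a minimal sketch of silence-based chunking. It is not ShareSpeak's actual implementation: the function name, the RMS threshold, and the number of silent frames that ends a chunk are all illustrative assumptions.

```python
from typing import Iterable, List


def chunk_on_silence(
    frames: Iterable[List[float]],
    silence_threshold: float = 0.01,  # assumed RMS level below which a frame counts as silent
    min_silence_frames: int = 3,      # assumed run of silent frames that closes a chunk
) -> List[List[float]]:
    """Group audio frames into chunks, closing each chunk at a detected pause."""
    chunks: List[List[float]] = []
    current: List[float] = []
    silent_run = 0
    has_speech = False

    for frame in frames:
        # Root-mean-square energy of the frame (0.0 for an empty frame).
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5 if frame else 0.0

        if rms < silence_threshold:
            silent_run += 1
        else:
            silent_run = 0
            has_speech = True

        # Leading silence is skipped; samples are kept once speech has started.
        if has_speech:
            current.extend(frame)

        # A long enough pause after speech closes the chunk.
        if has_speech and silent_run >= min_silence_frames:
            chunks.append(current)
            current, silent_run, has_speech = [], 0, False

    if has_speech:
        chunks.append(current)  # flush trailing speech that had no closing pause
    return chunks
```

Each returned chunk is what would be handed to Whisper for transcription in this sketch. Nothing is transcribed until a pause closes the chunk, which is why speaking without pauses delays the teleprompter.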
Model Comparison at a Glance
| Feature | Apple Speech | Gemini | Whisper |
|---|---|---|---|
| Real-time | Yes | Yes | No (silence chunks) |
| Speed | Fastest | Fast | Depends on model size |
| CPU Usage | Minimal | Minimal (cloud) | Medium to High |
| Languages | Set locale manually | All (auto-detect) | Many (set in config) |
| Rare Dialects | Limited | Excellent | Limited |
| Runs Locally | Yes | No (cloud) | Yes |
| API Key Needed | No | Yes | No |
| Platform | Mac M1+ only | Mac & Windows | Mac & Windows |
Real-time Performance Ranking
1. Native Apple Speech Recognition: Instant word-by-word tracking. Speak naturally at any pace.
2. Gemini: Real-time with occasional cloud latency. Speak continuously without pauses.
3. Whisper: Not true real-time. Best with measured speech and natural pauses between sentences.
Recommendations by Platform
Mac (Apple M1+)
Use Native Apple Speech Recognition. It's the fastest, lightest, and most accurate option. Just make sure to set the correct language in settings.
Mac (Intel)
Use Gemini for real-time multilingual recognition, or the largest Whisper model if you prefer offline processing and don't need true real-time.
Windows
Apple Speech Recognition is not available on Windows. Use Gemini for the best experience — it provides real-time recognition and supports all languages. If you don't have a Google API key, Whisper is a solid offline alternative.
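The platform guidance above can be summarized as a small decision function. This is only a sketch of the recommendations in this article, not part of ShareSpeak: the function name, its parameters, and the returned labels are hypothetical.

```python
def recommend_engine(
    os_name: str,
    apple_silicon: bool = False,
    has_google_api_key: bool = True,
    prefer_offline: bool = False,
) -> str:
    """Return the engine this guide recommends for a given setup (illustrative only)."""
    system = os_name.strip().lower()
    if system not in ("mac", "windows"):
        raise ValueError(f"unsupported platform: {os_name!r}")

    # Apple silicon Macs: the native engine is recommended in every case.
    if system == "mac" and apple_silicon:
        return "Native Apple Speech Recognition"

    # Intel Macs and Windows: Gemini, when an API key is available and cloud use is acceptable.
    if has_google_api_key and not prefer_offline:
        return "Gemini"

    # Offline fallback: the guide suggests the largest Whisper model on Intel Macs.
    if system == "mac":
        return "Whisper (largest model)"
    return "Whisper"
```

For example, `recommend_engine("windows", has_google_api_key=False)` returns `"Whisper"`, matching the offline alternative described for Windows users without a Google API key.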
When to Use Each Model
Apple Speech Recognition
You're on a Mac M1+ and need the fastest, most responsive teleprompter experience. Best for live presentations, video recording, and screencasting where latency matters.
Gemini
You speak multiple languages, use regional dialects, or are on Windows. Also great when you want to speak continuously without worrying about pauses.
Whisper
You prefer fully offline processing and have a natural speaking pace with pauses. The smallest model works well for everyday vocabulary; the largest model handles technical and specialized terms.
