Private, offline dictation for Mac

What “offline” actually means in 2026

Most Mac dictation apps that advertise “offline mode” in 2026 mean one of two things: either transcription runs on your Mac but the AI cleanup pass routes through a cloud LLM, or the whole product still needs an internet connection but caches gracefully when you're briefly offline. Both are reasonable engineering tradeoffs. Neither is fully offline.

Fully offline means a specific architectural shape: audio is captured into RAM only, transcribed by a model running on your CPU/GPU, processed by an on-device LLM for cleanup, and typed into your editor, all without a single packet leaving the machine. You can turn the Wi-Fi off, fly to another country, or stand inside a Faraday cage, and dictation keeps working with the same quality.
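As a rough illustration of that shape, the four stages can be sketched as a chain of local functions with no network client anywhere in the path. This is a hypothetical Python sketch, not Speechcap's actual code: `OfflinePipeline`, `fake_transcribe`, and `fake_clean_up` are made-up names standing in for the on-device speech model, the on-device cleanup LLM, and the text-injection step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OfflinePipeline:
    """Hypothetical dictation pipeline; every stage is a local callable."""
    transcribe: Callable[[bytes], str]   # stand-in for the on-device speech model
    clean_up: Callable[[str], str]       # stand-in for the on-device cleanup LLM
    type_text: Callable[[str], None]     # stand-in for typing into the focused app

    def dictate(self, audio: bytes) -> str:
        # The audio buffer is only ever passed between in-memory functions:
        # no file is opened and no socket is created in this path.
        raw = self.transcribe(audio)
        cleaned = self.clean_up(raw)
        self.type_text(cleaned)
        return cleaned


# Stub stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return "um so the meeting is uh at noon"


def fake_clean_up(text: str) -> str:
    fillers = {"um", "uh", "so"}
    words = [w for w in text.split() if w not in fillers]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


typed: List[str] = []
pipeline = OfflinePipeline(fake_transcribe, fake_clean_up, typed.append)
result = pipeline.dictate(b"\x00" * 16)
```

The point of the shape is that every dependency is a plain function handed in at construction time, so whether anything leaves the machine can be checked by reading three call sites.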

The distinction matters because privacy policies are promises and architecture is a guarantee. A promise can change with a new TOS revision, a data breach, a subpoena, an acquisition, or a quiet misconfiguration. Architecture stays put. For work that touches client privilege, patient data, regulated information, or NDA-bound source code, the architectural question is the one the compliance team actually asks.

Private dictation vs offline dictation: same thing, different word

People search both ways. “Private dictation for Mac” and “offline dictation for Mac” pull up roughly the same shortlist of apps. That's because the two phrases describe the same property from different angles. “Offline” describes the data path: nothing leaves your machine. “Private” describes the consequence: no third party (cloud provider, AI vendor, eavesdropper on the wire) can read what you said. If a tool runs fully on-device, it is offline by definition and private by consequence.

The reverse isn't true. A “private dictation app” that sends audio to a cloud server with a strong privacy policy is private by promise, not by architecture. The server can still log your audio, get breached, or change its policy at the next funding round. That's the distinction Speechcap's on-device mode draws clearly: your dictation is private because the audio physically never reaches another computer, not because someone wrote a paragraph about it.

Which Mac dictation apps actually run fully offline?

As of mid-2026, the answer is uncomfortably short:

App	Transcription on-device	AI cleanup on-device	Truly offline?
Speechcap	Yes	Yes · IBM Granite 4 Micro	Yes, both stages
Superwhisper	Yes	No, typically routes through OpenAI	Partial
MacWhisper	Yes (file transcription)	Partial (post-process pass)	Partial · different category
Wispr Flow	No, cloud-only	No, cloud-only	No
Willow Voice	No, primarily cloud	No, cloud-only	No
Apple Dictation	Optional (lower quality)	No AI cleanup at all	Partial · accuracy drops

What stays on your Mac

With Speechcap's on-device engine selected (Settings → Transcription → On-device), the following data classes never reach our servers:

Audio. Captured into RAM only. Never written to disk, never uploaded, never reaches our servers, never reaches a third-party transcription provider. Discarded after transcription.
Transcripts. The raw Whisper output is processed in-memory by the on-device cleanup model. The cleaned text is typed into your editor and saved to a local history database on your Mac. No cloud copy exists.
AI cleanup prompts. The prompts wrapping your transcript for the cleanup model never leave the machine. Neither the system prompt nor your transcript is sent to any LLM provider.
Custom vocabulary. Your personal word list (names, acronyms, jargon) lives in a local file. Cloud sync is opt-in and clearly labelled; toggle it off in Settings → Vocabulary if you want true air-gap operation.
Dictation history. Every transcribed session is searchable in the app's local History tab. It's a local SQLite database; nothing leaves your Mac.
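To make the "local SQLite database" point concrete, here is a minimal sketch of what searching such a history could look like. The table name, columns, and `search_history` helper are assumptions for illustration; Speechcap's real schema isn't documented here.

```python
import sqlite3
from typing import List

# Hypothetical schema: one row per dictation session. ":memory:" keeps the demo
# self-contained; a real app would open a file inside its own data directory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (id INTEGER PRIMARY KEY, created_at TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO history (created_at, text) VALUES (?, ?)",
    [
        ("2026-06-01T09:00:00", "Draft the indemnity clause for the Acme contract."),
        ("2026-06-01T09:05:00", "Remind me to call the clinic at noon."),
    ],
)


def search_history(db: sqlite3.Connection, term: str) -> List[str]:
    """Return transcripts containing `term`, newest first (parameterized LIKE query)."""
    rows = db.execute(
        "SELECT text FROM history WHERE text LIKE ? ORDER BY created_at DESC",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


matches = search_history(conn, "contract")
```

Because the query runs against a file on the same machine, search works with the network off, and deleting that file deletes the history.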

The only network calls Speechcap makes in on-device mode are license verification (an intermittent ping with your account ID, no content) and optional crash diagnostics if you haven't disabled them. Both are toggleable in Settings.

Who this is for

Offline dictation isn't about being a stereotypical privacy maximalist. It's about specific workflows where the architectural answer is the only acceptable one.

Lawyers and legal professionals

Attorney-client privilege survives only if privileged content isn't shared with third parties. Dictating draft contracts, depositions, or case notes through a cloud dictation app risks waiving privilege depending on jurisdiction and the cloud provider's subprocessor chain. On-device dictation removes that exposure entirely.

Healthcare and medical practice

We're not yet HIPAA-certified; that's a Q3 2026 target for Speechcap Pro. But the architecture (no audio uploaded, no PHI transcripts in our systems) is the foundation HIPAA compliance will eventually attest to. Practitioners who can't wait for the certification often use Speechcap today on the basis that no PHI reaches our servers in the first place.

Journalists and investigative reporters

Source protection means assuming any cloud you touch is subpoena-able. Reporters dictating notes from a sensitive interview, drafts of a story with an anonymous source, or even just internal editorial messages benefit from architecture that doesn't create discoverable artifacts. Offline dictation produces nothing for a future legal demand to discover.

Security-conscious developers and founders

Dictating draft commit messages that mention unannounced features, PR descriptions for an unfiled patent, founder notes for a future fundraise: these are routine and routinely sensitive. On-device dictation lets you keep voice-as-input as a daily tool without growing the surface area of where your unannounced work exists.

The technical architecture, briefly

Two models do the work, both running on your Mac:

Whisper Large v3 for transcription

Speechcap uses OpenAI's open-source Whisper Large v3 model for speech recognition. On M-series Apple Silicon, transcription runs at 4–8× realtime, so a 30-second dictation transcribes in roughly 4 to 8 seconds. The model is ~1.5 GB on disk, memory-mapped at runtime, and unloaded when the app is idle. Accuracy on typical English (including accented variants) lands at 95–98% in our internal testing.
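The realtime factor converts to wall-clock time by simple division. The sketch below works through the 30-second example at both ends of the quoted 4–8× range; the function name is illustrative, and the figure covers transcription only, not model loading or the cleanup pass.

```python
def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock transcription time for a clip processed at N x realtime."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


clip = 30.0  # seconds of dictated audio
slowest = transcription_seconds(clip, 4.0)  # 7.5 s at 4x realtime
fastest = transcription_seconds(clip, 8.0)  # 3.75 s at 8x realtime
```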

IBM Granite 4 Micro for cleanup

The AI cleanup pass (removing fillers, fixing punctuation, applying context-aware formatting, handling in-flight transforms) runs through a 3-billion-parameter on-device LLM. We ship IBM's Granite 4 Micro because it has the best speed-to-quality tradeoff we've measured for the cleanup task on consumer Apple hardware. The model is ~2 GB, loaded on first use, cached for the session.

Hardware requirements

Both models run comfortably on any Apple Silicon Mac (M1 and newer). On Intel Macs, performance is noticeably slower, usable but not snappy. Memory: 16 GB Macs run both models in parallel without pressure; 8 GB Macs work but you'll notice the swap pressure if you have many other apps open. Storage: ~4 GB for both models combined; downloaded once on first launch.

What you trade by going fully offline

Honest tradeoffs, because no architectural choice is free:

Cold-start latency. On the very first dictation after launching Speechcap, the cleanup model takes 2–3 seconds to load. Subsequent dictations are instant. Cloud apps don't have this because they keep models warm on shared infrastructure.
Slightly less polished cleanup on complex prompts. A 3B-parameter local model isn't GPT-4-class. For typical dictation cleanup (fillers, punctuation, light reformatting) the output is indistinguishable from a cloud model's. For very intricate transforms (e.g. “rewrite this in the style of Hemingway”) you can occasionally tell the difference.
Battery cost. Running both models on-device draws more battery than a network call would. On a MacBook Air, expect ~5–8% extra battery use across a full day of heavy dictation vs cloud mode. On a plugged-in Mac, zero impact.
Disk space. ~4 GB used by the two models. Not nothing, but not significant relative to modern Mac storage.

For most workflows the tradeoffs are imperceptible. The privacy upside is permanent.

Pricing

On-device dictation isn't a paid add-on. It's Speechcap's free plan: unlimited on-device transcription, AI cleanup, transforms, and translation, no card required. Pro (from $3/month, PPP-localised in 89 markets, $6/month top tier) adds the cloud engine: instant setup, no model downloads, lowest latency. New users get a 14-day Pro trial with no credit card.

Compared to other Mac dictation apps, Speechcap Pro is roughly half the price of Wispr Flow and Willow Voice ($12–15/month) and undercuts Superwhisper's monthly tier ($8.49). The architectural difference (full on-device) plus the pricing difference is the central pitch.

Try Speechcap Pro free for 14 days. No card.


Frequently asked questions

What does "offline dictation" actually mean for a Mac app?

It depends on the app. The honest definition: every stage of the dictation pipeline (audio capture, transcription, AI cleanup, and text injection into your editor) runs on the user's Mac with no cloud round-trip. Some apps run transcription locally but still send the transcript to a cloud LLM for cleanup; that's partially offline, not fully offline. Speechcap's free plan runs both stages on your Mac; nothing leaves the device.

Which Mac dictation apps work without an internet connection?

Speechcap's free plan runs entirely on your Mac: full Whisper transcription and AI cleanup, both local. Superwhisper runs transcription locally, but its AI cleanup pass typically routes through a cloud LLM (OpenAI). MacWhisper runs file transcription on-device but has limited live-dictation features. Wispr Flow, Willow Voice, and Apple Dictation's cloud mode all require an internet connection. Apple Dictation has an on-device option, but at noticeably lower accuracy.

Is on-device dictation as accurate as cloud-based dictation in 2026?

On modern Apple Silicon Macs, yes. Whisper Large v3 running locally on M1 / M2 / M3 / M4 produces 95–98% accuracy on typical English, comparable to cloud transcription. The accuracy gap that justified cloud-only dictation in 2019 has effectively closed. The remaining quality gaps are about cleanup quality (small on-device LLMs vs frontier cloud models), not transcription itself.

How much disk space and RAM does on-device dictation use?

Whisper Large v3 is about 1.5 GB on disk. The on-device cleanup model (IBM Granite 4 Micro) adds another ~2 GB. At runtime, models are memory-mapped and use ~3 GB of RAM when actively dictating. Models are unloaded when idle. On an 8 GB Mac that's tight; 16 GB is comfortable; 24 GB or more leaves room for everything else.
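The budget arithmetic behind those figures is short enough to write out. This sketch only restates the numbers quoted above (1.5 GB and ~2 GB on disk, ~3 GB of RAM while dictating); the constant and function names are illustrative, and how much headroom counts as "tight" depends on what else you run.

```python
WHISPER_DISK_GB = 1.5   # transcription model size on disk, as quoted above
CLEANUP_DISK_GB = 2.0   # cleanup model size on disk, as quoted above
ACTIVE_RAM_GB = 3.0     # approximate RAM used while dictating, as quoted above


def disk_footprint_gb() -> float:
    """Combined on-disk size of both models."""
    return WHISPER_DISK_GB + CLEANUP_DISK_GB


def ram_headroom_gb(total_ram_gb: float) -> float:
    """RAM left for macOS and other apps while both models are active."""
    return total_ram_gb - ACTIVE_RAM_GB


headroom = {ram: ram_headroom_gb(ram) for ram in (8, 16, 24)}
```

The two models total 3.5 GB on disk, which is where the rounded "~4 GB" figure comes from, and an 8 GB Mac is left with about 5 GB for everything else while dictating.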

Can I dictate on a plane, in a SCIF, or behind a corporate firewall?

Yes. That's the whole point of full-offline mode. Toggle Wi-Fi off, hold your push-to-talk key, speak. Speechcap will transcribe, AI-clean, and type the result into whatever app you're focused on without any network access. The only thing you lose offline is cloud-synced custom vocabulary; the local list still works.

Why does on-device matter if the company has a privacy policy?

Privacy policies are promises. On-device architecture is a guarantee. A privacy-policy promise can be broken by a future policy change, a data breach, a subpoena, a misconfigured server, or an acquiring company with different priorities. On-device processing removes the data from those failure modes by design. For attorney-client privileged work, patient notes, NDA'd source code, or executive communications, the architectural answer matters more than the policy answer.

Speechcap Labs · 2026-06-01

Offline dictation for Mac that's actually offline.