Engineering · June 2, 2026 · 7 min read

Dictation for non-native English speakers: what most apps get wrong.

If English is your second language, your dictation output reflects your first. The stative-verb-in-continuous habit from Hindi or Tamil. The 'since' for duration that's a literal carry-over from Mandarin. The 'discuss about' that Spanish and Portuguese share. Most dictation apps treat these as the speaker's wording and preserve them. We disagree: a polished dictation tool should produce text a native English reader wouldn't pause on. Here's what we rebuilt to make that real.

Why most dictation apps preserve non-native patterns

The post-transcription cleanup step in a dictation app exists to fix the surface of the text: filler words, punctuation, capitalisation, obvious grammar. Most products draw the line there. The reasoning sounds fair: the model shouldn't 'change what the user said.' In practice, that boundary is set by what feels natural to whoever tuned the prompt. Native English speakers don't notice 'I am knowing him since five years' as something to fix; they read past it as someone's stylistic choice. Non-native speakers write that sentence because their first language structures duration and stative verbs that way.

We hit this when we benchmarked our own cleanup against a 140-case suite of non-native dictation patterns and watched it score 51%. The model preserved 'do the needful', 'kindly revert', 'I am having a doubt about it' verbatim. Technically correct. Practically, it was producing text the speaker couldn't send without re-editing.

The 12 patterns Speechcap now normalises

Each pattern below is grounded in a real first language (or several). All of them get normalised in the v3 prompt while keeping the speaker's intent intact.

1. Stative verbs in continuous tense

Common in: Hindi, Tamil, Bengali, Mandarin.

'I am knowing him since five years.' → 'I have known him for five years.'
'She is having two kids and a dog.' → 'She has two kids and a dog.'
'I am liking the new logo.' → 'I like the new logo.'
'The customer is wanting a refund.' → 'The customer wants a refund.'

2. 'Since' for duration

Common in: Hindi (direct translation of से), Mandarin, Tamil. In native English, 'since' marks a starting point; 'for' marks a duration.

'The build is failing since two hours.' → 'The build has been failing for two hours.'
'I am working in this company since five years.' → 'I have been working at this company for five years.'
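The since/for distinction is mechanical enough to check with a pattern. The sketch below is a hypothetical bench-style detector, not Speechcap's cleanup logic (which is a prompt to a language model): it flags 'since' followed by a quantity and a time unit, which signals a duration rather than a starting point. The name `has_since_duration` and the word lists are assumptions made for illustration.

```python
import re

# Hypothetical detector: "since" + quantity + time unit reads as a duration
# ("since two hours"), while "since Monday" names a starting point.
_NUMBER = r"(?:\d+|an|a|one|two|three|four|five|six|seven|eight|nine|ten|few|several|many)"
_UNIT = r"(?:second|minute|hour|day|week|month|year)s?"

# The trailing lookahead skips "since two years ago", which names a point in time.
SINCE_DURATION = re.compile(
    rf"\bsince\s+{_NUMBER}\s+{_UNIT}\b(?!\s+ago\b)",
    re.IGNORECASE,
)


def has_since_duration(text: str) -> bool:
    """Return True if the text uses 'since' with a duration instead of a starting point."""
    return SINCE_DURATION.search(text) is not None
```

A check like this can only tell you the pattern survived cleanup; choosing the right tense for the rewrite ('has been failing for two hours') still needs the model.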

3. Indian-English business idioms

Common in: Indian English at work, where these are taught explicitly but read as quaint or awkward to most native readers outside South Asia.

'Kindly do the needful and revert back.' → 'Please handle this and reply.'
'Please intimate me about the schedule.' → 'Please let me know about the schedule.'
'We should prepone the standup.' → 'We should reschedule the standup earlier.'
'I passed out from IIT Kanpur in 2015.' → 'I graduated from IIT Kanpur in 2015.'
'I am out of station next week.' → 'I am out of town next week.'

4. Wrong prepositions

Common across most non-English L1s; prepositions are notoriously language-specific.

'I want to discuss about the new pricing.' → 'I want to discuss the new pricing.'
'The customer is angry on the support team.' → 'The customer is angry at the support team.'
'She is married with a doctor.' → 'She is married to a doctor.'
'She has reached at the office already.' → 'She has reached the office already.'
'I was waiting from one hour.' → 'I was waiting for an hour.'

5. Uncountable nouns pluralised

Common in: Indian English, French, Spanish, Italian. English treats certain nouns as mass nouns; many L1s don't.

'I need few advices.' → 'I need some advice.'
'The document has many informations.' → 'The document has a lot of information.'
'We got good feedbacks from the customer.' → 'We got good feedback from the customer.'
'She has done lot of researches.' → 'She has done a lot of research.'
'We need more equipments.' → 'We need more equipment.'

6. Question word order

Common in: Hindi, Mandarin, Indian English.

'What you want to do?' → 'What do you want to do?'
'Where you are going for the offsite?' → 'Where are you going for the offsite?'
'Why she is not joining?' → 'Why isn't she joining?'

7. Greeting / introduction conventions

Common in: Indian English.

'Myself Rohit, I am the new backend engineer.' → 'I'm Rohit, the new backend engineer.'
'What is your good name?' → 'What is your name?'
'I am having a doubt about the pricing.' → 'I have a question about the pricing.'

8. Self-corrections

Universal. When you correct yourself mid-sentence, the polished output should keep only the correction.

'The meeting is on Tuesday, no wait, Wednesday, at 4 PM.' → 'The meeting is on Wednesday at 4 PM.'

9. Wordiness and emphatic 'only'

Common in: Indian English, Tamil, Telugu. The 'only' emphatic doesn't translate cleanly.

'I called him two times yesterday.' → 'I called him twice yesterday.'
'I am working from home from yesterday only.' → 'I have been working from home since yesterday.'
'I am in office today only.' → 'I am in the office today.'

10. Subject-verb agreement

Common across most L1s without conjugation systems that match English's.

'She don't know what she's doing.' → 'She doesn't know what she's doing.'
'Everyone are excited about the new office.' → 'Everyone is excited about the new office.'
'The team have decided to push the launch.' → 'The team has decided to push the launch.'

11. Homophones and contractions

Universal. These are mistakes even native speakers make.

'Their going to the conference next weak.' → 'They're going to the conference next week.'
'Your welcome to join the demo.' → 'You're welcome to join the demo.'
'We should of caught this earlier.' → 'We should have caught this earlier.'

12. Articles

Common in: most languages without an article system (Russian, Mandarin, Hindi, Japanese). The fix has to be conservative: over-correcting reads worse than under-correcting.

'I will send email to the client tomorrow.' → 'I will send an email to the client tomorrow.'
'I am developer working on dictation app.' → 'I'm a developer working on a dictation app.'
'We need to fix bug before the launch.' → 'We need to fix the bug before the launch.'

How we benchmarked it

We hand-wrote 140 dictation cases across 10 pattern categories (articles, tense, agreement, prepositions, word order, idioms, question formation, uncountables, confusables, and multi-error realistic dictations). Each case has explicit pass criteria: the cleanup output must contain a specific fix and must not preserve the original error pattern.
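To make the two-part pass criterion concrete, here is a minimal sketch of how such a case could be scored. It is an illustration under assumptions, not the contents of the real bench: the `BenchCase` fields, the `passes` and `pass_rate` helpers, and plain substring matching are all hypothetical. The two cases reuse examples from the pattern list above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class BenchCase:
    dictated: str          # raw transcript fed to the cleanup pass
    must_contain: str      # the specific fix the output has to include
    must_not_contain: str  # the original error pattern that must be gone


def passes(case: BenchCase, cleaned: str) -> bool:
    """A case passes only if the fix is present AND the error pattern is absent."""
    out = cleaned.lower()
    return case.must_contain.lower() in out and case.must_not_contain.lower() not in out


def pass_rate(cases: Sequence[BenchCase], cleanup: Callable[[str], str]) -> float:
    """Fraction of cases that pass for a given cleanup function."""
    if not cases:
        return 0.0
    return sum(passes(c, cleanup(c.dictated)) for c in cases) / len(cases)


# Two illustrative cases built from examples in this post.
CASES = [
    BenchCase("I want to discuss about the new pricing.",
              "discuss the new pricing", "discuss about"),
    BenchCase("We need more equipments.",
              "more equipment", "equipments"),
]
```

The second case shows why both criteria are needed: the unfixed sentence 'We need more equipments.' already contains the substring 'more equipment', so a contains-only check would pass it. A cleanup that returns its input unchanged scores 0.0 on these two cases.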

We tested six prompt variants against the same 140 cases, ranging from the previous production prompt (which scored 51%) to a fully-loaded version with 21 examples (which scored 85% on the L2, or second-language, bench but lost 4 percentage points on the fluent-English regression bench). The shipping version is the variant that holds the fluent-English score at 99% while pushing L2 to 78%: roughly the quality of a senior writer's first-pass copyedit, delivered in about 1.5 seconds.
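The selection rule described here amounts to a constrained choice: take the best L2 score among variants that do not regress fluent English. The sketch below is one way to express that, with hypothetical names (`VariantScore`, `pick_variant`). The 78%, 99%, and 85% figures come from this post; the 95% fluent score for the fully-loaded variant is an assumption that its 4-point loss is measured from 99%.

```python
from typing import NamedTuple, Optional, Sequence


class VariantScore(NamedTuple):
    name: str
    l2: float      # pass rate on the L2 pattern bench
    fluent: float  # pass rate on the fluent-English regression bench


def pick_variant(scores: Sequence[VariantScore],
                 fluent_floor: float) -> Optional[VariantScore]:
    """Highest L2 score among variants at or above the fluent-English floor."""
    eligible = [s for s in scores if s.fluent >= fluent_floor]
    return max(eligible, key=lambda s: s.l2, default=None)


SCORES = [
    # fluent=0.95 is an assumption (99% minus the reported 4-point loss).
    VariantScore("fully-loaded", l2=0.85, fluent=0.95),
    VariantScore("shipping", l2=0.78, fluent=0.99),
]
```

With a floor of 0.99 the function returns the shipping variant; drop the floor and the fully-loaded variant wins on L2 alone, which is the trade-off the paragraph above describes.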

Who this is for

If any of these describe your dictation experience, the v3 prompt is built for you:

You write professional English at work but your colleagues quietly fix your prepositions when they edit.
You catch yourself re-reading your dictated messages before sending to remove patterns that 'sound translated.'
You learned English in school in India, Vietnam, China, Brazil, Egypt, the Philippines, Russia, or another non-Anglo country.
You've tried Wispr Flow / SuperWhisper / Apple Dictation and noticed the output preserves patterns your manager or editor would rewrite.

How it works (briefly)

On Speechcap Pro, both the transcription (Whisper Large v3) and the cleanup pass (IBM Granite 4 Micro) run on your Mac. Your audio is never sent to a server. The cleanup model loads once at app launch (~1.9 GB on disk, ~2.5 GB RAM resident) and runs in roughly 1.5 seconds per dictation on M1 Pro. The same v3 prompt also runs in the cloud path on Cloudflare Workers AI for users on the Free tier. Output quality is identical; only the latency and privacy guarantees differ.
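The shape of that pipeline (load the cleanup model once, then run transcription and cleanup per dictation) can be sketched as follows. Everything here is a stand-in: `DictationPipeline`, `fake_load_cleaner`, and `fake_transcribe` are hypothetical names, and no real Whisper or Granite API is called.

```python
from typing import Callable

Transcriber = Callable[[bytes], str]
Cleaner = Callable[[str], str]


class DictationPipeline:
    """Two-stage pipeline: transcribe audio, then clean up the transcript."""

    def __init__(self, transcribe: Transcriber,
                 load_cleaner: Callable[[], Cleaner]) -> None:
        self._transcribe = transcribe
        # Pay the model-loading cost once, at construction time (app launch),
        # so each dictation only pays for inference.
        self._clean = load_cleaner()

    def dictate(self, audio: bytes) -> str:
        return self._clean(self._transcribe(audio))


# Stand-ins so the sketch runs without any model; both are hypothetical.
load_calls = 0


def fake_load_cleaner() -> Cleaner:
    global load_calls
    load_calls += 1
    return lambda transcript: transcript.strip().capitalize()


def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


pipeline = DictationPipeline(fake_transcribe, fake_load_cleaner)
```

Because both stages are injected as plain callables, the same class could wrap an on-device or a cloud-backed cleaner without changing the calling code, which mirrors the Pro and Free split described above.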

If you want to read more about the architecture, the offline-first design lives at speechcap.com/offline-dictation-mac. The prompt engineering work (the variants we tested, the failure modes we measured, the trade-offs we made) is documented in the bench/ directory of the open-source dictation app repo.

Sources & further reading

Cambridge, World Englishes journal: Academic reference for the variations between Indian English, Singaporean English, and other Englishes discussed in the post.
OpenAI Whisper: Reference for the transcription model underlying Speechcap's on-device transcription.
IBM Granite 4 H-Micro model card: Reference for the cleanup model running on-device in Speechcap.

Frequently asked questions

Does Speechcap change my voice or make me sound less authentic?

No. It corrects structural errors that mark text as non-native, not stylistic choices. 'I think we should ship on Friday' stays as 'I think we should ship on Friday.' 'I am thinking we should ship on Friday' becomes 'I think we should ship on Friday' because 'think', in the sense of holding an opinion, is a stative verb that doesn't take the continuous in English. Voice and intent are preserved; the wording matches what a native reader expects.

Which first languages does this help most with?

Speakers of Hindi, Tamil, Bengali, Marathi, Telugu, Mandarin, Cantonese, Vietnamese, Spanish, Portuguese, Russian, Arabic, Tagalog, and Indonesian see the largest benefit. Some patterns (like 'discuss about') are shared across many L1s; others (like 'do the needful', 'prepone') are specifically Indian-English. The cleanup is conservative: it normalises clear pattern-errors, not regional accent or word choice.

Will it 'over-correct' my English?

It's tuned not to. The benchmark explicitly measures fluent-English preservation alongside L2 normalisation. The shipping version scores 99% on fluent dictation, meaning native-English input passes through with only the standard cleanup (punctuation, capitalisation, filler removal). If you write fluently, you'll get standard cleanup. If you write with non-native patterns, you'll get those normalised.

Does the cleanup send my audio anywhere?

On Speechcap Pro, no: both transcription and AI cleanup run entirely on your Mac. On the Free tier, the audio is sent to a server, transcribed via Whisper, and the transcript is sent to a cloud-hosted Granite 4 Micro for cleanup. Free is functionally identical to Pro in output quality; the difference is where the work happens.

Can I see the exact rules it applies?

Yes. The published cleanup prompt is the same one that runs in production. The post above lists the 12 patterns it explicitly normalises, with before/after examples. The full prompt and the 140-case bench it was tuned against are both visible in the Speechcap dictation app's source code (bench/l2-english-bench.mjs and src-tauri/src/cleanup_prompts.rs).

Does this work for accents, or only text patterns?

Only text patterns. Transcription accuracy across accents is Whisper's job. Whisper Large v3 handles non-native English speakers' accents well in our testing across Indian, Hindi-accented, Mandarin-accented, and Brazilian-Portuguese-accented English. The cleanup pass works on the already-transcribed text; it doesn't know what your accent sounded like.

Speechcap Labs · June 2, 2026