Founding Fuel

When Thinking Starts Making Noise

As speech-to-text tools become eerily good, they are changing not just how we write, but how we think, remember, and inhabit private space

29 May 2026 · 6 min read

TL;DR

The advent of eerily accurate speech-to-text tools is reshaping how we write, think, and inhabit private spaces. The article posits that speaking and writing are distinct cognitive acts; keyboards introduce a crucial friction, slowing thought for deliberate editing, whereas dictation collapses this distance, capturing ideas more immediately. While earlier software was unreliable, advanced tools like WisprFlow have transformed the experience. Users find themselves adapting quickly, with dictation fostering a more conversational, spontaneous flow of thought—sometimes revealing arguments mid-sentence, other times exposing the fragility of unarticulated ideas. This shift marks a significant evolution in how we externalise our cognition.
Voice software excels at thought dumps, preserving ideas before they disappear. But structured thinking still demands the friction of the keyboard.

I had originally intended to dictate this piece to make a point about the speech software ecosystem because I like it. At some point, I stopped.

By way of background, there is something charming about dictating that dates back to my growing-up years. The harried boss, the assistant scribbling in shorthand, the legendary editor dictating a front-page story without pausing. So when speech software became good enough to actually use, I made an ambitious promise to my team: I would dictate this entire piece.

I’ve been using speech software for a while now. Watching entire paragraphs emerge after you’re done talking feels like magic. And with a dash of AI, the software learns, does better, and comes up with frameworks to create interview scaffolding or an architecture on which to drape a narrative. This is certainly way more than what an assistant who took notes in shorthand could do.

But here’s what happened instead. Soon after I started, I caught myself lowering my voice instinctively and whispering into the microphone despite being alone in my room. The sentences wouldn’t flow as seamlessly as I imagined those famous editors dictating narratives. I fluttered between the keyboard and voice.

And then it occurred to me: speech and writing are fundamentally different cognitive acts. The keyboard slows the mind down, inserting a vital friction between impulse and expression. Speech collapses that distance, capturing thought before discipline can entirely intervene—unless you learn to talk slowly and deliberately.

Why Speaking Feels Like Thinking

Some years ago, tools such as Siri, Google Voice Typing, and Otter gave many of us a rough preview of what this future might look like. Frankly, they didn’t work for most of us because Indian accents confused them. Most people eventually returned to keyboards because keyboards, however tiring, remained dependable.

Then, a friend pointed me to WisprFlow. The difference was dramatic enough to alter behaviour almost immediately. I began pacing while speaking into the machine. Ideas that would ordinarily disappear somewhere between instinct and execution suddenly survived long enough to become usable prose. For the first time, typing itself began feeling mechanical in a way I had never consciously noticed before.

What surprised me was not that the software worked. It was how quickly the body adapted once it did. Within days, I had begun reorganising parts of my work around dictation. I would pace while speaking into the laptop. Sometimes I would begin with only a vague sense of direction and discover the argument halfway through the sentence itself. At other times, the opposite would happen. Thoughts that sounded intelligent inside the head collapsed embarrassingly in open air the moment they were spoken aloud.

That difference mattered because writing on a keyboard and speaking into a machine are not identical cognitive acts, even when they appear to produce the same output. Typing imposes resistance. Fingers slow thought down just enough for editing to happen almost unconsciously. Speech moves faster than caution. The software captures thought before discipline has entirely intervened.

A founder friend who now dictates almost everything from investor notes to internal memos described the experience to me in unexpectedly physical terms: “Typing felt like manufacturing. Talking felt like thinking.”

The dictated passages were longer, looser, and more conversational than my written ones. Occasionally they also carried more life. For years, professional writing operated on the assumption that polish signalled intelligence. With spoken language, you could hear uncertainty entering the sentence in real time.

The shift is visible once you pay attention to it. Founders increasingly announce products through rambling videos filmed while walking instead of carefully composed blog posts. Nikita Bier, now head of product at X and the brain behind viral apps such as TBH and Gas, is a case in point: he prefers the spoken ‘post’. His learnings have shown that modern social products often behave less like permanent infrastructure and more like films: they capture an immediate cultural moment, make their impact, and move on.

Then there are journalists who dictate notes immediately after interviews rather than waiting to process them later at a desk. WhatsApp, for millions of users, has quietly become a voice-first platform where speaking for forty seconds often feels easier than typing for four minutes.

Part of what makes the transition so seductive is that it does not initially feel technological at all. It feels bodily. After spending ten or twelve hours a day writing, editing, replying to messages, transcribing interviews, and navigating screens, one begins noticing the quiet strain accumulating underneath knowledge work. Fingers stiffen. Wrists tighten. Shoulders ache in the background.

That was roughly where Ben Butterworth found himself before he built DuckType, a speech-to-text app. I came across him on LinkedIn. When we spoke, Butterworth did not sound like someone intoxicated by artificial intelligence. If anything, he sounded wary of what the industry was becoming.

His own dependence on dictation systems had begun after repetitive strain injuries and arm surgery made typing increasingly difficult. Over time, he dictated hundreds of thousands of words into machines because he had little choice. Meetings, notes, fragments of thought, and reminders captured while walking around. Speech gradually became easier than typing.

But the more dependent he became on these systems, the more uneasy he grew about what exactly they were capturing along the way.

The Whisper Problem

The unease Butterworth had described first struck me at home. I was dictating into the laptop while someone moved around elsewhere in the house—not listening intentionally, just existing in the same physical space. Halfway through a sentence involving something personal, I lowered my voice instinctively and then stopped speaking altogether.

Typing had never created this problem. Nobody knows what you are writing while your fingers move across a keyboard. A person sitting three feet away may have no idea whether you are drafting a resignation letter, writing to a lawyer, or simply ordering groceries. Modern life quietly depends on that invisibility. The keyboard allowed human beings to think privately while physically surrounded by others. In this sense, keyboards are not just input devices. They are privacy technologies; they are cognitive technologies.

Voice changes the architecture of that arrangement completely.

Once I became conscious of this discomfort, I began noticing behaviour around me differently. In public places, people still mostly inhabit silence through keyboards. Hundreds of individuals sit packed together while mentally occupying entirely separate worlds. Somebody may be negotiating a funding round. Somebody else may be rewriting a legal notice before sending it out. None of that leaks into the room because typing conceals thought while it is still forming. Speech externalises it.

Certain thoughts, I noticed over time, I no longer wanted to dictate aloud. Not because they were scandalous—they were not—but because they weren’t fully formed. A sentence typed onto a screen still feels partly private while it is forming; you can stop midway, delete, reshape, and retreat. Speech offers no such shelter. The moment words leave the mouth, they acquire presence, even when directed only at a machine.

I began noticing this particularly while working through difficult ideas. Good writing often depends on allowing imperfect thoughts enough room to exist temporarily before judgment intervenes. Half-formed arguments, contradictions, emotional reactions—things one may later refine, reverse, or abandon altogether. Dictation altered this habit of thinking.

The software captured everything with equal enthusiasm: clarity, confusion, irritation, and drift. Sentences that would ordinarily die quietly on a keyboard survived long enough to stare back at me from the screen. Sometimes that was useful. Occasionally it was embarrassing. More than once, I caught myself self-editing before speaking rather than after writing.

It reminded me of Butterworth’s comments on the wider ecosystem and his unexpectedly sharp critique. He pointed to instances where software appeared capable of capturing contextual screen information beyond simple dictation. He spoke about products treating user conversations casually, as though convenience automatically entitled a platform to deeper access into people’s lives. At one point, he described parts of the industry as “user hostile”.

The phrase stayed with me because a deeper discomfort had already begun creeping in quietly underneath the productivity gains. Are we trusting these systems with too much of the raw material from our evolving minds?

The Things That Do Not Disappear

Human beings have historically treated speech and writing very differently. Speech was fleeting; it disappeared. Writing endured. One belonged to the moment; the other belonged to memory. Conversations remained fluid precisely because they dissolved into air. Writing carried consequence because permanence attached itself to text.

Voice AI quietly collapses that boundary. Now speech too becomes searchable, retrievable, and archivable. The unfinished idea, the frustrated reaction muttered while pacing the room, the moment of uncertainty halfway through a sentence—everything survives longer than intended once machines begin listening continuously.

Perhaps that is why these systems feel psychologically heavier than ordinary software. A keyboard never captured hesitation in your breathing. Speech systems do. They absorb cadence alongside language, fatigue alongside instruction, emotion alongside meaning.

And unlike human beings, machines do not naturally forget.

That thought stayed with me while working on this piece long after the rest of the house had gone quiet. I had been dictating notes into the laptop for a long while, pacing intermittently, and trying to figure out how to ‘rewrite’ what I thought hadn’t been articulated right. There were times when the tone would change because I felt the text didn’t read right. But the machine was transcribing everything anyway.

Those moments felt absurd. By the end of that night, I stopped pacing. Certain passages simply refused to settle spoken aloud; others felt too exposed once they acquired physical sound. I rested my hands back on the laptop and began typing again.

It was in that quiet shift that the boundary finally drew itself clearly for me. When it comes to the heavy lift of structured thinking—where every sentence must build a logical escalator for the reader—I am ultimately unwilling to use dictation. The keyboard, with all its mechanical friction, remains the essential tool for architecture, editing, and discipline.

But when it comes to the raw, unstructured chaos of a thought dump—where the goal is simply to empty the mind without the baggage of grammar, flow, or punctuation—voice software is unparalleled. It captures life before the filter can kill it.

We originally adopted keyboards because they allowed us to inhabit silence—to think privately while physically surrounded by the world. We are now adopting voice tools that dissolve that very privacy, and we are doing it voluntarily, one frictionless convenience at a time.

By the time I finished, most of this essay had been written on the keyboard. I noticed that only afterward.

Charles Assisi

Co-founder and Director | Founding Fuel

Charles Assisi is an award-winning journalist with two decades of experience to back him. He is co-founder and director at Founding Fuel, and co-author of the book The Aadhaar Effect. He is a columnist for Hindustan Times, one of India's most influential English newspapers. He is vocal in his views on journalism and what shape it ought to take in India. He speaks on the theme at various forums and is often invited by various organizations to teach their teams how to write.

In his last assignment, he wore two hats: That of Managing Editor at Forbes India and Editor at ForbesLife India. As part of the leadership team, his mandate was to create a distinctive business title in a market many thought was saturated. When Forbes India was finally launched after much brainstorming and thinking through, it broke through the ranks and got to be recognized as the most influential business magazine in the country. He did much the same thing with ForbesLife India where he broke from convention and launched the title to critical acclaim.

Before that, he was National Technology Editor and National Business Editor at the Times of India, during the great newspaper wars of 2005. He was part of the team that ensured Times of India maintained top dog status in Mumbai in the face of assaults by DNA and Hindustan Times.

His first big gig came in his late twenties when German media house Vogel Burda marked its India debut with CHIP, a wildly popular technology magazine. He was appointed Editor and given a free run to create what he wanted. During this stint, he worked and interacted with all of Vogel Burda's various newsrooms across Europe and Asia.

Charles holds a Masters in Economics from Mumbai University and an MBA in Finance. Along the way he earned the Madhu Valluri Award for Excellence in Journalism and the Polestar Award for Excellence in Business Journalism.

In his spare time, he reads voraciously across the board, but is biased towards psychology and the social sciences. He dabbles in various things that catch his fancy at various points. But as fancies go, many evaporate as often as they fall on him.
