Is Voice the AI slop fix?

“Computer. Computer?” said Scotty in one of those Star Trek films, only to be met with silence and have to resort to a keyboard. How quaint. Using voice to talk to machines has always been something out of science fiction. Till now, that is. Voice, I would argue, is not a new interface; it is the oldest one, and returning to it may be the most effective way to make AI-generated content feel human.

For almost all of human history, voice was how humans communicated, transferred knowledge and recorded history. Viewed over thousands of years, the written word reaching parity with speech is a blip, and one that was given a boost by the fact that the keyboard was the only reliable interface with the machine.

This is not AI inventing something new; it's returning us to something ancient, and generative AI has given us the potential to make the machine meet us where we naturally are.

Generative AI, including diffusion models, can remove much of the friction of moving between voice, text, images, video, and back again. This matters because images, too, have been a form of communication throughout human history. A picture, after all, is worth a thousand words, and we can now communicate ideas much more readily across the media spectrum.

From talking to typing and back again

The most brilliant salesperson I had the pleasure of employing was dyslexic, and I recall the frustration of using the Dragon voice solutions, at the time the market leader in voice-to-text.

All of those solutions captured words and even needed you to dictate the punctuation and spacing by saying "comma" or "space"; they were effectively a voice keyboard. That is not useful.

The new tooling, from our own voice-to-text workflow to tools like Wispr Flow and Willow Voice, does much more. Even for someone like me who types at over 100 words per minute, these tools not only beat my typing speed but also enable the pace of content curation and communication that is needed in today's work.

Our Take: Wispr Flow vs Willow Voice

When we were looking for a voice solution to augment our voice-centric content workflows, we came across Wispr Flow. Not wanting to evaluate just one solution, we also tested Willow Voice. Here is our top-level finding.

In essence, there is very little difference between the two. The user experience and interface are almost identical. In our evaluation, Wispr Flow edged it for a couple of reasons.

Willow Voice tended to cut out mid-dictation during longer sessions. The custom style feature in Wispr Flow proved very useful for our workflows, and it is not something Willow Voice offers yet. Pricing and security features are identical across both.

This feels like a feature-development race, with other similar products entering the market that will push everyone forward. Even if a solution you choose is missing a certain feature today, it will probably appear quite quickly.

Which one you choose is probably entirely subjective. But the reasons above are why we opted for Wispr Flow.

As a significant bonus, voice-first tooling opens up opportunities for people who struggle with the written word. How many storytellers have we missed in modern times?

Returning to a voice-first genesis of content opens up huge possibilities. Perhaps the biggest one is as an antidote to automated AI slop.

Volume without authenticity fails

Typing is what we are forced to do to communicate ideas, sell stuff, engage on social media and so on. Generative AI will easily out-generate a human on speed to word count. But that has never been the real ‘game’. The ‘game’ is to tell stories that are convincing, and that needs authenticity.

This is where the AI engines fall down completely because, no matter what you do, the text almost never feels authentic.

Image: a comic contrasting the competitive disadvantage of avoiding AI with the disconnected, wordy output of generic AI.

But add the human voice to that, and things immediately change. This site is an experiment in how AI changes business operations for an independent niche publication. We demonstrated that an AI engine delivers volume and velocity of content but fails spectacularly on one key metric: engagement.

It was only when we melded that volume and velocity system with our voice-first system that we started getting engagement numbers we were happy with.

That production flow is built from tools we have composed ourselves, combined with, in our case, Wispr Flow.

Solving the ‘deslopification’ of AI

Slopification starts with a request like, "Hey, here are four lines. I'd like you to write an essay about them," which is what I call ‘word extrusion’.

However, not leveraging AI in your content creation flow is commercial suicide. So is relying on platforms that claim to do this for you because, by and large, they start from the position described above. The CVO is not creating more words; it's reaching authenticity quickly and systematically.

Only by integrating our original form of storytelling into our AI systems do we have content that cuts through the noise.