Medical audio can go catastrophically wrong, audiologists hear

Anyone who has used voice-transcription software knows it can be glitchy. In my interview notes for a recent Cosmos article, for example, otter.ai repeatedly rendered the term “hearables” as “hair balls.”

Often this is simply humorous. We know what it meant; no harm done.
Not so with medical transcription technology, which allows doctors to dictate notes directly into patients’ records. There, an error could be catastrophic.

In theory, says Kelly Scott, a primary-care physician in Portland, Oregon, voice transcription is a useful way to speed things up. “But you need to be careful about what it actually thinks you said,” she says.

One problem with medical audio, says Bożena Kostek, a researcher at Gdańsk University of Technology in Poland, speaking at an online meeting of the Acoustical Society of America, is that audio-transcription programs tend to be trained disproportionately on English speakers.

But it’s more than that, she says. There are also problems with specialized medical terms.

“The biggest is acronyms,” she says. An abbreviation that means one thing to a cardiologist might mean something entirely different to an oncologist.
Another problem is that voice notes are often dictated in noisy conditions, such as hospital rooms. “In the medical environment, there is a lot of noise,” Kostek says. “People speaking at the same time, and a lot of beeps.”
Doctors trying to dictate notes, she says, may raise their voices to combat the noise, but that can create a new problem called Lombard speech, in which they change not only their volume but also their timbre, vowel duration, and a number of other things, all of which are anathema to audio-transcription programs.

The best way to combat these problems, Kostek says, is to focus on clear enunciation—and of course, avoid acronyms that might have different meanings in different contexts.

At the same time, she says, it’s important to avoid over-enunciating, because “that might be too much for the machine learning model.”

