A next-gen speech synthesiser

Scientists hoping to construct the next generation of speech synthesisers for the vocally impaired have learned to decode the brain’s neural impulses well enough to produce comprehensible speech.

Not that the scientists have learned to read thoughts. Rather, they are measuring brain activity related to the movements made by the jaw, larynx, lips, and tongue when people attempt to speak, then using computer algorithms to reconstruct the intended speech from that activity.

The goal, says Edward Chang, a neurosurgeon at the University of California, San Francisco, US, is to help restore the ability to speak to people who have lost it due to ailments such as stroke, paralysis of the vocal cords, amyotrophic lateral sclerosis (ALS), or speech-destroying surgeries.

Current speech-generating technologies, such as the one used by the late physicist Stephen Hawking, rely on tiny movements of non-paralysed muscles, such as those in the head or eyes, to control a cursor that can then be used to type.

They work, but are painfully slow.

“The state of the art with these devices is on the order of five to 10 words per minute,” Chang says. He and his colleagues are aiming for “natural rates of communication”, which average 150 words per minute.

The research was done with five epilepsy patients who were undergoing a treatment in which electrodes were inserted onto their brain surface, beneath their skulls. During their hospital stays, they agreed to have their brain activity monitored as they read aloud sentences provided by the researchers, or portions of stories such as Alice in Wonderland.

The scientists then developed a two-stage computer program to figure out how the volunteers’ brain activity correlated to what they were saying.

One stage correlated their brain patterns to the movements of their vocal tracts as they were speaking. The other attempted to convert the movements into words and sentences.
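For readers curious about how such a pipeline hangs together, here is a minimal Python sketch of the two-stage idea. The dimensions, feature names and untrained linear maps are illustrative placeholders only; they stand in for the trained models and richer feature sets used in the actual study, and show nothing more than the split between decoding movements from brain activity and turning those movements into sound-ready features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholder dimensions, not the study's real values.
N_ELECTRODES = 64        # cortical-surface channels recorded per time step
N_ARTICULATORS = 12      # kinematic features (jaw, larynx, lips, tongue)
N_ACOUSTIC = 32          # acoustic features that a vocoder could turn into audio

# Untrained random linear maps stand in for the study's trained decoders.
W_neural_to_kinematics = rng.standard_normal((N_ARTICULATORS, N_ELECTRODES)) * 0.1
W_kinematics_to_acoustics = rng.standard_normal((N_ACOUSTIC, N_ARTICULATORS)) * 0.1

def decode_kinematics(neural_frame: np.ndarray) -> np.ndarray:
    """Stage 1: estimate vocal-tract movements from one frame of brain activity."""
    return np.tanh(W_neural_to_kinematics @ neural_frame)

def synthesise_acoustics(kinematic_frame: np.ndarray) -> np.ndarray:
    """Stage 2: convert the estimated movements into acoustic features."""
    return W_kinematics_to_acoustics @ kinematic_frame

# Run the two stages over a short, fake neural recording (100 time steps).
neural_recording = rng.standard_normal((100, N_ELECTRODES))
acoustic_features = np.array([
    synthesise_acoustics(decode_kinematics(frame)) for frame in neural_recording
])
print(acoustic_features.shape)  # (100, 32): one acoustic frame per neural frame
```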

The two-stage approach was challenging, but appears to have greatly enhanced the outcome, says Chethan Pandarinath, a biomedical engineer at Emory University and Georgia Institute of Technology, US, who was not part of the study team.

The simulated speech was then played back to listeners who were asked to write down what they thought was being said.

The results were far from perfect, but showed that much of the synthesised speech was indeed comprehensible.

“In one test, 70% of the words were correctly transcribed,” says Josh Chartier, a bioengineering graduate student in Chang’s research group.

“Interestingly,” he adds, “many of the mistaken words were similar [in] sound — for example, ‘rodent’ and ‘rabbit’ — so the gist of the sentence was able to be understood.”

The sentences also tended to be short, with limited context and sometimes unexpected topics such as, “Count the number of teaspoons of soy sauce that you add,” or, “Shaving cream is a popular item on Halloween”.

They were constructs that almost seemed made to challenge the skill of the listener, although they were actually designed to train the computer algorithms to detect the complexities of English-language speech without forcing the study participants to recite hours and hours of repetitive sentences.

And, imperfect as it is, it’s very exciting.

“It’s proof of the principle that it’s possible to generate synthetic speech directly from the brain’s speech centre,” Chang says.

More importantly, one participant was asked both to speak and mime some of the test sentences by making the same movements they would make in saying them, but without sound.

When the brain patterns recorded during this silent miming were fed into the speech synthesiser, it was able to produce the intended sentences as well.

The result wasn’t quite as accurate, but it was a crucial test of whether the process can work for its intended beneficiaries.

“This is an interesting finding in the context of future speech prostheses for people unable to speak,” says Blaise Yvert, a neurotechnology researcher at the Grenoble Alps University, France.

That said, Yvert says, the ability to reproduce mimed speech needs to be confirmed in more than one volunteer. Then it needs to be more rigorously tested by asking volunteers to imagine speaking, without miming the relevant motions, “which is the case of paralysed people for whom speech prostheses are intended”.

And at the moment, the new speech-synthesis process has other limitations.

One is that it’s not been tested in any language other than English, though that shouldn’t be a problem, according to Gopala Krishna Anumanchipalli, a postdoctoral researcher in neurosurgery who is also on Chang’s team.

“We [all] share a vocal tract with the same [anatomical] plan,” he says, “so the movements in all languages are similar.”

A more important need is to make the resulting speech sound more natural and intelligible. “There is a lot of engineering going on to improve it,” Chang says. “It’s a rapidly developing field.”

It is also important to find out whether the process works for the people who need it, rather than for research volunteers who have no difficulty speaking normally.

“There’s a fundamental question about whether the same algorithms will work in the population that cannot speak,” Chang says.

Nor is it expected to work for everyone; it is unlikely to help people who have had a stroke or other brain injury that has damaged the speech centre itself.

“We are talking about restoring communication where the speech centres are still working, but they can’t get their words out,” Chang says.

A separate question is whether the process can be made to work for people who have never been able to speak, as opposed to those who once learned how to do so but then lost the ability.

“That’s an open question,” Chang says, adding that such people may have to learn to speak from scratch. “That will be very exciting to see if this kind of virtual vocal tract can help people who have never spoken before,” he says.

Chang’s research and Pandarinath’s commentary on it both appear in the journal Nature.

