AI noise cancelling headphones could let us pick which sounds to hear

Anyone who owns noise-cancelling headphones will know they can block out sounds you’d much rather be hearing, but now a devilishly clever piece of technology has come to our aid.

Whether it’s a co-worker trying to get your attention from their desk, or the sound of a car horn as you step off the curb to cross the road, hearing the right noise at the right time can be vital.

But at the moment we can’t choose which sounds our headphones cancel out. It’s all or nothing: if you want to hear birdsong in a busy city, for example, it’s nigh on impossible.

“Understanding what a bird sounds like and extracting it from all other sounds in an environment requires real-time intelligence that today’s noise cancelling headphones haven’t achieved,” says Shyam Gollakota, a professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington (UW) in the US.

To address this gap, Gollakota and his team have now developed deep-learning algorithms that let users pick which sounds filter through their headphones in real time. 

They call the proof-of-concept system “semantic hearing”.

The team presented its findings earlier this month at the Association for Computing Machinery (ACM) Symposium on User Interface Software and Technology (UIST) 2023 in San Francisco.

The semantic hearing system works by streaming audio captured by the headphones to a connected smartphone, which cancels out all environmental sounds. Headphone wearers can then select which sounds they want to let back in, using voice commands or a smartphone app.

There are 20 classes of sounds to choose from – such as sirens, baby cries, speech, vacuum cleaners and bird chirps – and only the selected sounds will be played through the headphones.

So, someone taking a walk outside could block out construction noise, yet still hear car horns or emergency sirens.
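
In code, that selection step amounts to conditioning a sound-extraction network on the classes the listener has chosen. The sketch below is illustrative only: it assumes a toy mask-based model and a handful of the 20 class names, and is not the UW team’s actual network.

```python
# Toy sketch of class-conditioned target-sound extraction.
# The model, class names and shapes are assumptions for illustration.
import torch
import torch.nn as nn

SOUND_CLASSES = ["siren", "baby_cry", "speech", "vacuum_cleaner", "bird_chirp"]  # 5 of the 20 classes

class TargetSoundExtractor(nn.Module):
    """Toy mask-based extractor: mixture waveform + class selection -> target waveform."""
    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.embed = nn.Linear(num_classes, hidden)           # embed the chosen classes
        self.encoder = nn.Conv1d(1, hidden, kernel_size=16, stride=8, padding=4)
        self.masker = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.decoder = nn.ConvTranspose1d(hidden, 1, kernel_size=16, stride=8, padding=4)

    def forward(self, mixture: torch.Tensor, selection: torch.Tensor) -> torch.Tensor:
        # mixture: (batch, 1, samples); selection: (batch, num_classes), multi-hot
        feats = self.encoder(mixture)
        cond = self.embed(selection).unsqueeze(-1)            # broadcast the class condition over time
        mask = torch.sigmoid(self.masker(feats * cond))       # class-conditioned soft mask
        return self.decoder(feats * mask)                     # keep only the selected sounds

# The listener chooses to hear sirens and bird chirps; everything else stays cancelled.
selection = torch.zeros(1, len(SOUND_CLASSES))
for wanted in ("siren", "bird_chirp"):
    selection[0, SOUND_CLASSES.index(wanted)] = 1.0

model = TargetSoundExtractor(num_classes=len(SOUND_CLASSES))
mixture = torch.randn(1, 1, 16000)                            # one second of audio at 16 kHz (placeholder)
with torch.no_grad():
    playback = model(mixture, selection)                      # what would be played through the headphones
print(playback.shape)                                         # torch.Size([1, 1, 16000])
```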

“The challenge is that the sounds headphone wearers hear need to sync with their visual senses. You can’t be hearing someone’s voice two seconds after they talk to you. This means the neural algorithms must process sounds in under a hundredth of a second,” explains Gollakota.

This means that the semantic hearing system must process sounds on a device, such as a connected smartphone, instead of on cloud servers.
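
To see why that on-device budget is tight, consider a rough timing check: at 44.1 kHz, a 10-millisecond chunk of audio is only 441 samples, and the whole network must finish processing one chunk before the next arrives. The snippet below is a back-of-the-envelope sketch with a placeholder model, not the researchers’ pipeline; the sample rate and chunk size are assumptions.

```python
# Rough per-chunk latency check against a ~10 ms real-time budget.
import time
import torch
import torch.nn as nn

SAMPLE_RATE = 44_100
CHUNK_MS = 10                                     # the "hundredth of a second" budget
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000    # 441 samples per chunk

model = nn.Sequential(                            # placeholder workload, not the actual network
    nn.Conv1d(1, 64, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(64, 1, kernel_size=9, padding=4),
).eval()

chunk = torch.randn(1, 1, CHUNK_SAMPLES)
with torch.no_grad():
    model(chunk)                                  # warm-up run

timings_ms = []
with torch.no_grad():
    for _ in range(100):
        start = time.perf_counter()
        model(chunk)
        timings_ms.append((time.perf_counter() - start) * 1000)

print(f"median per-chunk inference: {sorted(timings_ms)[50]:.2f} ms "
      f"(must stay under the {CHUNK_MS} ms budget before the next chunk arrives)")
```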

Sounds coming from different directions also reach each of our ears at slightly different times. So the system must preserve these delays, and other spatial cues, so that people can still meaningfully perceive sounds in their environment.
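
One way to picture that requirement: a sound from the left reaches the left ear a fraction of a millisecond before the right, and any processing that collapses the two ear channels together erases that cue. The toy example below illustrates the point with an assumed interaural delay; it is not drawn from the UW system.

```python
# Why binaural processing must preserve the interaural time difference (ITD).
import numpy as np

SAMPLE_RATE = 44_100
ITD_SAMPLES = 20                      # ~0.45 ms head-related delay (assumed)

# A short click arriving at the left ear first, then the right.
source = np.zeros(1024)
source[100] = 1.0
left = source
right = np.roll(source, ITD_SAMPLES)

def measured_itd(l: np.ndarray, r: np.ndarray) -> int:
    """Estimate the delay between ears via cross-correlation."""
    corr = np.correlate(r, l, mode="full")
    return int(np.argmax(corr) - (len(l) - 1))

# Processing each ear's channel separately (same gain per ear) keeps the cue...
print("per-ear processing, ITD:", measured_itd(0.5 * left, 0.5 * right), "samples")  # -> 20

# ...whereas collapsing to mono and copying it to both ears destroys it.
mono = 0.5 * (left + right)
print("mono processing, ITD:", measured_itd(mono, mono), "samples")                  # -> 0
```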

The system has been tested in offices, streets, and parks, and was able to extract target sounds while removing all other real-world noise.

However, in some cases it struggled to distinguish between sounds that share many properties, such as vocal music and human speech. Further training on more real-world data may improve these outcomes.

This initial demonstration was carried out with wired headsets connected to a smartphone, but the researchers say it should be feasible to extend the system to wireless headsets.

They plan to release a commercial version of the system in the future.
