BdSound works across many devices, operating systems and scenarios, but always with the same goal: making audio sound clear. We even named our technology suite after this goal: S2C, which stands for Simply Sounds Clear.

Our S2C suite is a full set of proprietary technologies and IP that our engineers apply with care and expertise to provide the best possible audio experience to our customers, and to their customers.

We often have to deal with noisy environments, and we use our technologies to remove that noise.

In this series of posts, we describe these BdSound proprietary IPs. The first episode covered solutions based on acquisition from multiple microphones, through a set of techniques called beamforming. The second episode described the basic processing steps for performing noise reduction with a single microphone. In this episode we describe our noise reduction solution based on Artificial Intelligence.

Previously in this series

In the previous episode we described standard noise reduction techniques and their effectiveness in removing noise with a clear, steady pattern, known as stationary noise. We also described the limitations of this kind of basic technology when it has to address non-stationary noises: very short noises, or noises without a clear pattern.

In this example, the driver is accelerating, so the noise pattern changes and has no clear structure. Toward the middle, the car hits a speed bump, producing a short burst of noise.

A car experiencing a stationary noise and hitting a speed bump, with a short impulse in the waveform

Standard noise reduction techniques struggle in scenarios with an abrupt change in the noise pattern

Car with noise and speed bump

This scenario is a nightmare for a traditional noise reduction algorithm, and you can hear for yourself how much it struggles.

Car with noise and speed bump – reduced with standard techniques

We can hear some residual noise, and we can notice that the system is slow to adapt to the changing noise.
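That slow adaptation can be seen in a toy model of the kind of recursive noise-power estimator used by classical techniques. This is a minimal, hypothetical sketch: the smoothing factor and power values are invented for illustration, not taken from BdSound's products.

```python
# Hypothetical sketch of a slowly-adapting noise-power estimator, as used
# by classical noise reduction. All names and values are illustrative.

def update_noise_estimate(noise_est, frame_power, alpha=0.95):
    """Recursive smoothing: a high alpha means slow adaptation."""
    return alpha * noise_est + (1 - alpha) * frame_power

# Steady road noise at power 1.0, then a 3-frame speed bump at power 9.0.
powers = [1.0] * 50 + [9.0] * 3 + [1.0] * 50

estimates = []
est = 1.0
for p in powers:
    est = update_noise_estimate(est, p)
    estimates.append(est)

# During the bump the estimate barely moves, so the burst is not
# subtracted and leaks through as residual noise.
print(round(estimates[52], 2))  # 2.14, far below the bump's true power of 9.0
```

Because the estimator tracks only the slowly-varying part of the noise, a short impulse like the speed bump passes through almost untouched, which is exactly the residual we hear in the sample above.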

In order to address this kind of scenario as well, our IP combines basic and advanced processing with the help of Artificial Intelligence. But first of all: what is Artificial Intelligence?

Mimic the human brain

There is a simple reason why we, as humans, are good at dealing with noise and at understanding voice even in harsh conditions: we are smart. Our brain is more powerful than any computer. It does more than filter out noise: it focuses on all the characteristics of audio waves that make them sound like voice. It is so good that sometimes it even tricks itself into hearing a voice when the sound was actually random or unrelated.

Human auditory system is able to filter out noises and focus on voice.

What matters most is that we, as humans, never had to actively learn to recognize voice. From birth, babies hear hundreds of hours of sounds and people talking, until their brain learns, from those examples, to recognize the aforementioned characteristics of audio waves.

Systems based on Artificial Intelligence (AI) address tasks in a similar fashion. Instead of having engineers code the solution, AI systems automatically learn it from examples. In particular, the most effective type of AI techniques are so dedicated to mimicking the functioning of the human brain that they are named after the brain's neurons: Artificial Neural Networks, or Deep Neural Networks.

The power of deep learning

In our case, we want to borrow the human brain's ability to isolate the voice even in harsh, noisy conditions like the one we mentioned. For this task, we employ Deep Neural Networks, which are called deep because of their structure: several layers of computation stacked together.

The first layer receives an input, which is usually a representation of the world: in our case, the audio signal. The first layer processes this input and produces an output. That output is then taken by the next layer, processed, and passed on, and so on and so forth. Our goal is to remove noise or, more precisely, to identify the voice component of the audio signal. Each layer, therefore, processes the audio signal toward this goal, behaving like a gate that lets the voice components go further while stopping the noise.

A deep neural network acts like a series of gates, each filtering out the noise.

At the end of this computational chain, the last layer produces the final output, which in our case is the clean voice information. If we think of the input as the bottom layer and the output as the top layer, the several layers in the middle are what make the structure deep.
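As a toy illustration of layers stacked together, here is a minimal sketch in plain Python. The weights and the two-feature frame are invented for illustration; a real network has many layers and millions of learned parameters.

```python
import math

def layer(inputs, weights, bias):
    """One layer: a weighted sum of the inputs followed by a sigmoid,
    squashing the result into the 0..1 range of a 'gate'."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

# Toy input: two made-up features of one audio frame.
frame = [0.8, 0.2]

h1 = layer(frame, [1.5, -1.0], 0.0)             # first layer's output
mask = layer([h1, frame[0]], [2.0, 1.0], -1.0)  # second (output) layer

# The final value, between 0 and 1, can be read as "how much of this
# frame is voice", i.e. how far open the gate is.
print(round(mask, 2))
```

Each layer's output feeds the next, and the last one emits the decision; stacking more such layers is what makes the network "deep".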

But how does the AI know what to keep and what to discard in its processing? This is done by training the AI, which means presenting it with hundreds of hours of recordings in noisy conditions, together with the same recordings without noise, so that the AI learns the relationship between the two.

Since the network is deep and must learn, this process is commonly named deep learning.
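The training idea can be sketched with a toy example: a single learnable parameter, fitted by gradient descent so that "noisy" examples map onto the matching "clean" ones. All numbers here are invented; a real deep network trains millions of parameters on hours of audio in exactly the same spirit.

```python
# Toy training loop: learn a single gain g so that g * noisy ~ clean,
# by gradient descent on the mean squared error between the two.

noisy = [1.2, 0.9, 1.5, 1.1]     # made-up noisy frame energies
clean = [0.6, 0.45, 0.75, 0.55]  # the same frames without noise

g = 1.0   # the single learnable parameter
lr = 0.1  # learning rate
for _ in range(200):
    # gradient of mean((g * n - c)^2) with respect to g
    grad = sum(2 * (g * n - c) * n for n, c in zip(noisy, clean)) / len(noisy)
    g -= lr * grad

print(round(g, 2))  # 0.5: the model learned that clean = 0.5 * noisy
```

The network never gets told the rule; it discovers it purely from the paired noisy/clean examples, which is exactly what happens, at a vastly larger scale, with hours of recorded audio.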

The challenges of AI

Introducing AI into audio products is challenging for a number of reasons:

  1. AI requires a lot of computation, just like our brain does, so it is hard to fit into small microprocessors with a limited amount of computational power. We dedicated a huge effort to finding the right trade-off between computational feasibility and audio quality, and we constantly update our solutions as soon as new microprocessors with more computational power reach the market;
  2. our brain often relies on a small delay between hearing and understanding voice, for example by waiting for the end of a word. Our solution cannot do the same, because it is designed to process audio in real time, with the shortest possible processing delay;
  3. the more noise we reduce, the more distortion we introduce into the voice. We worked hard to find the right balance between voice distortion and noise reduction, because we know that distorted audio is as annoying as noisy audio.
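To give a feel for point 2, here is a back-of-the-envelope latency calculation for frame-based real-time processing. The sample rate and frame size are typical textbook values, not BdSound's actual parameters.

```python
# Minimum delay introduced by framing alone: the algorithm must wait
# for a full frame of samples before it can process them.

sample_rate = 16000   # samples per second (wideband speech)
frame_size = 160      # samples per frame (10 ms)

latency_ms = 1000.0 * frame_size / sample_rate
print(latency_ms)  # 10.0 -> every extra frame of lookahead adds 10 ms more
```

This is why waiting for the end of a word, as our brain can, is off the table: even a short word would add hundreds of milliseconds of delay to the call.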

And here it is, the result of our processing: our solution removes the noise, even the noise caused by the speed bump.

Car with noise and speed bump – reduced with AI-based Noise Reduction

In conclusion

We live in a noisy world, and our voice communication is always affected by noise.

At BdSound we employ the best mix of beamforming, advanced noise reduction techniques and Artificial Intelligence to provide our customers with the best technology for noise reduction. Thousands of people use our solutions on a daily basis: when calling from their cars, when giving voice commands to their home appliances, and when talking with colleagues through videoconferencing systems.

Want to learn more and exploit the benefits of BdSound's AI-based Noise Reduction technology for your amazing product? Contact us.