Voice Assistants (VAs) are now integrated into a growing number of products, which are used in different contexts and for different purposes.
However, most prepackaged VA solutions are designed to work out of the box in any environment; as a result, they are not optimized for any particular scenario. This is why a customized Acoustic Front End (AFE), the signal-processing stage that cleans up the microphone signal before it reaches the speech recognizer, is essential for adapting a Voice Assistant to a specific use case. A well-tuned AFE increases automatic speech recognition (ASR) performance and, ultimately, provides a better user experience. For this reason, all types of Voice Assistants (pre-built, white-label and custom) allow a custom AFE to be integrated into the complete solution.

An Acoustic Front End is crucial for building a voice assistant in challenging audio scenarios.

In many products, such as coffee makers or washing machines, the host device itself is a source of noise in addition to the noise of the environment. Because the device is much closer to the microphone than the user, its noise heavily degrades the intelligibility of the voice, and the VA has to recognize speech in noisy conditions, which is considerably harder. Noise reduction algorithms reduce the impact of background and device-generated noise on the speech signal, typically by estimating the noise spectrum and suppressing it, which effectively improves VA performance.
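To illustrate the idea, here is a minimal sketch of magnitude spectral subtraction, one classic noise reduction technique (not necessarily the one any particular AFE uses). It assumes the microphone signal has already been split into short frames and converted to magnitude spectra (the STFT and its inverse are omitted), and that some frames are known to contain noise only, for example while the user is silent. The function names `estimate_noise` and `spectral_subtraction` and the `floor` parameter are hypothetical, chosen for this example.

```python
def estimate_noise(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]


def spectral_subtraction(frame_mag, noise_mag, floor=0.05):
    """Subtract the noise estimate from one magnitude frame, bin by bin.

    The result never drops below floor * (noisy magnitude): this spectral floor
    avoids negative magnitudes and limits "musical noise" artifacts.
    """
    return [max(y - n, floor * y) for y, n in zip(frame_mag, noise_mag)]


# Hypothetical example: 4 frequency bins, noise measured while the user is silent
noise = estimate_noise([[1.0, 2.0, 0.5, 0.5], [1.0, 2.0, 0.5, 0.5]])
noisy_speech = [5.0, 2.0, 3.5, 0.2]
print(spectral_subtraction(noisy_speech, noise))
```

Bins dominated by speech (the first and third) keep most of their energy, while bins that contain only noise are pushed down to the floor. In a real device the noise estimate would be updated continuously, since the noise of a coffee maker or washing machine changes over time.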

The acoustics of the room in which a Voice Assistant is placed strongly affect the quality of speech recognition. Large, empty rooms cause reverberation, which is the lingering of sound after the source has stopped emitting it. This lingering sound overlaps with the words that follow, making them harder for the VA to understand. Acoustic Front Ends include de-reverberation algorithms that address this issue by estimating the reverberant part of the sound and removing it.
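As a rough illustration of "estimating the reverberant part and removing it", the sketch below follows a simplified, well-known approach to late-reverberation suppression: the late reverberant power in the current frame is modelled as the power observed a few frames earlier, attenuated by the room's exponential decay, and is then subtracted spectrally. It assumes the reverberation time T60 is known (in practice it has to be estimated) and that power spectra of STFT frames are already available. The function name `late_reverb_gains` and its parameters are hypothetical.

```python
import math


def late_reverb_gains(power_frames, t60, frame_hop_s, delay_frames=4, floor=0.1):
    """Per-bin suppression gains for late reverberation (simplified sketch).

    power_frames: list of frames, each a list of power values |Y(t, k)|^2.
    t60: assumed reverberation time in seconds (time for a 60 dB energy decay).
    frame_hop_s: time between consecutive frames in seconds.
    Returns power-domain gains in [floor, 1]; take their square root to apply
    them to STFT magnitudes.
    """
    # Energy decays as exp(-2 * decay_rate * t); a 60 dB drop after t60 seconds
    # gives decay_rate = 3 * ln(10) / t60.
    decay_rate = 3.0 * math.log(10) / t60
    attenuation = math.exp(-2.0 * decay_rate * delay_frames * frame_hop_s)
    gains = []
    for t, frame in enumerate(power_frames):
        if t < delay_frames:
            gains.append([1.0] * len(frame))  # no history yet: leave frame untouched
            continue
        past = power_frames[t - delay_frames]
        row = []
        for k, p in enumerate(frame):
            late = attenuation * past[k]  # estimated late reverberant power
            g = 1.0 - late / p if p > 0 else floor
            row.append(max(g, floor))
        gains.append(row)
    return gains
```

A longer T60, as in a large empty room, makes the modelled reverberant tail stronger and therefore leads to more suppression; the floor keeps the algorithm from removing a bin entirely when the estimate is too pessimistic.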

When several people are talking, the VA has to work out whom to listen to. Using more than one microphone helps the Voice Assistant focus on the user by filtering sound according to where it comes from, using a set of techniques called spatial filtering or beamforming. Because sound from a given direction reaches each microphone at a slightly different time, the microphone signals can be aligned and combined so that sound from the target direction adds up while sound from other directions is attenuated. These algorithms can be used to separate a single voice from those of interfering speakers. In some scenarios, spatial filtering is used to focus on a specific position or direction: for example, a Voice Assistant in a vending machine can focus on the customer standing in front of it while ignoring sound coming from its surroundings.
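The simplest beamformer, delay-and-sum, makes the "align and combine" idea concrete. The sketch below assumes a uniform linear array (microphones on a line, equally spaced), a far-field source, and delays rounded to whole samples; production beamformers usually use fractional delays or adaptive, frequency-domain filters instead. The helper names `ula_delays` and `delay_and_sum` are hypothetical.

```python
import math


def ula_delays(n_mics, spacing_m, angle_deg, fs, c=343.0):
    """Integer sample delays of each mic relative to mic 0 for a uniform linear array.

    angle_deg is measured from broadside (0 means straight in front of the array);
    positive angles mean the source is on mic 0's side, so the wavefront reaches
    higher-index mics later. c is the speed of sound in m/s (about 343 m/s at 20 °C).
    """
    s = math.sin(math.radians(angle_deg))
    return [round(k * spacing_m * s * fs / c) for k in range(n_mics)]


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer steered by per-channel integer delays.

    Each channel is advanced by its delay so the target's wavefront lines up
    across microphones, then the channels are averaged: the target adds up
    coherently while sound from other directions is partially cancelled.
    Samples shifted in from outside the recording are treated as zero.
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i + d
            if 0 <= j < n:
                acc += ch[j]
        out.append(acc / len(channels))
    return out
```

For the vending machine example, the array would be steered straight ahead (`angle_deg=0`), where all delays are zero and the beamformer reduces to averaging the microphones; a customer standing off to the side would require non-zero delays.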

Users often address their Voice Assistant while it is already playing audio, such as music or the answer to a previous query. This is called barge-in. To guarantee a natural interaction with the Voice Assistant, the Acoustic Front End must be able to remove the signal played through the loudspeaker from the signal picked up by the microphone, a technique known as acoustic echo cancellation (AEC).
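Echo cancellation is possible because the AFE knows exactly what is being sent to the loudspeaker. An adaptive filter can learn how that signal travels to the microphone (the echo path) and subtract its estimate of the echo. The sketch below uses the normalized least-mean-squares (NLMS) algorithm, a standard choice for this task; it is a time-domain toy that leaves out double-talk detection and residual echo suppression, which a real AEC would need. The function name and parameter values (`taps`, `mu`, `eps`) are assumptions for illustration.

```python
def nlms_echo_canceller(far_end, mic, taps=32, mu=0.5, eps=1e-6):
    """Remove loudspeaker echo from the microphone signal with an NLMS filter.

    far_end: samples sent to the loudspeaker (the known reference signal).
    mic: samples captured by the microphone (echo + user speech + noise).
    Returns the error signal, i.e. the microphone signal minus the estimated echo,
    which is what would be passed on to speech recognition.
    """
    w = [0.0] * taps  # adaptive estimate of the echo path impulse response
    x = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for n in range(len(mic)):
        x = [far_end[n]] + x[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x))
        e = mic[n] - echo_est
        # Normalizing by the input energy makes the step size independent of
        # the playback volume.
        step = mu * e / (eps + sum(xi * xi for xi in x))
        w = [wi + step * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out
```

When the microphone contains only echo, the output shrinks towards zero as the filter converges; when the user speaks over the playback, their voice is not correlated with the loudspeaker signal and therefore largely remains in the output.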

Combined and tuned to the specific context and use case, these AFE technologies can optimize the performance of a VA and help keep the conversation smooth even in the most challenging conditions.

Do you want to know more?

This post is part of our white paper “The whys and hows of voice enabling your product”.
In it, we explain how to choose a voice assistant and integrate it into a product, so that you can make informed choices and meet your customers’ needs.
