
Publication

Signal Processing Algorithms for EEG-based Auditory Attention Decoding

Book - Dissertation

One in five people experiences hearing loss, and the World Health Organization estimates that this number will increase to one in four by 2050. Fortunately, effective hearing devices such as hearing aids and cochlear implants exist, with advanced noise suppression and speaker enhancement algorithms that can significantly improve the quality of life of people suffering from hearing loss. State-of-the-art hearing devices, however, underperform in the so-called 'cocktail party' scenario, in which multiple people talk simultaneously. In such a situation, the hearing device does not know which speaker the user intends to attend to, and thus which speaker to enhance and which ones to suppress. A new problem therefore arises in cocktail party scenarios: determining which speaker a user is attending to, referred to as the auditory attention decoding (AAD) problem. The attended speaker could be selected using simple heuristics, such as picking the loudest speaker or the one in the user's look direction. A potentially better approach, however, is to decode the auditory attention from where it originates, i.e., the brain. Using neurorecording techniques such as electroencephalography (EEG), it is possible to perform AAD, for example, by reconstructing the attended speech envelope from the EEG using a neural decoder (i.e., the stimulus reconstruction (SR) algorithm). Integrating AAD algorithms in a hearing device could then lead to a so-called 'neuro-steered hearing device'. Traditional AAD algorithms, however, are not fast enough to adequately react to a switch in auditory attention, and they are supervised and fixed over time, not adapting to non-stationarities in the EEG and audio data. The general aim of this thesis is therefore to develop novel signal processing algorithms for EEG-based AAD that allow fast, accurate, unsupervised, and time-adaptive decoding of the auditory attention.

In the first part of the thesis, we compare different AAD algorithms, which allows us to identify the gaps in the current AAD literature that are partly addressed in this thesis. To enable this comparative study, we develop a new performance metric, the minimal expected switch duration (MESD), to evaluate AAD algorithms in the context of adaptive gain control for neuro-steered hearing devices. This metric resolves the traditional trade-off between AAD accuracy and the time needed to make an AAD decision, returning a single number that is interpretable within the application context of AAD and allows easy (statistical) comparison between AAD algorithms. Using the MESD, we establish that the most robust currently available AAD algorithm is based on canonical correlation analysis, but that decoding the spatial focus of auditory attention from the EEG holds more promise for fast and accurate AAD. Moreover, we observe that deep learning-based AAD algorithms are hard to replicate on independent AAD datasets.

In the second part, we address one of the main signal processing challenges in AAD: unsupervised and time-adaptive algorithms. We first develop an unsupervised version of the stimulus decoder that can be trained on a large batch of EEG and audio data without knowledge of ground-truth labels on the attention. The unsupervised stimulus decoder is iteratively retrained based on its own predicted labels, resulting in a self-leveraging effect that can be explained by interpreting the iterative updating procedure as a fixed-point iteration.
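The following is a minimal sketch of this iterative self-training idea in Python, assuming a ridge-regularized least-squares stimulus decoder operating on time-lagged EEG and correlation-based attention decisions per segment; all function names, variable names, and parameter values are illustrative and do not reproduce the exact implementation in the thesis.

```python
import numpy as np

def lagged(eeg, lags):
    """Stack time-lagged copies of each channel: (T, C) -> (T - lags + 1, C * lags)."""
    T = eeg.shape[0]
    return np.hstack([eeg[l : T - lags + 1 + l] for l in range(lags)])

def train_decoder(eeg_segs, env_segs, lags=32, reg=1e-6):
    """Ridge-regularized least-squares decoder mapping lagged EEG to an envelope."""
    dim = eeg_segs[0].shape[1] * lags
    XtX, Xty = reg * np.eye(dim), np.zeros(dim)
    for eeg, env in zip(eeg_segs, env_segs):
        X = lagged(eeg, lags)
        XtX += X.T @ X
        Xty += X.T @ env[: X.shape[0]]
    return np.linalg.solve(XtX, Xty)

def predict_labels(decoder, eeg_segs, env1, env2, lags=32):
    """Per segment, pick the speaker whose envelope correlates best with the
    decoder's reconstruction of the attended envelope."""
    labels = []
    for eeg, e1, e2 in zip(eeg_segs, env1, env2):
        rec = lagged(eeg, lags) @ decoder
        n = rec.shape[0]
        c1 = np.corrcoef(rec, e1[:n])[0, 1]
        c2 = np.corrcoef(rec, e2[:n])[0, 1]
        labels.append(0 if c1 >= c2 else 1)
    return labels

def unsupervised_decoder(eeg_segs, env1, env2, lags=32, iters=10):
    """Iterative self-training: retrain on the decoder's own predicted labels,
    starting from a random initial decoder (the self-leveraging loop)."""
    rng = np.random.default_rng(0)
    decoder = rng.standard_normal(eeg_segs[0].shape[1] * lags)
    for _ in range(iters):
        labels = predict_labels(decoder, eeg_segs, env1, env2, lags)
        env_hat = [e1 if lab == 0 else e2
                   for lab, e1, e2 in zip(labels, env1, env2)]
        decoder = train_decoder(eeg_segs, env_hat, lags)
    return decoder
```

In this reading, each iteration maps the current decoder to a new decoder trained on its own decisions; the self-leveraging effect described above corresponds to this map converging to a useful fixed point rather than reinforcing the random initialization.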
This unsupervised but subject-specific stimulus decoder, starting from a random initial decoder, outperforms a supervised subject-independent decoder and, using subject-independent information, even approximates the performance of a supervised subject-specific decoder. We also extend this unsupervised algorithm to an efficient, recursive, time-adaptive algorithm for the setting in which EEG and audio are continuously streaming in, and show that it has the potential to outperform a fixed supervised decoder in a practical AAD use case.

In the third part, we develop novel AAD algorithms that decode the spatial focus of auditory attention to provide faster and more accurate decoding. To this end, we use both a linear common spatial pattern (CSP) filtering approach (a minimal sketch is given below) and its nonlinear extension based on Riemannian geometry-based classification (RGC). The CSP method achieves much higher accuracy than the SR algorithm at a very fast decision rate. Furthermore, we show that the CSP method is preferable to a comparable convolutional neural network-based approach, and that it remains applicable to different directions of auditory attention, to a three-class problem with different angular domains, to setups using only EEG channels close to the ears, and when generalizing to data from an unseen subject. Lastly, the RGC-based extension further improves the accuracy at slower decision rates, especially in the multiclass problem.

To summarize, in this thesis we have developed crucial building blocks for a plug-and-play, time-adaptive, unsupervised, fast, and accurate AAD algorithm that could be integrated with a low-latency speaker separation and enhancement algorithm and a wearable, miniaturized EEG system, eventually leading to a neuro-steered hearing device.
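As a companion to the envelope-based sketch above, the following illustrates the CSP idea for a binary left/right spatial attention scenario, assuming labeled training segments; the nearest-class-mean classifier is a stand-in for the classifier actually used in the thesis, all names and parameters are illustrative, and the RGC extension is not shown.

```python
import numpy as np

def csp_filters(segs_left, segs_right, k=3):
    """Common spatial patterns: joint diagonalization of the two class
    covariances; keep the k most discriminative filters from each end."""
    avg_cov = lambda segs: sum(np.cov(s.T) for s in segs) / len(segs)
    C1, C2 = avg_cov(segs_left), avg_cov(segs_right)
    # Whiten with the composite covariance, then diagonalize class 1.
    evals, evecs = np.linalg.eigh(C1 + C2)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    _, V = np.linalg.eigh(whiten @ C1 @ whiten)
    W = whiten @ V                              # columns are spatial filters
    return np.hstack([W[:, :k], W[:, -k:]])     # extremes of the eigenvalue spectrum

def log_var_features(seg, W):
    """Log of the normalized variance of the CSP-filtered EEG window."""
    v = (seg @ W).var(axis=0)
    return np.log(v / v.sum())

def fit_csp_classifier(segs_left, segs_right, k=3):
    """Compute filters plus per-class mean feature vectors."""
    W = csp_filters(segs_left, segs_right, k)
    mu_l = np.mean([log_var_features(s, W) for s in segs_left], axis=0)
    mu_r = np.mean([log_var_features(s, W) for s in segs_right], axis=0)
    return W, mu_l, mu_r

def decode_direction(seg, W, mu_l, mu_r):
    """Classify a (short) EEG window by its nearest class-mean feature vector."""
    f = log_var_features(seg, W)
    return "left" if np.linalg.norm(f - mu_l) <= np.linalg.norm(f - mu_r) else "right"
```

Because the decision relies only on spatial power patterns in the EEG rather than on envelope correlations accumulated over time, it can be made on much shorter windows, which is what enables the faster decision rates reported above.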
Publication year: 2022
Accessibility: Open