Project

PhD position in audio signal processing

The spatial information contained in recorded microphone signals has a fundamental role in the characterization of acoustic environments. Furthermore, it can be exploited in digital audio processing frameworks for enhancing a desired source signal, benefiting applications such as hands-free telephony, hearing devices, human-machine interfaces, and acoustic monitoring systems. Despite its acknowledged relevance, accurately estimating and effectively employing spatial information can present significant challenges, due to the diverse acoustic conditions of practical scenarios or limitations imposed by microphone setups.

This thesis focuses on the development and evaluation of audio processing methods for estimating and applying spatial information while addressing a selection of challenges encountered in different applications. In this thesis, the estimation of spatial information is limited to the problem of source localization. First, two approaches for single-channel source localization are presented, motivated by the limited spatial analysis capabilities of single-microphone setups and of devices in which simultaneous access to multiple microphone signals is unreliable. Both approaches estimate the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources using a single, rotating, directional microphone. Direction-dependent PSD values relative to a given angular dictionary are estimated by solving a group-sparse regularized optimization problem, and the source DOAs are then obtained by locating peaks in the estimated PSD vector. The methods' performance is evaluated through a series of simulations covering different setup conditions, ranging from different types of model mismatch to variations in the acoustic scene and microphone directivity pattern.
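The single-channel idea can be illustrated with a minimal sketch. It is not the thesis's actual formulation: it assumes an ideal cardioid power pattern, a noise-free far-field model in which the PSD measured at each microphone orientation is a directivity-weighted sum of the source PSDs, and a generic non-negative group-lasso objective solved by proximal gradient descent. All function names and parameter values below are hypothetical.

```python
import numpy as np

def cardioid_power(theta):
    """Power response of an ideal cardioid microphone (assumed directivity)."""
    return (0.5 * (1.0 + np.cos(theta))) ** 2

def build_dictionary(n_orient, n_dirs):
    """A[m, d]: power gain toward candidate direction d at microphone orientation m."""
    orient = 2.0 * np.pi * np.arange(n_orient) / n_orient
    dirs = 2.0 * np.pi * np.arange(n_dirs) / n_dirs
    return cardioid_power(dirs[None, :] - orient[:, None])

def group_sparse_psd(A, Y, lam, n_iter=500):
    """Minimize 0.5*||A X - Y||_F^2 + lam * sum_d ||X[d, :]||_2 subject to X >= 0.

    Each row of X (one direction, all frequency bins) forms one group.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        # Gradient step on the data term, then projection onto X >= 0.
        Z = np.maximum(X - step * (A.T @ (A @ X - Y)), 0.0)
        # Row-wise (group) soft-thresholding.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        X = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0) * Z
    return X

def pick_peaks(power, rel_threshold=0.1):
    """Indices of circular local maxima above rel_threshold * max, strongest first."""
    left, right = np.roll(power, 1), np.roll(power, -1)
    is_peak = (power > left) & (power >= right) & (power > rel_threshold * power.max())
    idx = np.flatnonzero(is_peak)
    return idx[np.argsort(power[idx])[::-1]]

if __name__ == "__main__":
    # Hypothetical scene: 36 orientations, 10-degree DOA grid, 8 frequency bins.
    A = build_dictionary(n_orient=36, n_dirs=36)
    true_psd = np.zeros((36, 8))
    true_psd[9] = np.linspace(1.0, 2.0, 8)   # one source at 90 degrees
    Y = A @ true_psd                          # noise-free measured PSDs
    X_hat = group_sparse_psd(A, Y, lam=0.1)
    print(pick_peaks(X_hat.sum(axis=1)) * 10)  # DOA estimates in degrees
```

Grouping the frequency bins of each direction under one norm drives entire directions to zero at once, so the remaining active rows are the candidates passed on to the peak search.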

Building on the insights gained from evaluating the single-channel localization methods, a multi-channel source localization method is also presented in this thesis. This method estimates the DOA of multiple broadband sound sources through a similar group-sparse optimization problem, which instead models an observed broadband steered response power (SRP) map as a linear combination of PSDs. Simulation results demonstrate that the proposed method outperforms some conventional methods in certain multi-source scenarios and performs comparably to others at a lower computational cost. Moreover, it shows superior performance in cases with closely located sources.
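The linear SRP model mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: far-field sources, a hypothetical four-microphone circular array, conventional SRP without PHAT weighting, uncorrelated sources, and a frequency-flat PSD per source, so that one weight per direction is enough. Each dictionary column is the SRP map that a unit-PSD source at one candidate direction would produce.

```python
import numpy as np
from itertools import combinations

C = 343.0  # speed of sound in m/s

def pair_tdoas(mic_xy, angles):
    """Far-field TDOA in seconds for each microphone pair (rows) and angle (columns)."""
    u = np.stack([np.cos(angles), np.sin(angles)])   # unit vectors, shape (2, n_angles)
    delays = -(mic_xy @ u) / C                        # per-microphone delay
    pairs = list(combinations(range(len(mic_xy)), 2))
    return np.array([delays[i] - delays[j] for i, j in pairs])

def srp_dictionary(mic_xy, angles, freqs):
    """Phi[t, d]: SRP at steering angle t caused by a unit-PSD source at angle d."""
    tau = pair_tdoas(mic_xy, angles)                  # (n_pairs, n_angles)
    diff = tau[:, None, :] - tau[:, :, None]          # TDOA mismatch, (n_pairs, t, d)
    phase = 2.0 * np.pi * freqs[:, None, None, None] * diff[None]
    return np.cos(phase).sum(axis=(0, 1))             # sum over frequencies and pairs

if __name__ == "__main__":
    # Hypothetical setup: 5 cm circular array, 10-degree grid, 0.5 to 4 kHz.
    mic_xy = 0.05 * np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    angles = np.deg2rad(np.arange(0, 360, 10))
    freqs = np.linspace(500.0, 4000.0, 32)
    Phi = srp_dictionary(mic_xy, angles, freqs)
    p = np.zeros(len(angles))
    p[[3, 6]] = [1.0, 0.5]        # two sources, 30 degrees apart
    srp_map = Phi @ p             # modeled SRP map: linear in the source powers
    print(np.rad2deg(angles[np.argmax(srp_map)]))
```

Because the modeled map is linear in the source powers, recovering a sparse, non-negative weight vector from an observed map can separate sources whose individual SRP lobes overlap, which plain peak picking on the map cannot do reliably.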

After addressing the problem of source localization, this thesis shifts its focus to exploiting spatial information for signal enhancement. The particular task of speech enhancement with a microphone array embedded in an unmanned aerial vehicle (UAV) is considered, which entails highly adverse acoustic conditions with high levels of ego-noise. First, an experimental methodology is presented for measuring ego-noise emissions. Then, the feasibility of employing spatial filtering-based methods for speech enhancement in such a challenging scenario is investigated. A speech enhancement method using ego-noise references is presented, based on the prior-knowledge multi-channel Wiener filter (PK-MWF) and speech presence probability (SPP) estimates. Additionally, the method is extended to explicitly incorporate estimated DOA information on the target speaker, yielding an alternative implementation. The proposed approaches are evaluated on experimental recordings obtained with a drone operating at constant thrust in a simulated hovering condition. The results indicate the importance of reliable speech activity estimates for the performance of spatial filtering-based methods such as the PK-MWF. Compared with the PK-MWF using SPP estimated from one of the embedded microphones, the DOA-based implementation of the PK-MWF significantly improves the signal-to-noise ratio (SNR) of the filtered signal, but at the cost of a reduced perceptual improvement. Furthermore, it is shown that combining the DOA-based and SPP-based implementations of the PK-MWF can improve perceptual speech quality.
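To make the role of the speech activity estimates concrete, the sketch below shows a generic multi-channel Wiener filter for a single STFT bin, with covariance matrices updated recursively under SPP weighting. It is not the PK-MWF described above: the ego-noise references and the DOA-based variant are omitted, and the soft-decision update rule, the smoothing constant `alpha`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def update_covariances(R_speech_noise, R_noise, x, spp, alpha=0.95):
    """Recursive covariance update for one STFT bin, gated by the SPP in [0, 1]."""
    outer = np.outer(x, x.conj())
    # Frames likely to contain speech mostly update the speech+noise covariance;
    # the remaining frames mostly update the noise-only covariance.
    a_sn = 1.0 - (1.0 - alpha) * spp
    a_n = 1.0 - (1.0 - alpha) * (1.0 - spp)
    return (a_sn * R_speech_noise + (1.0 - a_sn) * outer,
            a_n * R_noise + (1.0 - a_n) * outer)

def mwf(R_speech_noise, R_noise, ref=0):
    """Filter w such that w^H x estimates the speech component at microphone `ref`."""
    R_speech = R_speech_noise - R_noise
    return np.linalg.solve(R_speech_noise, R_speech[:, ref])

def output_snr(w, R_speech, R_noise):
    """SNR at the filter output for given speech and noise covariance matrices."""
    return np.real(w.conj() @ R_speech @ w) / np.real(w.conj() @ R_noise @ w)

if __name__ == "__main__":
    # Hypothetical bin: 4 microphones, one coherent speech source, white noise.
    a = np.exp(-1j * np.array([0.0, 0.4, 0.8, 1.2]))   # assumed steering vector
    R_s = 2.0 * np.outer(a, a.conj())
    R_n = 0.5 * np.eye(4)
    w = mwf(R_s + R_n, R_n)
    print(output_snr(w, R_s, R_n) / (R_s[0, 0].real / R_n[0, 0].real))
```

The filter depends on the difference between the two covariance estimates, so frames misclassified by the SPP contaminate both matrices, which is one way to see why the reliability of the speech activity estimates matters for this family of methods.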

This thesis also presents a research valorization plan, indicating how the research outcomes can be transferred into varying application contexts, while also outlining the potential socio-economic value they could generate in an industrial setting.

Date: 15 Mar 2019 → 23 Feb 2024
Keywords: signal processing, scanning radar, audio signal processing, audio engineering, audio analysis, spatial audio, directional microphones
Disciplines: Wireless communication and positioning systems, Computer vision, Analogue and digital signal processing, Audio and speech computing, Audio and speech processing
Project type: PhD project