Aalborg East Campus

Department of Electronic Systems

PhD defence by Shuai Tao

On Monday, October 6, 2025, Shuai Tao will defend his PhD thesis: "DNN-Guided Speech Processing: Speech Presence Probability Estimation and its Applications". After the defence, a small reception will be held at Fredrik Bajers Vej 7, A4-108.

Aalborg East Campus

Fredrik Bajers Vej 7, A4-106,
9220 Aalborg East.

  • 06.10.2025 09:30 - 14:30

  • English

  • On location

Abstract

In environments with various background sounds, the speech signal is easily distorted, which degrades speech quality and intelligibility. To recover speech from ambient noise, speech enhancement techniques have been developed over several decades and have been successfully deployed in applications such as hearing aids and mobile communication. The three mainstream approaches are statistics-based, deep learning-based, and hybrid methods, all of which can effectively preserve speech while reducing noise. Nevertheless, several challenges remain. First, statistics-based methods struggle to estimate their statistical parameters accurately, especially in complex acoustic scenarios such as low signal-to-noise ratio (SNR) conditions. Second, most deep learning-based methods lack transparency and explainability, which limits their controllability. Third, because the performance of hybrid methods depends on many factors, they exhibit limited robustness and generalization capability.

In this dissertation, a novel speech enhancement method is developed that integrates deep learning and statistical estimation, providing an explainable and controllable solution. To address inaccurate statistical estimation, a deep neural network (DNN) is employed to guide the estimation process, and a highly efficient DNN model is designed to ensure accurate estimation while keeping model complexity low. With an accurately estimated set of statistics, an optimal filter is then applied to balance noise attenuation against speech distortion. As a result, each component of the proposed method is transparent and explainable, and therefore controllable. Furthermore, to let each system component adapt effectively, a well-designed statistics update strategy is applied across diverse application scenarios, improving the robustness and generalization capability of the system.
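The abstract does not name the optimal filter. As one common way to trade noise attenuation against speech distortion, the sketch below assumes a single-channel parametric Wiener gain xi / (xi + mu), where xi is the a priori SNR of a time-frequency bin and mu is the trade-off parameter. The function names, the power-subtraction speech PSD estimate, and the gain floor are hypothetical choices for illustration, not the method of the thesis.

```python
def parametric_wiener_gain(speech_psd: float, noise_psd: float, mu: float = 1.0) -> float:
    """Gain xi / (xi + mu) for one time-frequency bin, xi being the a priori SNR.

    mu = 1 gives the standard Wiener gain; mu > 1 attenuates more noise at the
    cost of more speech distortion, and 0 < mu < 1 does the opposite.
    """
    if noise_psd <= 0.0 or mu <= 0.0:
        raise ValueError("noise_psd and mu must be positive")
    xi = max(speech_psd, 0.0) / noise_psd
    return xi / (xi + mu)


def frame_gains(noisy_power: list[float], noise_psd: list[float],
                mu: float = 1.0, gain_floor: float = 0.1) -> list[float]:
    """Per-bin gains for one STFT frame, given a noise PSD estimate per bin.

    Assumption for illustration: the speech PSD is crudely estimated by power
    subtraction, and each gain is lower-bounded by gain_floor to limit
    musical noise.
    """
    gains = []
    for p, n in zip(noisy_power, noise_psd):
        speech_psd = max(p - n, 0.0)  # hypothetical simple speech PSD estimate
        gains.append(max(parametric_wiener_gain(speech_psd, n, mu), gain_floor))
    return gains


# Usage: three bins of a noisy periodogram with a unit noise PSD.
print(frame_gains([4.0, 1.0, 10.0], [1.0, 1.0, 1.0], mu=1.0))  # [0.75, 0.1, 0.9]
```

Raising mu lowers every gain, so more noise is removed at the price of more speech distortion; exposing this single parameter is what makes such a filter easy to control.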

The noise power spectral density (PSD) is a key statistic in statistics-based methods and is essential for implementing linear and spatial filtering. To estimate the noise PSD, the speech presence probability (SPP), which expresses the probability that speech is present, is used to update the noise PSD in the minimum mean-squared error (MMSE) sense. In the proposed speech enhancement framework, a DNN is used to improve SPP estimation accuracy while keeping model complexity low. The estimated SPP then guides the estimation of the noise PSD and of the noise covariance matrix, enabling single-channel and multi-channel speech enhancement via linear and spatial filtering, respectively. Furthermore, to improve multi-channel performance, an effective integration strategy is proposed that combines spatial and linear filtering, both guided by the learning-based SPP estimate. Finally, a novel DNN model is proposed to estimate the SPP in an array-agnostic framework integrated with spatial filtering, enabling array-agnostic speech enhancement.
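The SPP-guided noise PSD update can be illustrated with a minimal per-frame sketch, assuming the SPP-based MMSE recursion of Gerkmann and Hendriks: the expected noise periodogram (1 - SPP)|Y|^2 + SPP * previous noise PSD is recursively smoothed over frames. In the thesis the SPP comes from a DNN; here a classic model-based SPP (fixed a priori SNR under speech presence, equal priors) stands in for it. The function names and the values alpha = 0.8 and xi_h1_db = 15 are assumptions.

```python
import math


def model_based_spp(noisy_power: list[float], noise_psd: list[float],
                    xi_h1_db: float = 15.0) -> list[float]:
    """Posterior SPP per bin, assuming a fixed a priori SNR under speech
    presence and equal prior probabilities. Stand-in for a DNN estimate."""
    xi = 10.0 ** (xi_h1_db / 10.0)
    spp = []
    for p, n in zip(noisy_power, noise_psd):
        ratio = p / max(n, 1e-12)  # a posteriori SNR, guarded against n = 0
        likelihood_term = (1.0 + xi) * math.exp(-ratio * xi / (1.0 + xi))
        spp.append(1.0 / (1.0 + likelihood_term))
    return spp


def update_noise_psd(noisy_power: list[float], prev_noise_psd: list[float],
                     spp: list[float], alpha: float = 0.8) -> list[float]:
    """One SPP-guided MMSE noise PSD update for a single frame.

    E[|N|^2 | Y] = (1 - SPP) * |Y|^2 + SPP * previous noise PSD,
    followed by first-order recursive smoothing with factor alpha.
    """
    updated = []
    for p, n, q in zip(noisy_power, prev_noise_psd, spp):
        expected_noise_power = (1.0 - q) * p + q * n
        updated.append(alpha * n + (1.0 - alpha) * expected_noise_power)
    return updated


# Usage: track a stationary noise floor of power 1.0 from a poor initial guess.
noise_est = [0.5, 0.5]
for _ in range(50):
    frame = [1.0, 1.0]  # noise-only frames
    spp = model_based_spp(frame, noise_est)  # a DNN output would be used here
    noise_est = update_noise_psd(frame, noise_est, spp)
print(noise_est)
```

The stagnation safeguard used in practical SPP-based trackers is omitted for brevity. For the multi-channel case, an analogous recursion that replaces |Y|^2 with the outer product y y^H of the microphone STFT vector is a common way to track the noise covariance matrix; that extension is an assumption here, not taken from the thesis.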

Attendees in the defence
Assessment committee
  • Associate Professor Cumhur Erkut (Chair), Department of Architecture, Design and Media Technology, Aalborg University, Denmark.
  • Associate Professor Konrad Kowalczyk, AGH University of Krakow, Poland.
  • Professor Reinhold Haeb-Umbach, Paderborn University, Germany.
PhD supervisors
  • Professor Mads Græsbøll Christensen, Department of Electronic Systems, Aalborg University, Denmark.
  • Associate Professor Jesper Rindom Jensen, Department of Electronic Systems, Aalborg University, Denmark.
Moderator
  • Associate Professor Christian Sejer Pedersen, Department of Electronic Systems, Aalborg University, Denmark.