A Two-channel Speech Enhancement System with a Pre-trained Intelligibility Prediction Model and Null-steering Beamforming

丁文淵; Wen-Yuan Ting

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88819

Title:	A Two-channel Speech Enhancement System with a Pre-trained Intelligibility Prediction Model and Null-steering Beamforming
Author:	丁文淵 Wen-Yuan Ting
Advisor:	蘇柏青 Borching Su
Keywords:	beamforming, null-steering, short-time objective intelligibility (STOI), STOI-Net
Publication year:	2023
Degree:	Master's
Abstract:	Beamforming is commonly used in multi-channel speech enhancement systems to suppress directional interfering signals that degrade speech intelligibility. Traditional beamformers are usually optimized based on accurate estimates of parameters such as the direction of arrival (DOA), power spectral densities (PSDs), relative transfer functions (RTFs), and covariance matrices. However, estimating these parameters accurately can be challenging. In this thesis, a novel beamforming framework is proposed that enhances the intelligibility of noisy speech signals using STOI-Net, a pre-trained model that predicts short-time objective intelligibility (STOI). This framework is referred to as intelligibility-aware null-steering beamforming (IANS). The noisy speech signal is first fed into a set of null-steering beamformers to generate a set of candidate signals. These signals are then fed into STOI-Net, which determines the one with the highest predicted intelligibility. Experimental results show that the proposed method, using a two-microphone array, generates intelligibility-enhanced speech signals in multiple scenarios, with STOI scores similar to those obtained by beamforming when the DOAs of the target speech and interfering signals are known.
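The select-the-best-beam idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes a far-field scene, a two-microphone array with 0.1 m spacing at 16 kHz, integer-sample inter-microphone delays (from d·sin θ / c), and a simple time-domain delay-and-subtract null-steering beamformer, one per candidate delay. STOI-Net is not reproduced here; `toy_tone_score` is a hypothetical stand-in score that only makes sense for the toy tone-plus-noise scene below. All function names are hypothetical.

```python
import cmath
import math
import random
from typing import Callable, List, Sequence, Tuple


def doa_to_sample_delay(theta_deg: float, spacing_m: float, fs_hz: int,
                        c_m_s: float = 343.0) -> int:
    """Far-field delay d*sin(theta)/c between the mics, rounded to samples.
    Assumed sign convention: positive result = mic 2 hears the source later."""
    tau_s = spacing_m * math.sin(math.radians(theta_deg)) / c_m_s
    return round(tau_s * fs_hz)


def delay(x: Sequence[float], k: int) -> List[float]:
    """Return x[n - k] for k >= 0, zero-padded at the start, same length as x."""
    k = min(k, len(x))
    return [0.0] * k + list(x[:len(x) - k])


def null_steer(x1: Sequence[float], x2: Sequence[float], k: int) -> List[float]:
    """Delay-and-subtract beamformer with a null at inter-mic delay k.
    A source with x2[n] = x1[n - k] (k >= 0) or x1[n] = x2[n - |k|] (k < 0)
    cancels exactly in the output."""
    if k >= 0:
        return [a - b for a, b in zip(x2, delay(x1, k))]
    return [a - b for a, b in zip(x1, delay(x2, -k))]


def ians_select(x1: Sequence[float], x2: Sequence[float],
                delays: Sequence[int],
                score_fn: Callable[[Sequence[float]], float]
                ) -> Tuple[int, List[float]]:
    """Run one null-steering beamformer per candidate delay and keep the
    output with the highest score (score_fn stands in for STOI-Net)."""
    outputs = [null_steer(x1, x2, k) for k in delays]
    scores = [score_fn(y) for y in outputs]
    best = max(range(len(outputs)), key=lambda i: scores[i])
    return delays[best], outputs[best]


def toy_tone_score(y: Sequence[float], f0_hz: float, fs_hz: int) -> float:
    """Hypothetical toy score: fraction of output energy in the DFT bin at f0.
    Assumes f0 falls exactly on a DFT bin of len(y) samples."""
    n = len(y)
    bin_val = sum(v * cmath.exp(-2j * math.pi * f0_hz * i / fs_hz)
                  for i, v in enumerate(y))
    total = sum(v * v for v in y) or 1e-12
    return (2.0 * abs(bin_val) ** 2 / n) / total


if __name__ == "__main__":
    fs, f0, n = 16000, 500.0, 1600
    rng = random.Random(0)
    # Target tone from broadside (zero delay); noise interferer at 60 degrees.
    target = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    k_int = doa_to_sample_delay(60.0, 0.1, fs)
    x1 = [s + v for s, v in zip(target, noise)]
    x2 = [s + v for s, v in zip(target, delay(noise, k_int))]
    k_best, _ = ians_select(x1, x2, list(range(-5, 6)),
                            lambda y: toy_tone_score(y, f0, fs))
    print("interferer delay:", k_int, "selected null delay:", k_best)
```

The candidate bank here is an exhaustive grid of integer delays (about ±5 samples for this spacing and sample rate), and selection is a plain argmax over the scores. A practical system would more likely apply fractional delays per frequency bin in the STFT domain and use a learned, non-intrusive predictor in place of the toy score.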
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88819
DOI:	10.6342/NTU202302638
Full-text license:	Authorized (open access worldwide)
Appears in collections:	Graduate Institute of Communication Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf	7.32 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
