NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96570
Title: Speech Enhancement Based on the Mamba Architecture (基於 Mamba 之語音增強模型)
Authors: Rong Chao (趙容)
Advisor: Wen-Huang Cheng (鄭文皇)
Co-Advisor: Yu Tsao (曹昱)
Keywords: consistency loss, Mamba, speech enhancement, selective state-space model
Publication Year: 2024
Degree: Master's
Abstract: This study explores the application of Mamba, a scalable state-space model (SSM) that operates without attention mechanisms, to the task of speech enhancement (SE). We integrate Mamba into several regression-based SE models (collectively referred to as SEMamba) and test them across multiple configurations, including basic, advanced, causal, and non-causal variants. In addition, both signal-level distance-based loss functions and metric-oriented approaches are evaluated. Experimental results show that the advanced non-causal SEMamba configuration achieves a competitive PESQ score of 3.55 on the VoiceBank-DEMAND dataset. Moreover, combining SEMamba with Perceptual Contrast Stretching (PCS) surpasses the previous best result and reaches a PESQ score of 3.69, setting a new state-of-the-art benchmark. Notably, the advanced non-causal SEMamba models require approximately 12% fewer FLOPs than comparable Transformer-based SE methods. Finally, SEMamba also proves effective as a pre-processing step for automatic speech recognition (ASR), yielding results comparable to recent state-of-the-art SE approaches.
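
The configurations above replace attention with Mamba state-space layers inside a regression-based enhancement network, and the non-causal variants process the frame sequence in both time directions. The following minimal sketch is an editorial illustration of that idea only, not the authors' implementation (the full text is embargoed). It assumes PyTorch plus the open-source mamba_ssm package; every module name, layer size, and the magnitude-masking design are placeholder assumptions.

```python
# Minimal illustrative sketch (not the thesis code): a bidirectional, non-causal
# Mamba block stacked into a simple regression-based magnitude-masking enhancer.
# Assumes PyTorch and the `mamba_ssm` package (https://github.com/state-spaces/mamba).
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class BiMambaBlock(nn.Module):
    """Bidirectional Mamba block over a (batch, frames, features) sequence."""

    def __init__(self, dim: int = 256, d_state: int = 16):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Two independent Mamba layers: one scans frames forward (causal),
        # the other scans the time-reversed sequence (adds non-causal context).
        self.fwd = Mamba(d_model=dim, d_state=d_state, d_conv=4, expand=2)
        self.bwd = Mamba(d_model=dim, d_state=d_state, d_conv=4, expand=2)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out_f = self.fwd(h)                                # forward scan
        out_b = self.bwd(h.flip(dims=[1])).flip(dims=[1])  # reversed scan
        return x + self.proj(torch.cat([out_f, out_b], dim=-1))  # residual


class SEMambaSketch(nn.Module):
    """Toy regression-based enhancer: predicts a mask over magnitude frames."""

    def __init__(self, n_freq: int = 257, dim: int = 256, n_blocks: int = 4):
        super().__init__()
        self.encode = nn.Linear(n_freq, dim)
        self.blocks = nn.ModuleList([BiMambaBlock(dim) for _ in range(n_blocks)])
        self.decode = nn.Linear(dim, n_freq)

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, frames, n_freq) noisy magnitude spectrogram
        h = self.encode(noisy_mag)
        for block in self.blocks:
            h = block(h)
        mask = torch.sigmoid(self.decode(h))
        return noisy_mag * mask  # enhanced magnitude estimate


if __name__ == "__main__":
    model = SEMambaSketch().cuda()           # mamba_ssm's fused kernels need a GPU
    noisy = torch.randn(2, 100, 257).cuda()  # 2 utterances, 100 STFT frames
    print(model(noisy).shape)                # torch.Size([2, 100, 257])
```

Dropping the reversed scan (the bwd branch in BiMambaBlock) would yield a causal variant, loosely mirroring the causal versus non-causal configurations mentioned in the abstract; the actual SEMamba architecture, the signal-level and metric-oriented losses, and the PCS post-processing are detailed in the embargoed full text.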
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96570
DOI: 10.6342/NTU202500114
Fulltext Rights: License granted (restricted to on-campus access)
Embargo Lift Date: 2030-01-14
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: ntu-113-1.pdf (Restricted Access), 30.07 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
