NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96570
Title: Speech Enhancement Based on the Mamba Architecture (基於 Mamba 之語音增強模型)
Authors: Rong Chao (趙容)
Advisor: Wen-Huang Cheng (鄭文皇)
Co-Advisor: Yu Tsao (曹昱)
Keywords: consistency loss, Mamba, speech enhancement, selective state-space model
Publication Year: 2024
Degree: Master's
Abstract: This study explores the application of Mamba, a scalable state-space model (SSM) that operates without attention mechanisms, to the task of speech enhancement (SE). We integrate Mamba into several regression-based SE models (collectively referred to as SEMamba) and test them across multiple configurations, including basic, advanced, causal, and non-causal variants. In addition, both signal-level distance-based loss functions and metric-oriented approaches are evaluated. Experimental results show that the advanced non-causal SEMamba configuration achieves a competitive PESQ score of 3.55 on the VoiceBank-DEMAND dataset. Moreover, combining SEMamba with Perceptual Contrast Stretching (PCS) surpasses the previous best result and reaches a PESQ score of 3.69, setting a new state-of-the-art benchmark. Notably, the advanced non-causal SEMamba models require approximately 12% fewer FLOPs than comparable Transformer-based SE methods. Finally, SEMamba also proves effective as a pre-processing step for automatic speech recognition (ASR), yielding results comparable to recent state-of-the-art SE approaches.
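
The configurations above replace attention with Mamba state-space layers inside a regression-based enhancement network, and the non-causal variants process the frame sequence in both time directions. The following minimal sketch is an editorial illustration of that idea only, not the authors' implementation (the full text is embargoed). It assumes PyTorch plus the open-source mamba_ssm package; every module name, layer size, and the magnitude-masking design are placeholder assumptions.

```python
# Minimal illustrative sketch (not the thesis code): a bidirectional, non-causal
# Mamba block stacked into a simple regression-based magnitude-masking enhancer.
# Assumes PyTorch and the `mamba_ssm` package (https://github.com/state-spaces/mamba).
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class BiMambaBlock(nn.Module):
    """Bidirectional Mamba block over a (batch, frames, features) sequence."""

    def __init__(self, dim: int = 256, d_state: int = 16):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Two independent Mamba layers: one scans frames forward (causal),
        # the other scans the time-reversed sequence (adds non-causal context).
        self.fwd = Mamba(d_model=dim, d_state=d_state, d_conv=4, expand=2)
        self.bwd = Mamba(d_model=dim, d_state=d_state, d_conv=4, expand=2)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out_f = self.fwd(h)                                # forward scan
        out_b = self.bwd(h.flip(dims=[1])).flip(dims=[1])  # reversed scan
        return x + self.proj(torch.cat([out_f, out_b], dim=-1))  # residual


class SEMambaSketch(nn.Module):
    """Toy regression-based enhancer: predicts a mask over magnitude frames."""

    def __init__(self, n_freq: int = 257, dim: int = 256, n_blocks: int = 4):
        super().__init__()
        self.encode = nn.Linear(n_freq, dim)
        self.blocks = nn.ModuleList([BiMambaBlock(dim) for _ in range(n_blocks)])
        self.decode = nn.Linear(dim, n_freq)

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, frames, n_freq) noisy magnitude spectrogram
        h = self.encode(noisy_mag)
        for block in self.blocks:
            h = block(h)
        mask = torch.sigmoid(self.decode(h))
        return noisy_mag * mask  # enhanced magnitude estimate


if __name__ == "__main__":
    model = SEMambaSketch().cuda()           # mamba_ssm's fused kernels need a GPU
    noisy = torch.randn(2, 100, 257).cuda()  # 2 utterances, 100 STFT frames
    print(model(noisy).shape)                # torch.Size([2, 100, 257])
```

Dropping the reversed scan (the bwd branch in BiMambaBlock) would yield a causal variant, loosely mirroring the causal versus non-causal configurations mentioned in the abstract; the actual SEMamba architecture, the signal-level and metric-oriented losses, and the PCS post-processing are detailed in the embargoed full text.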
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96570
DOI: 10.6342/NTU202500114
Fulltext Rights: License granted (restricted to on-campus access)
Embargo Lift Date: 2030-01-14
Appears in Collections: Department of Computer Science and Information Engineering (資訊工程學系)

Files in This Item:
File: ntu-113-1.pdf (Restricted Access), 30.07 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
