Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91760
Title: | Improved Speaker Diarization Based on Speech Foundation Models (original title: 基於語音基石模型之語者自動分段標記系統) |
Author: | 李高迪 Ko-Tik Lee |
Advisor: | 李宏毅 Hung-Yi Lee |
Keywords: | speaker diarization, speech foundation model |
Publication Year: | 2024 |
Degree: | Master's |
Abstract: | Speaker diarization systems currently follow three main approaches: stage-wise, end-to-end, and hybrid end-to-end/stage-wise systems. End-to-end systems significantly outperform the other approaches on certain datasets and have attracted widespread attention. However, such systems may generalize poorly in real-world applications, and the potential of stage-wise systems may be underestimated. Meanwhile, recent speech foundation models have performed well across many speech tasks, indicating broad applicability, yet their use in speaker diarization has not been explored in depth. This study therefore applies speech foundation models to speaker diarization-related tasks, compares their performance, and benchmarks them. It also proposes improvements that address problems in stage-wise systems, such as collar-aware speech onset detection and cluster purification, which significantly improve their performance. Finally, out-of-domain evaluation confirms the generalization problem of hybrid end-to-end/stage-wise systems, and improvements are proposed. The improved stage-wise and hybrid systems in this thesis achieve performance comparable or even superior to the state of the art on multiple datasets. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91760 |
DOI: | 10.6342/NTU202400179 |
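Full-Text Authorization: | Authorized (on-campus access only) |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |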
---|---|---|---|
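ntu-112-1.pdf Access restricted to NTU campus IP addresses (off campus, please use the VPN remote-access service) | 1.52 MB | Adobe PDF | View/Open |
All items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically stated otherwise.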