Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15418
Title: | A Study on Small-footprint Keyword Spotting for Edge Devices (Chinese title: 用於邊緣裝置之低耗能關鍵詞擷取系統的研究) |
Author: | Chuan-You Lin (林傳祐) |
Advisor: | Jyh-Shing Jang (張智星) |
Keywords: | Small-footprint Keyword Spotting, Time Delay Neural Networks, Adversarial Training, Edge Device |
Publication Year: | 2020 |
Degree: | Master's |
Abstract: | Small-footprint keyword spotting must use little memory so that it can run in computationally constrained environments. In other words, under a small-memory constraint we need to strike a balance between accuracy and latency. End-to-end models suit this task better than large vocabulary continuous speech recognition (LVCSR) systems because they usually require less memory. The previous state-of-the-art approach, based on deep residual networks (ResNets), achieved good keyword-spotting accuracy, but its model still used more than 200K parameters. To address this issue, this thesis presents a time-delay neural network (TDNN) model with adversarial training, which generates phonetic features carrying less speaker information, achieving good accuracy while reducing the number of model parameters. We used the publicly available Google Speech Commands dataset to train and evaluate our models. Our best model has 10K parameters (a 96% reduction relative to the ResNets model) and an error rate of 4.4%, comparable to the ResNets model's 4.2%. Besides the number of parameters, latency is also an important metric, so we deployed the models on a mobile device to compare the performance of all methods, including latency. Based on these on-device tests, we can determine the model that best suits the needs of a given application. |
DOI: | 10.6342/NTU202001860 |
Full-text License: | Not authorized |
Appears in Collections: | Department of Computer Science and Information Engineering |
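The abstract attributes the model's small footprint to time-delay neural network (TDNN) layers. Below is a minimal pure-Python sketch, not the thesis's implementation, of how one TDNN layer combines a window of neighboring frames and how its parameter count scales. The function names `tdnn_layer` and `tdnn_params`, the 40-dimensional input features, the layer sizes, and the 12-class output are hypothetical choices for illustration only; the adversarial speaker-information branch is not shown.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = feature dims


def tdnn_layer(x: Matrix, weights: List[Matrix], bias: List[float],
               dilation: int = 1) -> Matrix:
    """One TDNN layer: output frame t combines input frames
    t, t + d, ..., t + (k - 1) * d, where k = len(weights) is the context
    width and d is the dilation, then adds a bias and applies ReLU.
    weights[j] is an out_dim x in_dim matrix for context offset j.
    This is the same computation as a dilated 1-D convolution over time."""
    k = len(weights)
    out_dim = len(bias)
    span = (k - 1) * dilation  # extra frames consumed by the context window
    out: Matrix = []
    for t in range(len(x) - span):
        y = list(bias)
        for j, w in enumerate(weights):
            frame = x[t + j * dilation]
            for o in range(out_dim):
                y[o] += sum(w_oi * f_i for w_oi, f_i in zip(w[o], frame))
        out.append([max(0.0, v) for v in y])  # ReLU
    return out


def tdnn_params(in_dim: int, out_dim: int, context: int) -> int:
    """Weights (one out_dim x in_dim matrix per context offset) plus biases."""
    return in_dim * out_dim * context + out_dim


if __name__ == "__main__":
    # Hypothetical stack (NOT the thesis's configuration): 40-dim input
    # features, three 32-unit TDNN layers with a 3-frame context each, and
    # a 12-way output layer counted as a context-1 layer.
    layers = [(40, 32, 3), (32, 32, 3), (32, 32, 3), (32, 12, 1)]
    total = sum(tdnn_params(i, o, c) for i, o, c in layers)
    print(f"total parameters: {total}")  # prints "total parameters: 10476"
```

Each layer costs in_dim × out_dim × context + out_dim parameters, so a few narrow layers with short contexts stay in the thousands of parameters; dilation widens the temporal receptive field without adding any weights.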
Files in This Item:
File | Size | Format |
---|---|---|
U0001-2607202010280300.pdf (currently not authorized for public access) | 3.4 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.