NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77349
Title: A Low-Power Attention-Based End-to-End Speech Recognizer
Author: Yi-Long Liou (劉議隆)
Advisor: Tsung-Te Liu (劉宗德)
Keywords: Attention, End-to-End, Speech Recognition, Window Algorithm, Compression Flow, Attention Dataflow, Low Power, Accelerator, ASIC
Publication Year: 2019
Degree: Master
Abstract: Attention-based encoder-decoder end-to-end speech recognition systems such as Listen, Attend and Spell [1] fold the acoustic model (AM), pronunciation model (PM), and language model (LM) of a traditional automatic speech recognition (ASR) system into a single deep neural network, which makes it possible to implement the entire recognizer on a single chip. However, the attention-based model still has far too many weights: most of them would have to reside in off-chip DRAM and be fetched on demand, and fetching weights from off-chip memory is very energy-intensive. To keep every weight in on-chip SRAM, this thesis applies several model-compression techniques: a revised GRU cell that reduces parameters and is more suitable for hardware implementation, singular value decomposition (SVD), weight pruning, and weight sharing. The attention mechanism also poses two fundamental problems for hardware: its computational cost is high, and the entire utterance must be read in before decoding can begin. We therefore propose a unidirectional window algorithm that greatly reduces the amount of computation. Verified on the TIMIT corpus, the compressed model uses 98% fewer parameters at the cost of a 2.23% increase in phoneme error rate. We also propose an attention dataflow that is well suited to hardware implementation. Combining all of the proposed software and hardware optimizations reduces the power consumption of the attention-based sequence-to-sequence end-to-end speech recognition accelerator by 92.1% in total. Implemented in a TSMC 28 nm process, the accelerator consumes 6.99 mW at 100 MHz, 3.72 mW at 50 MHz, and 725 µW at 2.5 MHz.
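As a rough illustration of the kind of unidirectional window algorithm the abstract describes (not the author's implementation; the window width, the dot-product scoring, and the monotonic center update are all illustrative assumptions), attention can be restricted to a fixed-width slice of the encoder outputs that only slides forward as decoding proceeds:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def windowed_attention(encoder_states, decoder_state, center, window=8):
    """Attend only to a fixed-width window of encoder states.

    encoder_states: (T, H) array of encoder outputs
    decoder_state:  (H,) current decoder hidden state
    center:         index where the window starts (advanced monotonically)
    window:         number of encoder frames the decoder may attend to
    """
    T = encoder_states.shape[0]
    start = max(0, center)
    end = min(T, center + window)

    # Dot-product scores over the window only, instead of over all T frames.
    scores = encoder_states[start:end] @ decoder_state
    weights = softmax(scores)

    # Context vector is a weighted sum of the windowed encoder states.
    context = weights @ encoder_states[start:end]

    # Slide the window forward toward the strongest weight, so attention
    # stays unidirectional and never needs the whole utterance up front.
    new_center = start + int(np.argmax(weights))
    return context, new_center
```

Compared with full attention, the score and context computations above scale with the window size rather than with the utterance length, which is where the computation savings claimed in the abstract come from.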
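The compression flow (SVD, weight pruning, weight sharing) mentioned in the abstract might look roughly like the following toy sketch. The chosen rank, pruning ratio, and the crude quantile codebook standing in for weight sharing are illustrative assumptions, and the revised GRU cell itself is only described in the full thesis:

```python
import numpy as np

def compress_weight_matrix(W, rank=32, prune_ratio=0.7, n_clusters=16):
    """Toy compression flow: low-rank SVD, magnitude pruning, weight sharing."""
    # 1) Low-rank factorization: W (m x n) ~= U_r (m x r) @ V_r (r x n).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]

    # 2) Magnitude pruning: zero out the smallest-magnitude weights.
    def prune(M):
        thresh = np.quantile(np.abs(M), prune_ratio)
        return np.where(np.abs(M) < thresh, 0.0, M)

    U_r, V_r = prune(U_r), prune(V_r)

    # 3) Weight sharing: snap the surviving weights to a small codebook so
    #    only cluster indices plus the codebook need to be kept on-chip.
    def share(M):
        nz = M[M != 0]
        if nz.size == 0:
            return M
        codebook = np.quantile(nz, np.linspace(0.0, 1.0, n_clusters))
        idx = np.abs(nz[:, None] - codebook[None, :]).argmin(axis=1)
        M = M.copy()
        M[M != 0] = codebook[idx]
        return M

    return share(U_r), share(V_r)

# Usage: a dense layer y = W @ x becomes y = U_r @ (V_r @ x).
W = np.random.randn(512, 512)
U_r, V_r = compress_weight_matrix(W)
x = np.random.randn(512)
y_approx = U_r @ (V_r @ x)
```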
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77349
DOI: 10.6342/NTU201901792
Full-Text Access: Not authorized
Appears in Collections: Graduate Institute of Electronics Engineering

Files in This Item:
File: ntu-107-2.pdf (access restricted; not publicly available)
Size: 10.42 MB
Format: Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
