NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77349
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 劉宗德 | zh_TW
dc.contributor.advisor | Tsung-Te Liu | en
dc.contributor.author | 劉議隆 | zh_TW
dc.contributor.author | Yi-Long Liou | en
dc.date.accessioned | 2021-07-10T21:57:36Z | -
dc.date.available | 2024-07-25 | -
dc.date.copyright | 2019-07-26 | -
dc.date.issued | 2019 | -
dc.date.submitted | 2002-01-01 | -
dc.identifier.citation[1] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio. End-to-end attention-based large vocabulary speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4945–4949, March 2016.
[2] W. Chan, N. Jaitly, Q. Le, and O. Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4960–4964, March 2016.
[3] Y. Chen, T. Krishna, J. Emer, and V. Sze. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. In 2016 IEEE International Solid-State Circuits Conference (ISSCC), pages 262–263, Jan 2016.
[4] C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, E. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani. State-of-the-art speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4774–4778, April 2018.
[5] K. Cho, B. van Merriënboer, Ç. Gülçehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics.
[6] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv e-prints, page arXiv:1406.1078, Jun 2014.
[7] J. Chorowski, D. Bahdanau, K. Cho, and Y. Bengio. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results. arXiv e-prints, page arXiv:1412.1602, Dec 2014.
[8] J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 577–585. Curran Associates, Inc., 2015.
[9] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, ICML’06, pages 369–376, New York, NY, USA, 2006. ACM.
[10] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, abs/1510.00149, 2016.
[11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[12] M. Price, J. Glass, and A. P. Chandrakasan. A 6 mW, 5,000-word real-time speech recognizer using WFST models. IEEE Journal of Solid-State Circuits, 50(1):102–112, Jan 2015.
[13] M. Price, J. Glass, and A. P. Chandrakasan. A low-power speech recognizer and voice activity detector using deep neural networks. IEEE Journal of Solid-State Circuits, 53(1):66–75, Jan 2018.
[14] M. Ravanelli, P. Brakel, M. Omologo, and Y. Bengio. Light gated recurrent units for speech recognition. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(2):92–102, April 2018.
[15] J. Xue and J. Li. Restructuring of deep neural network acoustic models with singular value decomposition. January 2013.
[16] S. Yin, P. Ouyang, S. Zheng, D. Song, X. Li, L. Liu, and S. Wei. A 141 µW, 2.46 pJ/neuron binarized convolutional neural network based self-learning speech recognition processor in 28nm CMOS. In 2018 IEEE Symposium on VLSI Circuits, pages 139–140, June 2018.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77349 | -
dc.description.abstract基於注意力機制的編碼器解碼器端對端語音辨識系統,例如:聽,注意和拼(Listen, Attend and Spell),將傳統的語音辨識系統(Automatic Speech Recognition System, ASR)中的聲學模型(Acoustic Model), 發音模型(Pronunciation Model)和語言模型(Language Model)由一個單一的深度神經網路組成,這給了我們一個機會,可以將語音辨識的整個模型實現在一顆晶片上,然而注意力模型的權重數量仍然太多,需要將大部分的權重放在晶片外(Off-chip)的動態隨機存取存儲器(Dynamic Random Access Memory, DRAM)上,需要時才讀入晶片內,我們希望可以將所有權重都放入晶片內(On-chip)的靜態隨機存取存儲器(Static Random Access Memory, SRAM),因為從晶片外拿取權重是非常消耗能量的,所以在這篇論文內我們運用了數個壓縮模型的方法,分別是修改型門控遞歸單元(Revised GRU),奇異值分解(Singular Value Decomposition),權重修剪(Weight Pruning),權重分享(Weight Sharing)壓縮模型,以便將所有權重放入晶片內,且因為注意力機制在硬體實現上有兩個根本的問題,一個問題是注意力機制的計算量太高,且需要將整個句子都讀入機器後才能做辨識,所以在此篇論文,我們提出了一個單向的窗口演算法(Window Algorithm)來大幅降低計算量,以利於硬體實現,我們使用TIMIT資料庫來驗證結果,增加了2.23%的錯誤率,但減少了98%的參數量。我們也提出了一個適合實現在硬體上的注意力機制資料流(Attention Dataflow),結合所有提出的軟體與硬體最佳化的技巧,使基於注意力機制的序列到序列端對端語音辨識系統加速器總共降低了92.1%的功率消耗,在台積電28奈米製程下,操作在100MHz時,消耗的功率為6.99毫瓦(mW),操作在50MHz時,消耗的功率為3.72毫瓦(mW),操作在2.5MHz時,消耗的功率為725微瓦(uW)。zh_TW
dc.description.abstract | Attention-based encoder-decoder models such as Listen, Attend and Spell subsume the acoustic model (AM), pronunciation model (PM), and language model (LM) of a traditional automatic speech recognition (ASR) system into a single neural network, which opens the opportunity to implement an entire ASR model on a single chip. Two problems remain, however. First, the parameter count of an attention-based model is still too large: most of the weights must be stored in off-chip DRAM and fetched when needed, and fetching weights from off-chip DRAM is very energy-intensive, so we want to keep all weights in on-chip SRAM. In this thesis we apply several compression methods: we revise the original GRU cell to reduce its parameter count and make it more suitable for hardware implementation, and we further shrink the model with singular value decomposition (SVD), weight pruning, and weight sharing. Second, the attention mechanism is computationally expensive, and the entire utterance must be read into the model before decoding can begin. We propose a window algorithm that greatly reduces the amount of computation. On the TIMIT data set, our approach reduces the parameter count by 98% at the cost of a 2.23% increase in phoneme error rate. We also propose an attention dataflow suited to implementing the attention model in hardware. Combining all of the proposed software and hardware optimizations reduces the accelerator's power consumption by 92.1% in total: implemented in a TSMC 28 nm process, the attention-based encoder-decoder end-to-end speech recognizer consumes 6.99 mW at 100 MHz, 3.72 mW at 50 MHz, and 725 µW at 2.5 MHz. | en
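The window algorithm described in the abstract limits attention to a sliding range of encoder frames, so the decoder does not have to score the whole utterance at every step. The sketch below is an illustrative NumPy toy, not the thesis's implementation; the dot-product scoring, window width, and argmax center-tracking heuristic are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def windowed_attention(enc, query, center, width=5):
    """Attend only to encoder frames in [center - width, center + width].

    enc:   (T, d) encoder states; query: (d,) decoder state.
    Returns the context vector and a new window center, tracked as the
    argmax of the attention weights (a common alignment heuristic)."""
    T = enc.shape[0]
    lo, hi = max(0, center - width), min(T, center + width + 1)
    scores = enc[lo:hi] @ query          # dot-product scores inside the window only
    w = softmax(scores)                  # normalize over the window, not all T frames
    context = w @ enc[lo:hi]             # (d,) weighted sum of windowed states
    new_center = lo + int(np.argmax(w))  # slide the window for the next decode step
    return context, new_center

rng = np.random.default_rng(0)
enc = rng.standard_normal((100, 16))     # 100 frames, 16-dim states (toy sizes)
ctx, c = windowed_attention(enc, rng.standard_normal(16), center=10)
print(ctx.shape, c)                      # context has shape (16,); c stays in [5, 15]
```

Compared with full attention, each step touches at most 2·width+1 frames instead of all T, which is what makes a unidirectional, streaming-friendly hardware implementation plausible.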
dc.description.provenance | Made available in DSpace on 2021-07-10T21:57:36Z (GMT). No. of bitstreams: 1. ntu-108-R05943051-1.pdf: 10671736 bytes, checksum: 83454139f3f2d7d5224632c437622d3a (MD5). Previous issue date: 2019 | en
dc.description.tableofcontents | Thesis Certification iii
Acknowledgements (Chinese) v
Acknowledgements vii
Abstract (Chinese) ix
Abstract xi
1 Introduction 1
1.1 Motivation 1
1.2 Contributions 2
1.3 Thesis Organization 2
2 Background 3
2.1 Traditional Automatic Speech Recognition Systems 3
2.1.1 System Architecture 3
2.1.2 Front-End Signal Processing 4
2.1.3 Acoustic Model 6
2.1.4 Lexicon 7
2.1.5 Language Model 7
2.2 Deep Neural Networks 8
2.2.1 Feedforward Neural Networks 8
2.2.2 Recurrent Neural Networks 10
3 End-to-End Speech Recognition 13
3.1 System Architecture 13
3.2 Connectionist Temporal Classification [9] 14
3.3 Attention-Based Sequence-to-Sequence Encoder-Decoder [5] 16
3.3.1 Sequence-to-Sequence Encoder-Decoder 16
3.3.2 Attention-Based Sequence-to-Sequence Encoder-Decoder 16
3.3.3 Attention-Based Sequence-to-Sequence Encoder-Decoder Speech Recognition 18
3.4 Comparison of Traditional and End-to-End Speech Recognition 19
3.5 Comparison of CTC and the Attention-Based Sequence-to-Sequence Encoder-Decoder 20
4 Drawbacks and Challenges of End-to-End Speech Recognition 21
4.1 Attention Alignment Problem 21
4.2 High Computational Cost 22
4.3 Real-Time Constraint 23
4.4 Hardware Implementation Issues 23
4.4.1 Oversized Deep Neural Network Models for Hardware 23
4.4.2 Dataflow Problem of Conventional General-Purpose Processors 24
5 Existing Improvements and Hardware Implementations 27
5.1 Time-Step Reduction [1] [2] 27
5.2 Setting a Window Size [1] [8] 28
5.3 Revised Gated Recurrent Unit [14] 29
5.4 Singular Value Decomposition [15] 30
5.5 Weight Sharing [10] 31
5.6 Hardware Implementations 32
6 Proposed Algorithm and Hardware Implementation 35
6.1 Proposed Model Architecture 35
6.2 Proposed Window Algorithm 37
6.2.1 Proposed Window Algorithm 37
6.2.2 Analysis and Comparison of the Proposed Window Algorithm 38
6.3 Proposed Model Compression Methods 40
6.3.1 Compression Flow 40
6.3.2 Revised Gated Recurrent Unit 41
6.3.3 Unidirectional vs. Bidirectional Networks and Neuron-Count Analysis 43
6.3.4 Singular Value Decomposition [15] 43
6.3.5 Weight Pruning [10] 44
6.3.6 Weight Sharing [10] 45
6.3.7 Summary of Compression Results 46
6.4 Proposed Hardware Implementation 47
6.4.1 Fixed-Point Simulation 47
6.4.2 Overall Hardware Architecture 48
6.4.3 Processing-Element Array Module Architecture 50
6.4.4 Processing-Element Wrapper Module Architecture 51
6.4.5 Activation Function Module Architecture 52
6.4.6 Element-Wise Multiplication Module Architecture 54
6.4.7 Attention Module Architecture 55
6.4.8 Proposed Dataflow 57
6.4.9 Analysis and Comparison of the Proposed Dataflow 61
6.5 Analysis and Comparison of Hardware Performance and Power Optimization 63
6.5.1 Cell Area Distribution Analysis 63
6.5.2 Power Consumption Distribution Analysis 65
6.5.3 Power Optimization Technique Analysis 67
6.5.4 Hardware Performance Comparison 68
7 Conclusion and Future Work 69
References 71
-
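The compression flow above combines a revised GRU, SVD, weight pruning, and weight sharing. The following NumPy sketch illustrates only the SVD restructuring and magnitude-pruning steps; the matrix size, rank, and keep ratio are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def svd_compress(W, rank):
    """Replace W (m x n) by two low-rank factors U_r (m x r) and V_r (r x n):
    storage drops from m*n to r*(m + n) values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

def prune(W, keep_ratio=0.1):
    """Zero out all but the largest-magnitude weights (magnitude pruning)."""
    k = int(W.size * keep_ratio)
    thresh = np.sort(np.abs(W), axis=None)[-k]   # k-th largest magnitude
    return np.where(np.abs(W) >= thresh, W, 0.0)

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 256))              # toy weight matrix
U_r, V_r = svd_compress(W, rank=16)
print(W.size, U_r.size + V_r.size)               # 65536 vs 8192: an 8x reduction
W_p = prune(W, keep_ratio=0.1)
print(np.count_nonzero(W_p) / W.size)            # roughly 0.1 of weights survive
```

In an actual accelerator, the surviving weights would additionally be clustered into a small codebook (weight sharing) so that each weight is stored as a short index, which is what allows the whole model to fit in on-chip SRAM.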
dc.language.iso | zh_TW | -
dc.subject | 注意力機制 | zh_TW
dc.subject | 端對端 | zh_TW
dc.subject | 語音辨識 | zh_TW
dc.subject | 窗口演算法 | zh_TW
dc.subject | 壓縮流程 | zh_TW
dc.subject | 注意力機制資料流 | zh_TW
dc.subject | 低功耗 | zh_TW
dc.subject | 加速器 | zh_TW
dc.subject | 晶片 | zh_TW
dc.subject | ASIC | en
dc.subject | Attention | en
dc.subject | End-to-End | en
dc.subject | Speech Recognition | en
dc.subject | Window Algorithm | en
dc.subject | Compression Flow | en
dc.subject | Attention Dataflow | en
dc.subject | Low Power | en
dc.subject | Accelerator | en
dc.title | 基於注意力機制之低功耗端對端語音辨識加速器 | zh_TW
dc.title | A Low-Power Attention-Based End-to-End Speech Recognizer | en
dc.type | Thesis | -
dc.date.schoolyear | 107-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 闕志達;李宏毅 | zh_TW
dc.contributor.oralexamcommittee | Tzi-Dar Chiueh;Hung-yi Lee | en
dc.subject.keyword | 注意力機制,端對端,語音辨識,窗口演算法,壓縮流程,注意力機制資料流,低功耗,加速器,晶片 | zh_TW
dc.subject.keyword | Attention,End-to-End,Speech Recognition,Window Algorithm,Compression Flow,Attention Dataflow,Low Power,Accelerator,ASIC | en
dc.relation.page | 73 | -
dc.identifier.doi | 10.6342/NTU201901792 | -
dc.rights.note | 未授權 (not authorized for public access) | -
dc.date.accepted | 2019-07-25 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 電子工程學研究所 (Graduate Institute of Electronics Engineering) | -
Appears in Collections: Graduate Institute of Electronics Engineering

Files in this item:
File | Size | Format
ntu-107-2.pdf (restricted access) | 10.42 MB | Adobe PDF


Except where otherwise noted, items in this system are protected by copyright, with all rights reserved.
