Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78431
Full metadata record
DC Field / Value / Language
dc.contributor.advisor: 楊家驤
dc.contributor.author: Yi-Cheng Lin [en]
dc.contributor.author: 林邑政 [zh_TW]
dc.date.accessioned: 2021-07-11T14:56:39Z
dc.date.available: 2025-02-24
dc.date.copyright: 2020-02-24
dc.date.issued: 2020
dc.date.submitted: 2020-02-18
dc.identifier.citation[1] D. S. Pavan Kumar, “Keras Interface for Kaldi ASR,” Sep. 2017. [Online]. Available:https://github.com/dspavankumar/keraskaldi. [Accessed: 18Jan2020].
[2] Michael Price, James Glass, and Anantha P. Chandrakasan, “A scalable speech recognizerwith deepneuralnetworkacoustic models and voiceactivatedpower gating,”IEEE International SolidStateCircuits Conference (ISSCC), pp. 244245,Feb. 2017.
[3] Michael Price, James Glass, and Anantha P. Chandrakasan, “A LowPowerSpeechRecognizer and Voice Activity Detector Using Deep Neural Networks,” IEEE Journalof SolidStatesCircuits (JSSC), vol. 53, no. 1, pp. 6675,Jan. 2018.
[4] Yajie Miao, “Kaldi+PDNN: Building DNNbasedASR Systems with Kaldi andPDNN,” arXiv:1401.6984, Jan. 2014.
[5] Leo Liu, “Acoustic Models for Speech Recognition Using Deep Neural NetworksBased on Approximate Math,” Ph.D. dissertation, Dept. of Electrical Engineeringand Computer Science, Massachusetts Institute of Technology, Cambridge, MA,2015.
[6] Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Qinru Qiu, Yanzhi Wang, and YunLiang, “CLSTM:Enabling Efficient LSTM using Structured Compression Techniqueson FPGAs,” ACM/SIGDA International Symposium on FieldProgrammableGate Arrays (FPGA), pp. 1120,Feb. 2018.
[7] Chang Gao, Daniel Neil, Enea Ceolini, ShihChiiLiu, and Tobi Delbruck,“DeltaRNN: A PowerefficientRecurrent Neural Network Accelerator,”ACM/SIGDA International Symposium on FieldProgrammableGate Arrays(FPGA), pp. 2130,Feb. 2018.
[8] Shijie Cao, Chen Zhang, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan,Yunxin Liu, Ming Wu, and Lintao Zhang, “Efficient and Effective Sparse LSTMon FPGA with BankBalancedSparsity,” ACM/SIGDA International Symposium onFieldProgrammableGate Arrays (FPGA), pp. 6372,Feb. 2019.
[9] Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, DongliangXie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William J. Dally, “ESE:Efficient Speech Recognition Engine with Sparse LSTM on FPGA,” ACM/SIGDAInternational Symposium on FieldProgrammableGate Arrays (FPGA), pp. 7584,Feb. 2017.
[10] Taesup Moon, Heeyoul Choi, Hoshik Lee, and Inchul Song, “RNNDROP: A noveldropout for RNNS in ASR,” IEEE Workshop on Automatic Speech Recognition andUnderstanding (ASRU), pp. 6570,Dec. 2015.
[11] Mirco Ravanelli, Titouan Parcollet, and Yoshua Bengio, “The PyTorchKaldiSpeechRecognition Toolkit,” International Conference on Acoustics, Speech and SignalProcessing (ICASSP), 2019.
[12] Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and ChangshuiZhang, “Learning Efficient Convolutional Networks Through Network Slimming,”IEEE International Conference on Computer Vision (ICCV), pp. 27362744,Oct.2017.
[13] Yihui He, Xiangyu Zhang, and Jian Sun, “Channel Pruning for AcceleratingVery Deep Neural Networks,” IEEE International Conference on Computer Vision(ICCV), pp. 13891397,Oct. 2017.
[14] JianHaoLuo, Jianxin Wu, and Weiyao Lin, “ThiNet: A Filter Level Pruning Methodfor Deep Neural Network Compression,” IEEE International Conference on ComputerVision (ICCV), pp. 50585066,Oct. 2017.
[15] Yang He, Guoliang Kang, Xuanyi Dong, Yanwei Fu, and Yi Yang, “Soft FilterPruning for Accelerating Deep Convolutional Neural Networks,” International JointConference on Artificial Intelligence (IJCAI), pp. 22342240,July 2018.
[16] Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus, “ExploitingLinear Structure Within Convolutional Networks for Efficient Evaluation,”International Conference on Neural Information Processing Systems (NIPS), vol. 1,pp. 12691277,Dec. 2014.
[17] Marc Masana, Joost van de Weijer, Luis Herranz, Andrew D. Bagdanov, and JoseM. Alvarez, “DomainAdaptiveDeep Network Compression,” IEEE InternationalConference on Computer Vision (ICCV), pp. 42894297,Oct. 2017.
[18] ShuChangZhou, YuZhiWang, He Wen, QinYaoHe, and YuHengZou, “BalancedQuantization: An Effective and Efficient Approach to Quantized Neural Networks,”Journal of Computer Science and Technology (JCST), pp. 667682,2017.
[19] Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo, “WeightedEntropyBasedQuantizationfor Deep Neural Networks,” IEEE Conference on Computer Vision andPattern Recognition (CVPR), pp. 54565464,June 2017.
[20] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi, “XNORNet:ImageNet Classification Using Binary Convolutional Neural Networks,” EuropeanConference on Computer Vision (ECCV), pp. 525542,Oct. 2016.
[21] Zefan Li, Bingbing Ni, Wenjun Zhang, Xiaokang Yang, and Wen Gao, “PerformanceGuaranteed Network Acceleration via HighOrderResidual Quantization,” IEEEInternational Conference on Computer Vision (ICCV), pp. 25842592,Oct. 2017.
[22] Yiwen Guo, Anbang Yao, Hao Zhao, and Yurong Chen, “Network Sketching: ExploitingBinary Structure in Deep CNNs,” IEEE Conference on Computer Visionand Pattern Recognition (CVPR), pp. 59555963,July 2017.
[23] Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang,and Hongbin Zha, “Alternating MultibitQuantization for Recurrent Neural Networks,”International Conference on Learning Representations (ICLR), Jan. 2018.
[24] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, AndrewHoward, Hartwig Adam, and Dmitry Kalenichenko “Quantization and Training ofNeural Networks for Efficient IntegerArithmeticOnlyInference,” IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR), pp. 27042713,June2018.
[25] Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, and Yoshua Bengio, “LightGated Recurrent Units for Speech Recognition,” IEEE Transactions on EmergingTopics in Computational Intelligence (TETCI), vol. 2, no. 2, pp. 92102,Apr. 2018.
[26] Alex Graves, Navdeep Jaitly, and AbdelrahmanMohamed, “Hybrid speech recognitionwith Deep Bidirectional LSTM,” IEEE Workshop on Automatic SpeechRecognition and Understanding (ASRU), pp. 273278,Dec. 2013.
[27] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and LiangChiehChen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” IEEEConference on Computer Vision and Pattern Recognition (CVPR), pp. 45104520,June 2018.
[28] Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek,Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz,Jan Silovsky, Georg Stemmer, and Karel Vesely “The Kaldi Speech RecognitionTool,” IEEE Workshop on Automatic Speech Recognition and Understanding(ASRU), 2011.
[29] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang,Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer “Automaticdifferentiation in PyTorch,” Conference on Neural Information ProcessingSystems (NIPs), 2017.
[30] John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, and DavidS. Pallett, “DARPA TIMIT acousticphoneticcontinous speech corpus CDROM,”NASA STI/Recon technical report, NIST speech disc 11.1,1993.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78431
dc.description.abstract: Speech recognition is the technique of converting speech signals into text; it is now widely used in portable devices, smart homes, robots, and autonomous driving. To achieve higher accuracy, a speech recognition system can adopt bidirectional recurrent neural networks, but these demand a large amount of computation and storage. Conventional bidirectional recurrent neural networks usually rely on long short-term memory models; this thesis instead adopts the light gated recurrent unit as the computation core, which effectively reduces the number of parameters and the computational complexity while maintaining accuracy. However, the bidirectional recurrent neural network still consumes most of the execution time during speech recognition, so this work integrates several network compression techniques, including scaling factor pruning, multi-bit clustering, and linear quantization, to greatly reduce the model size and the required computation. Scaling factor pruning removes neurons with small scaling factors along with all of their associated weights, reducing memory and computation by 88%. Multi-bit clustering groups the weights into clusters and stores them as indices, further reducing the weight memory by 84%. Linear quantization reduces the bitwidths of inputs and weights, cutting the computation by 91%. Overall, the compression scheme reduces the model size by 98% with only a 1% drop in accuracy, requiring merely 593 KB of storage to reach a 15.3% phone error rate on the TIMIT corpus. The proposed accelerator architecture is verified on an FPGA; compared with the best designs in the literature, it achieves a 3.5-8.2x higher throughput and, with a single speech input (batch size of one), a 2.4-16.5x higher frame rate, together with the lowest phone error rate. [zh_TW]
dc.description.abstract: Speech recognition is a technique that converts speech signals into text. It has been used extensively in a variety of applications, including wearable devices, home automation, robots, and self-driving cars. To enhance accuracy, bidirectional recurrent neural networks (BRNNs) are adopted for speech recognition, but this introduces an immense amount of computational complexity and a huge model size. In contrast to the conventional long short-term memory (LSTM), the light gated recurrent unit (Light GRU) is adopted in this work to achieve high accuracy with fewer parameters. Since the BRNN dominates the execution time of speech recognition, this work presents a network compression scheme that consists of scaling factor pruning (SFP), multi-bit clustering (MBC), and linear quantization (LQ). SFP prunes neurons based on trainable scaling factors and compresses both the model size and the computational complexity by 88%. MBC classifies weights into groups stored as indices to further compress the model size by 84%. LQ decreases the bitwidths of inputs and weights and further reduces the computational complexity by 91%. With the proposed network compression scheme, the model size is reduced by 98% with only a 1% drop in accuracy. The phone error rate on the TIMIT corpus remains at 15.3% with a model size of merely 593 KB. Compared to current state-of-the-art designs, the presented BRNN FPGA (field-programmable gate array) accelerator achieves a 3.5-to-8.2x higher throughput and a 2.4-to-16.5x higher frame rate at a batch size of one, while also achieving the lowest phone error rate. [en]
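As a rough illustration of the linear-quantization (LQ) step mentioned in the abstract, the following minimal NumPy sketch maps floating-point weights to low-bit signed integers that share one per-tensor scale. The 4-bit setting, the function names, and the symmetric per-tensor scaling are assumptions for demonstration only and are not taken from the thesis.

import numpy as np

def linear_quantize(w, bits=4):
    # Illustrative sketch only; bitwidth and scaling scheme are assumptions,
    # not the thesis's actual implementation.
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed values
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one shared scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights, e.g. for accuracy evaluation.
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)
    q, s = linear_quantize(w, bits=4)
    err = float(np.mean(np.abs(w - dequantize(q, s))))
    print(f"mean absolute quantization error: {err:.6f}")

Storing 4-bit indices plus a single scale in place of 32-bit floats is what yields the kind of memory and arithmetic reduction the abstract attributes to bitwidth reduction; the exact savings depend on the bitwidths actually chosen in the thesis.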
dc.description.provenance: Made available in DSpace on 2021-07-11T14:56:39Z (GMT). No. of bitstreams: 1. ntu-109-R06943023-1.pdf: 6615229 bytes, checksum: 87c424eef02bcfc199f8e33e3444ae05 (MD5). Previous issue date: 2020 [en]
dc.description.tableofcontents:
Oral Defense Committee Certification ii
Acknowledgements iii
Abstract (Chinese) iv
ABSTRACT v
Contents vii
List of Figures ix
List of Tables x
1 INTRODUCTION 1
2 PRELIMINARIES 5
2.1 Speech Recognition System 5
2.2 Feature Extraction 5
2.3 RNN Model 6
2.3.1 Light GRU 7
2.3.2 Weight-sharing Bidirectional RNNs 9
2.3.3 RNN Model in Speech Recognition 9
2.4 Linguistic Search Algorithm 11
3 NETWORK COMPRESSION 14
3.1 Scaling Factor Pruning 15
3.1.1 Structured Pruning 15
3.1.2 Network Slimming 15
3.1.3 Improvement and Implementation 16
3.1.4 Experimental Result 18
3.2 Multi-bit Clustering 19
3.2.1 Algorithm 19
3.2.2 Improvement and Implementation 21
3.2.3 Experimental Result 22
3.3 Linear Quantization 22
3.4 Effect of Network Compression Scheme 24
4 BRNN HARDWARE ACCELERATOR 26
4.1 Overall Architecture 26
4.2 On-chip Memory 27
4.3 LiGRU Engine 28
4.4 Hidden State Buffer 30
4.5 Computation Flow 31
5 EXPERIMENTAL VERIFICATION 33
5.1 System Setup 33
5.2 Performance Evaluation 34
5.3 Hardware Implementation 34
5.4 Design Comparison 36
6 CONCLUSION 38
References 39
dc.language.iso: en
dc.subject: FPGA implementation (可程式化邏輯陣列實現) [zh_TW]
dc.subject: speech recognition (語音辨識) [zh_TW]
dc.subject: recurrent neural network (遞迴神經網路) [zh_TW]
dc.subject: bidirectional RNN (雙向遞迴神經網路) [zh_TW]
dc.subject: light GRU (輕量門控遞迴單元) [zh_TW]
dc.subject: network compression (神經網路壓縮) [zh_TW]
dc.subject: FPGA implementation [en]
dc.subject: speech recognition [en]
dc.subject: recurrent neural network (RNN) [en]
dc.subject: bidirectional RNN (BRNN) [en]
dc.subject: light GRU [en]
dc.subject: network compression [en]
dc.title: Design and Implementation of a Bidirectional RNN Hardware Accelerator for Speech Recognition (應用於語音辨識之雙向遞迴神經網路硬體加速器設計與實現) [zh_TW]
dc.title: Design and Implementation of a Bidirectional RNN FPGA Accelerator for Speech Recognition [en]
dc.type: Thesis
dc.date.schoolyear: 108-1
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 簡韶逸, 劉宗德, 闕河鳴
dc.subject.keyword: speech recognition, recurrent neural network, bidirectional RNN, light GRU, network compression, FPGA implementation [zh_TW]
dc.subject.keyword: speech recognition, recurrent neural network (RNN), bidirectional RNN (BRNN), light GRU, network compression, FPGA implementation [en]
dc.relation.page: 43
dc.identifier.doi: 10.6342/NTU202000421
dc.rights.note: Paid authorization (有償授權)
dc.date.accepted: 2020-02-19
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Electronics Engineering (電子工程學研究所) [zh_TW]
dc.date.embargo-lift: 2025-02-24
Appears in Collections: Graduate Institute of Electronics Engineering

Files in This Item:
File: ntu-109-R06943023-1.pdf (restricted access; not authorized for public release)
Size: 6.46 MB
Format: Adobe PDF


Except where otherwise noted in their copyright terms, items in this repository are protected by copyright, with all rights reserved.
