NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78431
Title: 應用於語音辨識之雙向遞迴神經網路硬體加速器設計與實現
Design and Implementation of a Bidirectional RNN FPGA Accelerator for Speech Recognition
Authors: Yi-Cheng Lin
林邑政
Advisor: 楊家驤
Keywords: speech recognition, recurrent neural network (RNN), bidirectional RNN (BRNN), light GRU, network compression, FPGA implementation
Publication Year: 2020
Degree: Master's
Abstract: Speech recognition is a technique that converts speech signals into text. It has been widely adopted in applications such as wearable devices, home automation, robots, and self-driving cars. To improve accuracy, bidirectional recurrent neural networks (BRNNs) are used for speech recognition, but they introduce immense computational complexity and a large model size. Instead of the conventional long short-term memory (LSTM), this work adopts the light gated recurrent unit (Light GRU) as the computational core, which maintains high accuracy with fewer parameters. Since the BRNN dominates the execution time of speech recognition, this work presents a network compression scheme consisting of scaling factor pruning (SFP), multi-bit clustering (MBC), and linear quantization (LQ). SFP prunes the neurons with small trainable scaling factors, together with all of their associated weights, reducing both the memory footprint and the computational complexity by 88%. MBC classifies weights into groups and stores them as indices, further compressing the weight memory by 84%. LQ reduces the bitwidth of inputs and weights, cutting the computational complexity by a further 91%. With the proposed compression scheme, the model size is reduced by 98% with only a 1% drop in accuracy: the phone error rate on the TIMIT corpus remains at 15.3% with a model size of merely 593 KB. The proposed accelerator architecture is verified on a field-programmable gate array (FPGA). Compared with state-of-the-art designs, it achieves 3.5x to 8.2x higher throughput and a 2.4x to 16.5x higher frame rate at a batch size of one, while also attaining the lowest phone error rate.
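The three compression steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the keep ratio, and the uniform (rather than learned) cluster centroids are illustrative assumptions chosen only to show the mechanism of each step.

```python
import numpy as np

def scaling_factor_prune(W, gamma, keep_ratio=0.12):
    """Scaling factor pruning (SFP) sketch: drop whole neurons (rows of W)
    whose trainable scaling factor |gamma| is smallest; a keep_ratio of 0.12
    mirrors the 88% reduction quoted in the abstract (ratio is an assumption)."""
    n_keep = max(1, int(round(keep_ratio * W.shape[0])))
    keep = np.sort(np.argsort(-np.abs(gamma))[:n_keep])
    return W[keep], keep

def cluster_weights(W, n_clusters=16):
    """Multi-bit clustering (MBC) sketch: snap each weight to the nearest of
    n_clusters shared centroids so only a log2(n_clusters)-bit index is stored.
    Uniformly spaced centroids are a simplification of a learned codebook."""
    centroids = np.linspace(W.min(), W.max(), n_clusters)
    idx = np.abs(W[..., None] - centroids).argmin(axis=-1)
    return centroids, idx  # W is approximated by centroids[idx]

def linear_quantize(x, bits=8):
    """Linear quantization (LQ) sketch: uniform symmetric quantization of
    inputs/weights to `bits`-bit signed integers with a shared scale."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    q = np.round(x / scale).astype(np.int32)
    return q, scale  # x is approximated by q * scale
```

Applied in sequence (prune, then cluster the surviving weights, then quantize), these steps compound, which is how the abstract arrives at its overall 98% model-size reduction.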
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78431
DOI: 10.6342/NTU202000421
Fulltext Rights: Paid authorization (有償授權)
Embargo lift date: 2025-02-24
Appears in Collections: Graduate Institute of Electronics Engineering (電子工程學研究所)

Files in This Item:
File: ntu-109-R06943023-1.pdf (Restricted Access)
Size: 6.46 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
