Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170
Title: | Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design (利用向量量化壓縮卷積神經網路之實作及加速器設計) |
Author: | Yi-Heng Wu (吳奕亨) |
Advisor: | Shao-Yi Chien (簡韶逸) |
Keywords: | Convolutional Neural Network, Vector Quantization, Accelerator |
Publication Year: | 2017 |
Degree: | Master's |
Abstract: | In recent years, convolutional neural networks (CNNs) have achieved breakthrough results in many computer vision fields. However, because of their enormous computation and hardware requirements, they run mainly on graphics processors and cannot be implemented efficiently on small devices such as mobile phones. Within a CNN, the convolutional (CONV) and fully connected (FC) layers demand the most hardware resources.
In this thesis we therefore accelerate these two layer types, and our approach consists of two parts. We first compress the CNN with a vector quantization algorithm; our implementation shows that the originally large network can be compressed to below 10% of its size while reducing computation three to four times. For the quantized network we then design hardware that accelerates it and reduces the memory traffic it requires, including different memory access schemes to cut memory reads and an index buffer to raise the utilization of the processing elements. Our method optimizes the execution of both CONV and FC layers, and through design space exploration and software simulation our design achieves lower memory access and higher speed than existing CNN accelerators. In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to their large model size and tremendous computation, CNNs cannot be executed efficiently on small devices such as mobile phones. Although several hardware accelerator architectures have been developed, most of them can efficiently address only one of the two major layer types in a CNN, the convolutional (CONV) and fully connected (FC) layers. In this thesis, based on algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. Vector quantization is first selected to compress the parameters, reduce the computation, and unify the behaviors of the CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNNs. Different DRAM access schemes are employed to reduce DRAM access and, in turn, power consumption. We also design a high-throughput processing-element architecture to accelerate the quantized layers. Compared with state-of-the-art CNN accelerators, the proposed architecture achieves 1.2-5x less DRAM access and 1.5-5x higher throughput for both CONV and FC layers. |
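The abstract describes compressing the CONV and FC layers with vector quantization (a learned codebook plus per-weight indices) and then evaluating the quantized layers through precomputed partial products and table lookups. The thesis code is not available on this page, so the sketch below is only an illustration of that general idea under assumed settings (product-quantization-style sub-vectors of length 4, 256-entry codebooks, 8-bit indices); `kmeans`, `quantize_fc`, and `quantized_fc_forward` are hypothetical helper names, not the author's implementation.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means on the rows of X; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each row to its nearest codeword (squared Euclidean distance).
        dist = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        # Move each codeword to the mean of the rows assigned to it.
        for c in range(k):
            members = X[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook, assign

def quantize_fc(W, sub_dim=4, k=256):
    """Compress an FC weight matrix W (out_dim x in_dim) sub-vector by sub-vector.

    Each length-`sub_dim` slice of a row is replaced by an 8-bit index into a
    k-entry codebook learned for that sub-space (k <= 256 keeps indices in uint8).
    """
    out_dim, in_dim = W.shape
    assert in_dim % sub_dim == 0 and out_dim >= k
    codebooks, indices = [], []
    for s in range(in_dim // sub_dim):
        block = W[:, s * sub_dim:(s + 1) * sub_dim]   # out_dim x sub_dim
        cb, idx = kmeans(block, k)
        codebooks.append(cb)
        indices.append(idx.astype(np.uint8))
    return codebooks, indices

def quantized_fc_forward(x, codebooks, indices, sub_dim=4):
    """Approximate y = W @ x using only the codebooks and indices.

    The k partial dot products per sub-space are computed once and reused
    through table lookups, which is where the reduced multiplication count
    comes from.
    """
    y = np.zeros(len(indices[0]), dtype=x.dtype)
    for s, (cb, idx) in enumerate(zip(codebooks, indices)):
        x_sub = x[s * sub_dim:(s + 1) * sub_dim]
        lut = cb @ x_sub            # k precomputed partial products
        y += lut[idx]               # one gather per output neuron
    return y

# Toy check on a random 512x512 layer: the quantized result approximates W @ x.
W = np.random.randn(512, 512).astype(np.float32)
x = np.random.randn(512).astype(np.float32)
cbs, idxs = quantize_fc(W)
y_ref, y_q = W @ x, quantized_fc_forward(x, cbs, idxs)
print("relative error:", np.linalg.norm(y_ref - y_q) / np.linalg.norm(y_ref))
```

Under these assumed settings, each group of four float32 weights (16 bytes) is replaced by a single 8-bit index plus a shared codebook, which is roughly where a sub-10% model size becomes plausible, and the lookup-based forward pass replaces most multiplications with gathers, consistent with the abstract's claim of reduced computation.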
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170 |
DOI: | 10.6342/NTU201702871 |
Full-text authorization: | Paid authorization |
Appears in collections: | Graduate Institute of Electronics Engineering |
Files in this item:
File | Size | Format
---|---|---
ntu-106-1.pdf (not currently authorized for public access) | 2.31 MB | Adobe PDF
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.