Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170
Title: | Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design |
Authors: | Yi-Heng Wu (吳奕亨) |
Advisor: | Shao-Yi Chien (簡韶逸) |
Keywords: | Convolutional Neural Network, Vector Quantization, Accelerator |
Publication Year: | 2017 |
Degree: | Master's |
Abstract: | In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to their large model size and heavy computation, however, CNNs run mainly on GPUs and cannot be efficiently executed on small devices such as mobile phones. Within a CNN, the convolutional (CONV) and fully connected (FC) layers demand the most hardware resources, and although several hardware accelerator architectures have been developed, most of them can efficiently address only one of these two layer types.
In this thesis, based on algorithm-architecture co-exploration, we target executing both layers with high efficiency, in two parts. First, a vector quantization algorithm is applied to compress the parameters, reduce the computation, and unify the behavior of the CONV and FC layers; our implementation compresses the network to less than 10% of its original size while reducing computation by three to four times. Second, to fully exploit the gain of vector quantization, we propose an accelerator architecture for the quantized CNN. Different DRAM access schemes reduce DRAM traffic and, consequently, power consumption, while an index buffer and a high-throughput processing-element architecture raise the utilization of the compute units on quantized layers. Through design space exploration and software simulation, the proposed architecture achieves 1.2-5x less DRAM access and 1.5-5x higher throughput for both CONV and FC layers compared with state-of-the-art CNN accelerators. |
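The abstract's first stage, compressing weights by vector quantization, can be illustrated with a minimal sketch: split a weight matrix into sub-vectors, learn a small codebook with plain k-means, and store only the codebook plus per-sub-vector indices. This is an assumed toy setup (the sub-vector length `dim=4`, codebook size `k=16`, and the random test matrix are illustrative choices, not the thesis's actual configuration):

```python
import numpy as np

def vector_quantize(weights, dim=4, k=16, iters=10, seed=0):
    """Compress a weight matrix by k-means vector quantization.

    Splits the matrix into sub-vectors of length `dim`, learns a
    k-entry codebook, and returns the codebook plus uint8 indices.
    """
    rng = np.random.default_rng(seed)
    vecs = weights.reshape(-1, dim)                    # sub-vectors
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                             # plain k-means
        dists = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
        idx = dists.argmin(1)                          # nearest centroid
        for c in range(k):
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(0)          # update centroid
    return codebook, idx.astype(np.uint8)

def dequantize(codebook, idx, shape):
    """Reconstruct an approximate weight matrix from codebook + indices."""
    return codebook[idx].reshape(shape)

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
cb, idx = vector_quantize(w, dim=4, k=16)
w_hat = dequantize(cb, idx, w.shape)

orig_bits = w.size * 32                                # fp32 weights
comp_bits = cb.size * 32 + idx.size * 4                # 4-bit indices (k=16)
print(f"compression ratio: {orig_bits / comp_bits:.1f}x")
# → compression ratio: 21.3x
```

With 4-bit indices the storage cost is dominated by the index table rather than the small codebook, which is why even this toy example exceeds a 20x ratio; the accelerator side of the thesis exploits the same structure, since a lookup into a small codebook replaces most multiply-accumulates.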
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170 |
DOI: | 10.6342/NTU201702871 |
Fulltext Rights: | Paid-access authorization |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format
---|---|---
ntu-106-1.pdf (Restricted Access) | 2.31 MB | Adobe PDF