Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
dc.contributor.author | Yi-Heng Wu | en |
dc.contributor.author | 吳奕亨 | zh_TW |
dc.date.accessioned | 2021-06-17T01:22:07Z | - |
dc.date.available | 2017-08-11 | |
dc.date.copyright | 2017-08-11 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-10 | |
dc.identifier.citation | [1] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in ACM SIGPLAN Notices, vol. 49, no. 4. ACM, 2014, pp. 269–284.
[2] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[4] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143.
[5] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.
[6] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient inference engine on compressed deep neural network,” in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 243–254.
[7] J. Wu, C. Leng, Y. Wang, Q. Hu, and J. Cheng, “Quantized convolutional neural networks for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820–4828.
[8] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[9] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[10] J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
[11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[13] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, “Self-normalizing neural networks,” 2017.
[14] Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam, “Shidiannao: Shifting vision processing closer to the sensor,” in ACM SIGARCH Computer Architecture News, vol. 43, no. 3. ACM, 2015, pp. 92–104.
[15] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun et al., “Dadiannao: A machine-learning supercomputer,” in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, 2014, pp. 609–622.
[16] B. Hassibi and D. G. Stork, “Second order derivatives for network pruning: Optimal brain surgeon,” in Advances in Neural Information Processing Systems, 1993, pp. 164–171.
[17] S. Srinivas and R. V. Babu, “Data-free parameter pruning for deep neural networks,” arXiv preprint arXiv:1507.06149, 2015.
[18] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” arXiv preprint arXiv:1510.00149, 2015.
[19] Y. Sun, X. Wang, and X. Tang, “Sparsifying neural network connections for face recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4856–4864.
[20] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen, “Cambricon-X: An accelerator for sparse neural networks,” in Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 2016, pp. 1–12.
[21] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos, “Cnvlutin: Ineffectual-neuron-free deep neural network computing,” in Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016, pp. 1–13.
[22] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep learning with limited numerical precision,” in Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1737–1746.
[23] S. Anwar, K. Hwang, and W. Sung, “Fixed point optimization of deep convolutional neural networks for object recognition,” in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 1131–1135.
[24] B.-S. Yu, “Architecture design of convolutional neural networks for face detection on FPGA platforms,” Master's thesis, National Taiwan University, 2016.
[25] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision. Springer, 2016, pp. 525–542.
[26] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus, “Exploiting linear structure within convolutional networks for efficient evaluation,” in Advances in Neural Information Processing Systems, 2014, pp. 1269–1277.
[27] W. Chen, J. Wilson, S. Tyree, K. Weinberger, and Y. Chen, “Compressing neural networks with the hashing trick,” in International Conference on Machine Learning, 2015, pp. 2285–2294.
[28] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
[29] B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. Hernández-Lobato, G.-Y. Wei, and D. Brooks, “Minerva: Enabling low-power, highly-accurate deep neural network accelerators,” in Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 2016, pp. 267–278.
[30] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, “NeuFlow: A runtime reconfigurable dataflow processor for vision,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 2011, pp. 109–116.
[31] D. Liu, T. Chen, S. Liu, J. Zhou, S. Zhou, O. Teman, X. Feng, X. Zhou, and Y. Chen, “Pudiannao: A polyvalent machine learning accelerator,” in ACM SIGARCH Computer Architecture News, vol. 43, no. 1. ACM, 2015, pp. 369–381.
[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” arXiv preprint arXiv:1408.5093, 2014.
[33] N. P. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers et al., “In-datacenter performance analysis of a tensor processing unit,” arXiv preprint arXiv:1704.04760, 2017. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/67170 | - |
dc.description.abstract | In recent years, convolutional neural networks (CNNs) have achieved breakthrough results in many computer vision fields. However, because of their enormous computation and hardware requirements, they mainly run on graphics processing units and cannot be implemented efficiently on small devices such as mobile phones. Within a CNN, the convolutional layers and fully-connected layers demand the most hardware resources.
In this thesis we therefore accelerate these two kinds of layers; our approach has two parts. We first apply a vector quantization algorithm to compress the CNN. Implementation results show that we can compress a large CNN to under 10\% of its original size while reducing the computation by three to four times. For the compressed network, we then design hardware that accelerates it while reducing its memory accesses, including different memory access schemes to cut the amount of data read from memory, and an index buffer to raise the utilization of the processing elements. Our method optimizes the execution of both convolutional and fully-connected layers, and through design space exploration and software simulation, our design achieves lower memory access and higher speed than existing CNN accelerators. | zh_TW |
dc.description.abstract | In recent years, deep convolutional neural networks (CNNs) have achieved ground-breaking success in many computer vision research fields. Due to the large model size and tremendous computation of CNNs, they cannot be executed efficiently on small devices such as mobile phones. Although several hardware accelerator architectures have been developed, most of them can efficiently address only one of the two major layer types in a CNN: convolutional (CONV) and fully connected (FC) layers. In this thesis, based on algorithm-architecture co-exploration, our architecture targets executing both layer types with high efficiency. Vector quantization is first selected to compress the parameters, reduce the computation, and unify the behaviors of both CONV and FC layers. To fully exploit the gain of vector quantization, we then propose an accelerator architecture for quantized CNNs. Different DRAM access schemes are employed to reduce DRAM accesses and, in turn, power consumption. We also design a high-throughput processing element architecture to accelerate quantized layers. Compared to state-of-the-art CNN accelerators, the proposed architecture achieves 1.2--5x less DRAM access and 1.5--5x higher throughput for both CONV and FC layers. | en |
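The vector-quantization step described in the abstract can be sketched as follows. This is a minimal illustrative example only, not the thesis's actual implementation: the sub-vector length (4), codebook size (16), plain Lloyd k-means, and the toy 64x64 weight matrix are all assumptions chosen for illustration.

```python
import numpy as np

def vector_quantize(W, dim=4, k=16, iters=20, seed=0):
    """Compress weight matrix W by k-means vector quantization.

    W is split into sub-vectors of length `dim`; each sub-vector is
    replaced by the index of its nearest codeword in a learned codebook.
    Returns (codebook, indices), from which W can be approximated.
    """
    rng = np.random.default_rng(seed)
    vecs = W.reshape(-1, dim)                       # sub-vectors to quantize
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                          # plain Lloyd iterations
        dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(1)                       # nearest codeword per sub-vector
        for c in range(k):                          # recenter non-empty clusters
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    return codebook, idx

def dequantize(codebook, idx, shape):
    """Approximate the original weights from codebook and indices."""
    return codebook[idx].reshape(shape)

# toy FC-layer weights: 64x64 floats -> 4-d sub-vectors, 16 codewords
W = np.random.default_rng(1).standard_normal((64, 64)).astype(np.float32)
codebook, idx = vector_quantize(W, dim=4, k=16)
W_hat = dequantize(codebook, idx, W.shape)

# storage cost: 32-bit floats vs 4-bit indices plus a small codebook
orig_bits = W.size * 32
comp_bits = idx.size * 4 + codebook.size * 32
print(f"compression ratio: {orig_bits / comp_bits:.1f}x")
```

Because a FC layer's matrix-vector product and a CONV layer's convolution both reduce to dot products against stored sub-vectors, quantizing each to a shared codebook-plus-index representation is what lets one datapath serve both layer types, as the abstract notes.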
dc.description.provenance | Made available in DSpace on 2021-06-17T01:22:07Z (GMT). No. of bitstreams: 1 ntu-106-R04943005-1.pdf: 2362631 bytes, checksum: bb9851df68b013c62cbdbb3cd69793ab (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | 摘要 (iii)
Abstract (v)
1. Introduction (1)
1.1 Convolutional Neural Network (1)
1.2 Motivation (1)
1.3 Hardware Design Challenge (2)
1.4 Contribution (3)
1.5 Thesis Organization (4)
2. Background and Related Works (5)
2.1 Layers (5)
2.2 Neural Network Model Refinement (7)
2.2.1 Network Pruning (7)
2.2.2 Quantization (7)
2.2.3 Training from Scratch (8)
2.2.4 Summary (10)
2.3 Neural Network Accelerators (10)
2.3.1 Diannao (ASPLOS'14) [1] (10)
2.3.2 Eyeriss (ISCA'16) [2] (12)
2.3.3 Accelerators for Compressed Networks (15)
3. Vector Quantization (17)
3.1 Introduction (17)
3.2 Comparison (17)
3.3 Vector Quantization (18)
3.4 Testing on Vector Quantized Layer (20)
3.5 Error Correction (21)
3.6 Experiment (24)
4. Architecture Design (27)
4.1 Introduction (27)
4.2 Architecture Overview (28)
4.3 DRAM Access Scheme (28)
4.4 Processing Element (PE) (32)
5. Design Space Exploration (DSE) (35)
5.1 Specification (35)
5.2 DRAM Access Analysis (35)
5.3 Computation Time Analysis (38)
6. Conclusion (41)
Bibliography (43) | |
dc.language.iso | en | |
dc.title | 利用向量量化壓縮卷積神經網路之實作及加速器設計 | zh_TW |
dc.title | Compressing Convolutional Neural Network by Vector Quantization: Implementation and Accelerator Design | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 楊家驤 (Chia-Hsiang Yang), 盧奕璋 (Yi-Chang Lu), 黃朝宗 (Chao-Tsung Huang) | |
dc.subject.keyword | 卷積神經網路,向量量化,加速器, | zh_TW |
dc.subject.keyword | Convolutional Neural Network,Vector Quantization,Accelerator, | en |
dc.relation.page | 47 | |
dc.identifier.doi | 10.6342/NTU201702871 | |
dc.rights.note | Authorized for paid access | |
dc.date.accepted | 2017-08-10 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 2.31 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.