Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72627
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
dc.contributor.author | Yi-Min Chih | en |
dc.contributor.author | 池翊忞 | zh_TW |
dc.date.accessioned | 2021-06-17T07:02:14Z | - |
dc.date.available | 2021-01-20 | |
dc.date.copyright | 2021-01-20 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2021-01-11 | |
dc.identifier.citation | H. Lee, Y.-H. Wu, Y.-S. Lin, and S.-Y. Chien, “Convolutional neural network accelerator with vector quantization,” in Proceedings of 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019, pp. 1–5. J. Cheng, J. Wu, C. Leng, Y. Wang, and Q. Hu, “Quantized CNN: A unified approach to accelerate and compress convolutional networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4730–4743, 2018. R. Krishnamoorthi, “Quantizing deep convolutional networks for efficient inference: A whitepaper,” ArXiv:1806.08342, 2018. S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding,” ArXiv:1510.00149, 2015. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ArXiv:1409.1556, 2014. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017. Y. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017. Y. Bengio, N. Léonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” ArXiv:1308.3432, 2013. J. Fang, A. Shafiee, H. Abdel-Aziz, D. Thorsley, G. Georgiadis, and J. H. Hassoun, “Post-training piecewise linear quantization for deep neural networks,” in Proceedings of European Conference on Computer Vision. Springer, 2020, pp. 69–86. J. Lee, C. Kim, S. Kang, D. Shin, S. Kim, and H. Yoo, “UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision,” in Proceedings of 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 218–220. C. Lin, C. Cheng, Y. Tsai, S. Hung, Y. Kuo, P. H. Wang, P. Tsung, J. Hsu, W. Lai, C. Liu, S. Wang, C. Kuo, C. Chang, M. Lee, T. Lin, and C. Chen, “7.1 A 3.4-to-13.3 TOPS/W 3.6 TOPS dual-core deep-learning accelerator for versatile AI applications in 7nm 5G smartphone SoC,” in Proceedings of 2020 IEEE International Solid-State Circuits Conference (ISSCC), 2020, pp. 134–136. Y. Kim, W. Yang, and O. Mutlu, “Ramulator: A fast and extensible DRAM simulator,” IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 45–49, 2016. K. Wang, Z. Liu, Y. Lin, J. Lin, and S. Han, “HAQ: Hardware-aware automated quantization with mixed precision,” in Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8604–8612. H. Jégou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011. B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, and D. Kalenichenko, “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 2704–2713. J. Deng, W. Dong, R. Socher, L. Li, Kai Li, and Li Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. Q. Shang, Y. Fan, W. Shen, S. Shen, and X. Zeng, “Single-port SRAM-based transpose memory with diagonal data mapping for large size 2-D DCT/IDCT,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 11, pp. 2423–2427, 2014. Y. Chen, T. Yang, J. Emer, and V. Sze, “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292–308, 2019. J. Jo, S. Kim, and I. Park, “Energy-efficient convolution architecture based on rescheduled dataflow,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 12, pp. 4196–4207, 2018. K. Chang and T. Chang, “VWA: Hardware efficient vectorwise accelerator for convolutional neural network,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no. 1, pp. 145–154, 2020. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72627 | - |
dc.description.abstract | 線性量化(linear quantization)是神經網路推論系統(inference systems of neural networks)實作中常用的模型壓縮技術,廣泛用於VLSI硬體架構設計中。其中,可以使用量化感知訓練(quantization-aware training)透過端到端微調訓練,改善模型量化後的正確率下降問題。另一方面,較少被提及的向量量化(vector quantization)雖然已有快速推論的方法,但缺乏端到端的微調訓練技術,導致正確率下降。 在本篇論文中,我們提出一個向量量化感知訓練(vector-quantization-aware training)技術,可以使用任何向量量化參數對模型進行端到端的微調訓練。另外,我們也結合了基本的線性量化進一步壓縮模型,這也代表著其他改進的線性量化方法同樣可以與我們的方法結合,以達到更好的結果。此外,我們設計了一個基於向量量化的高效能低延遲硬體架構,並可支援卷積層(convolution layer)、深度卷積層(depthwise convolution layer)及全連接層(fully connected layer)。 | zh_TW |
dc.description.abstract | For the implementation of neural network inference systems, linear quantization (LQ) is a common model compression technique and is widely employed in VLSI hardware architecture design. Quantization-aware training (QAT) further mitigates the accuracy degradation of the quantized model through end-to-end finetuning. On the other hand, vector quantization (VQ), a multi-dimensional non-linear quantization method, is rarely discussed in the literature. An efficient inference scheme with VQ has been proposed, but its accuracy drop is significant because no end-to-end finetuning technique for VQ exists. In this thesis, we propose a vector-quantization-aware training (VQAT) technique that can end-to-end finetune the model with any VQ parameters. In addition, we combine it with vanilla LQ to compress the model further, which means that any improved LQ method can also be combined with the proposed VQAT+LQ scheme to achieve better results. We also design a hardware architecture that efficiently executes vector-quantized neural networks with high performance and low latency for convolution (CONV), depthwise convolution (DW), and fully connected (FC) layers. Experimental results show that the VQAT+LQ scheme compresses the model by 1.16x to 1.18x compared with Quantized CNN (QCNN) while still achieving a lower accuracy drop than QCNN. Moreover, in our proposed hardware architecture, VQAT+LQ further reduces DRAM access by 1.1x to 1.3x compared with VQAT alone. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T07:02:14Z (GMT). No. of bitstreams: 1 U0001-1101202116382500.pdf: 6154458 bytes, checksum: b0b32720c0d8c60d08738ed95a011f87 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Abstract i List of Figures vii List of Tables ix 1 Introduction 1 1.1 Model compression 1 1.2 DNN Accelerator 2 1.3 Challenge 3 1.4 Contribution 3 1.5 Thesis Organization 3 2 Related Work 5 2.1 Quantized-CNN 5 2.2 CNN Accelerator with Vector Quantization 6 3 Vector Quantization Aware Training 9 3.1 Vector Quantization 9 3.1.1 Scalar Quantization 9 3.1.2 Product Quantization 10 3.2 Linear Quantization 11 3.2.1 Asymmetric Linear Quantization 12 3.2.2 Symmetric Linear Quantization 13 3.2.3 Quantization Aware Training 13 3.3 Vector quantization aware training 15 3.4 Joint Linear Quantization and Vector Quantization 17 3.5 Depthwise Convolution Issue 17 3.6 Experiment 18 3.6.1 Compared with QCNN 18 3.6.2 Compared with LQ and HAQ 19 4 Hardware Architecture 21 4.1 Architecture overview 21 4.2 Tile base Dataflow 22 4.3 Processing Element 25 4.4 Dispatcher 26 4.4.1 Adder Tree 26 4.4.2 Psum SRAM layout 26 4.4.3 Reduce Tile Size 29 4.4.4 Reload psum from DRAM 30 4.4.5 Stride 30 4.4.6 Fully Connected Layer 30 4.4.7 Virtual Psum Row Optimization 31 4.5 Initializer and Allocator 31 4.6 Interconnected Network 32 4.7 Dual Core 34 4.8 Determine hardware parameter 34 5 Hardware implementation Results 37 5.1 Experiment Setup 37 5.2 Implementation Result 37 5.3 Area and Power Breakdown 38 5.4 DRAM access 40 6 Conclusion 43 Reference 45 | |
dc.language.iso | en | |
dc.title | 基於向量量化之節能神經網路加速器架構設計 | zh_TW |
dc.title | Energy-Efficient Neural Network Accelerator with Vector Quantization | en |
dc.type | Thesis | |
dc.date.schoolyear | 109-1 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 黃朝宗(Chao-Tsung Huang),劉宗德(Tsung-Te Liu),莊永裕(Yung-Yu Chuang),楊家驤(Chia-Hsiang Yang) | |
dc.subject.keyword | 向量量化,線性量化,神經網路,加速器,硬體架構, | zh_TW |
dc.subject.keyword | vector quantization,linear quantization,neural network,accelerator,hardware architecture, | en |
dc.relation.page | 47 | |
dc.identifier.doi | 10.6342/NTU202100044 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2021-01-12 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
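The VQAT scheme summarized in the abstract above combines a vector-quantization codebook with end-to-end finetuning of the quantized model. Since the thesis PDF is not openly available from this record, the following is only a minimal, hypothetical sketch of the general idea rather than the thesis's actual formulation: weights are split into sub-vectors, each sub-vector is replaced by its nearest codeword (product quantization, as in Jégou et al.), and a straight-through estimator lets gradients flow through the non-differentiable assignment during finetuning. The names `sub_dim`, `n_codewords`, and the fixed codebook are illustrative assumptions.

```python
# Hypothetical sketch of vector-quantization-aware finetuning (PyTorch).
# Assumptions: a fixed codebook (e.g., obtained beforehand by k-means) and a
# straight-through estimator; the thesis's exact VQAT procedure may differ.
import torch

def product_quantize(weight: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Replace each length-`sub_dim` sub-vector of `weight` by its nearest codeword,
    while letting gradients pass straight through to the float weights."""
    n_codewords, sub_dim = codebook.shape
    subvecs = weight.reshape(-1, sub_dim)            # split weights into sub-vectors
    dists = torch.cdist(subvecs, codebook)           # L2 distance to every codeword
    nearest = codebook[dists.argmin(dim=1)]          # hard assignment to nearest codewords
    quantized = nearest.reshape_as(weight)
    # Straight-through estimator: forward pass uses the quantized weights,
    # backward pass treats quantization as the identity function.
    return weight + (quantized - weight).detach()

# Minimal usage: finetune a linear layer whose weights are vector-quantized on the fly.
torch.manual_seed(0)
layer = torch.nn.Linear(16, 8, bias=False)
codebook = torch.randn(32, 4)                        # 32 codewords, sub_dim = 4 (illustrative)
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-2)

x, target = torch.randn(64, 16), torch.randn(64, 8)
for _ in range(10):
    w_q = product_quantize(layer.weight, codebook)   # quantize before the forward pass
    loss = torch.nn.functional.mse_loss(x @ w_q.t(), target)
    optimizer.zero_grad()
    loss.backward()                                  # gradients reach the float weights via the STE
    optimizer.step()
```

At inference time only the codeword indices and the codebook need to be stored, which is where the model-size reduction reported in the abstract comes from; further linearly quantizing the codebook entries (the VQAT+LQ combination described there) would shrink the codebook itself.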
Appears in Collections: | 電子工程學研究所
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-1101202116382500.pdf (currently not authorized for public access) | 6.01 MB | Adobe PDF | |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.