Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74577
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 簡韶逸(Shao-Yi Chien) | |
dc.contributor.author | En-Ho Shen | en |
dc.contributor.author | 沈恩禾 | zh_TW |
dc.date.accessioned | 2021-06-17T08:43:42Z | - |
dc.date.available | 2019-08-18 | |
dc.date.copyright | 2019-08-18 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-07 | |
dc.identifier.citation | [1] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, "LogNet: Energy-efficient neural networks using logarithmic computation," in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2017, pp. 5900–5904.
[2] S. Han, H. Mao, and W. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," Oct. 2016.
[3] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," arXiv e-prints, arXiv:1603.05279, Mar. 2016.
[4] S. Migacz, "8-bit inference with TensorRT." [Online]. Available: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
[5] Y. Chen, T. Krishna, J. Emer, and V. Sze, "14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," in 2016 IEEE International Solid-State Circuits Conference (ISSCC), Jan. 2016, pp. 262–263. [Online]. Available: http://people.csail.mit.edu/emer/slides/2016.02.isscc.eyeriss.slides.pdf
[6] B. Moons and M. Verhelst, "A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets," CoRR, vol. abs/1606.05094, 2016. [Online]. Available: http://arxiv.org/abs/1606.05094
[7] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, "14.5 Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28nm FDSOI," in 2017 IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2017, pp. 246–247.
[8] H. Sharma, J. Park, N. Suda, L. Lai, B. Chau, J. K. Kim, V. Chandra, and H. Esmaeilzadeh, "Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural networks," CoRR, vol. abs/1712.01507, 2017. [Online]. Available: http://arxiv.org/abs/1712.01507
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv e-prints, arXiv:1512.03385, Dec. 2015.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[11] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network," arXiv e-prints, arXiv:1602.01528, Feb. 2016.
[12] Y. Chen, J. Emer, and V. Sze, "Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks," in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), June 2016, pp. 367–379.
[13] J. Luo, J. Wu, and W. Lin, "ThiNet: A filter level pruning method for deep neural network compression," CoRR, vol. abs/1707.06342, 2017. [Online]. Available: http://arxiv.org/abs/1707.06342
[14] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," CoRR, vol. abs/1704.04861, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861
[15] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," CoRR, vol. abs/1707.01083, 2017. [Online]. Available: http://arxiv.org/abs/1707.01083
[16] S. Anwar, K. Hwang, and W. Sung, "Fixed point optimization of deep convolutional neural networks for object recognition," in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015, pp. 1131–1135.
[17] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li, "ImageNet large scale visual recognition challenge," CoRR, vol. abs/1409.0575, 2014. [Online]. Available: http://arxiv.org/abs/1409.0575
[18] Y. LeCun and C. Cortes, "MNIST handwritten digit database," 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[19] F. Li and B. Liu, "Ternary weight networks," CoRR, vol. abs/1605.04711, 2016. [Online]. Available: http://arxiv.org/abs/1605.04711
[20] M. Courbariaux and Y. Bengio, "BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1," CoRR, vol. abs/1602.02830, 2016. [Online]. Available: http://arxiv.org/abs/1602.02830
[21] A. Krizhevsky, V. Nair, and G. Hinton, "CIFAR-10 (Canadian Institute for Advanced Research)." [Online]. Available: http://www.cs.toronto.edu/~kriz/cifar.html
[22] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," CoRR, vol. abs/1502.03167, 2015. [Online]. Available: http://arxiv.org/abs/1502.03167
[23] johnjohnlin, "Nicotb, a Python-Verilog co-simulation framework." [Online]. Available: https://github.com/johnjohnlin/nicotb | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74577 | - |
dc.description.abstract | In recent years, deep neural networks (DNNs) have achieved broad success and progress across AI applications. Such models, however, typically rely on bulky, power-hungry general-purpose GPUs (GPGPUs) for computation, which makes them unsuitable for battery-powered equipment such as mobile devices. In this thesis, we propose a VLSI design dedicated to computing quantized, low arithmetic-precision convolutional neural networks (CNNs); it greatly reduces the energy consumed by cross-system data transfer and is especially suited to neural network acceleration on mobile devices. We first propose a simple and effective network quantization algorithm, along with a dataflow strategy that achieves a high data-reuse rate and fits such quantized networks. To realize the full potential of the quantized data, we design a multiply-accumulate tree structure dedicated to low-precision operands, then propose an on-chip buffer hierarchy and data re-arrangement method that remove unnecessary data-access waste and memory bank conflicts, and finally a core processing element array that accepts data broadcast from the buffer to each compute unit and writes completed results back to the global buffer in order. The proposed architecture supports the great majority of CNN structures and can be reconfigured to the appropriate arithmetic precision to accommodate various quantized network models. The final design uses 180 KB of on-chip memory and 1340K logic gates. | zh_TW |
dc.description.abstract | Deep neural networks (DNNs) show promising results on various AI application tasks. However, such networks are typically executed on general-purpose GPUs that are bulky in form factor and draw hundreds of watts, which makes them unsuitable for mobile applications. In this thesis, we present a VLSI architecture that processes quantized, low numeric-precision convolutional neural networks (CNNs), cutting the power consumed by memory access and speeding up the model within a limited area budget, making it particularly fit for mobile devices. We first propose a quantization re-training algorithm for training low-precision CNNs, then a dataflow with a high data-reuse rate and a multiplication-accumulation strategy specially designed for such quantized models (both ideas are sketched just below this record). To fully exploit the efficiency of computing with such low-precision data, we design a micro-architecture for low bit-length multiplication and accumulation, an on-chip memory hierarchy and data re-alignment flow that save power and avoid buffer bank conflicts, and a PE array that takes data broadcast from the buffer and sends finished results sequentially back to the buffer under this dataflow. The architecture is highly flexible across CNN shapes and re-configurable for low bit-length quantized models. The design synthesizes to a 180 KB on-chip memory capacity and an area of 1340k logic gates, and the implementation results show state-of-the-art hardware efficiency. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T08:43:42Z (GMT). No. of bitstreams: 1 ntu-108-R05943011-1.pdf: 17371880 bytes, checksum: 17625a0b89ce521d1fe05e93ed8fbcf0 (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Abstract
List of Figures
List of Tables
1 Introduction
 1.1 Motivation
 1.2 Contribution
 1.3 Thesis Outline
2 Related Work
 2.1 Quantization
  2.1.1 Fixed-point quantization
  2.1.2 Ternary to binary quantization
  2.1.3 8-bit quantization on modern models
 2.2 Hardware design
  2.2.1 Dataflow optimization: row stationary
  2.2.2 Precision-reconfigurable and sub-word parallelism arithmetic unit
  2.2.3 Bit-level re-configurable arithmetic unit
3 Low Numeric Precision Convolutional Neural Network
 3.1 Convolutional Neural Networks
 3.2 Low Precision CNN
 3.3 Quantization Loss Minimization Threshold Selection
 3.4 Computational consideration and data re-packing
  3.4.1 Data re-packing
4 Proposed Architecture
 4.1 System Overview
  4.1.1 Output row stationary dataflow
  4.1.2 Data tiling
  4.1.3 Data re-alignment and buffer hierarchy
 4.2 Architecture
  4.2.1 PE processing pipeline
  4.2.2 Sub-word accumulation operation and re-configurable arithmetic logic unit
  4.2.3 Shift dispatcher
  4.2.4 Quantization
5 Results
 5.1 Quantization error minimization training
 5.2 Implementation results
  5.2.1 Area and power
  5.2.2 Experiments
6 Conclusion
References | |
dc.language.iso | zh-TW | |
dc.title | 低數值精確度捲積神經網路加速器之可重置化超大型積體電路設計 | zh_TW |
dc.title | Reconfigurable Low Arithmetic Precision Convolution Neural Network Accelerator VLSI Design and Implementation | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 蔡宗漢,吳安宇,楊家驤 | |
dc.subject.keyword | Low Numeric Precision, Convolutional Neural Network, Accelerator, Reconfigurable, VLSI Design | zh_TW |
dc.subject.keyword | Reconfigurable, Low Arithmetic Precision, Convolution Neural Network, Accelerator, VLSI, CNN, Quantization | en |
dc.relation.page | 58 | |
dc.identifier.doi | 10.6342/NTU201902618 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2019-08-07 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
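The abstracts above hinge on two techniques that a short sketch can make concrete. First, the quantization-loss-minimizing threshold selection (thesis section 3.3): this record does not give the exact objective, so the following is a minimal sketch assuming symmetric linear quantization and an L2 reconstruction-error criterion; the helper names `quantize` and `select_threshold` are illustrative, not from the thesis.

```python
import numpy as np

def quantize(x, threshold, bits):
    """Symmetric linear quantization of x to signed `bits`-bit levels,
    clipping magnitudes at `threshold`; returns de-quantized values."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = threshold / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)   # integer code
    return q * scale                                # back to real values

def select_threshold(x, bits, num_candidates=100):
    """Scan clipping thresholds and keep the one minimizing the L2
    quantization error (the assumed loss; the thesis's may differ)."""
    max_abs = np.abs(x).max()
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    errors = [np.sum((x - quantize(x, t, bits)) ** 2) for t in candidates]
    return candidates[int(np.argmin(errors))]

# Example: pick a 4-bit clipping threshold for a random weight tensor.
weights = np.random.randn(256, 128).astype(np.float32)
print("selected threshold:", select_threshold(weights, bits=4))
```

A smaller threshold spends the few quantization levels on the dense center of the weight distribution at the cost of clipping outliers; scanning makes that trade explicit, which is why a clipped threshold usually beats the naive max-abs choice at 4 bits and below.

Second, the sub-word-parallel accumulation of section 4.2.2 (in the spirit of [6], [7]): several low bit-length products are packed into one wide word so that a single wide addition advances all lanes at once. This toy sketch only illustrates the packing arithmetic; the lane count and width are assumptions, and the actual design is RTL that must budget guard bits so lanes never carry into one another.

```python
LANES = 4        # sub-words packed per wide word (assumed)
LANE_BITS = 16   # bits per lane: product bits plus guard bits for carries

def pack(values):
    """Pack LANES small non-negative ints into one wide int, one per lane."""
    word = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << LANE_BITS), "value must fit in its lane"
        word |= v << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a wide accumulator back into its LANES lane values."""
    mask = (1 << LANE_BITS) - 1
    return [(word >> (i * LANE_BITS)) & mask for i in range(LANES)]

# One weight per step is broadcast to 4 activation lanes (4 output pixels);
# each wide add then accumulates all 4 dot products simultaneously.
acts = [[3, 1, 2, 7], [2, 2, 2, 2], [5, 0, 1, 3]]   # 3 steps x 4 lanes
wts = [2, 3, 1]                                     # 3 weight taps
acc = 0
for step in range(3):
    acc += pack([wts[step] * a for a in acts[step]])
print(unpack(acc))  # per-lane dot products: [17, 8, 11, 23]
```

Packing four 16-bit lanes into one 64-bit accumulator trades per-lane dynamic range for throughput, which is exactly the knob a precision-reconfigurable arithmetic unit exposes: narrower operands mean more lanes per word.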
Appears in Collections: | Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 16.96 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.