
DSpace

The DSpace institutional repository preserves digital materials of all kinds (e.g., text, images, PDFs) and makes them easy to access.

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78771
Full metadata record
DC field: value (language)
dc.contributor.advisor: 劉宗德 (zh_TW)
dc.contributor.advisor: Tsung-Te Liu (en)
dc.contributor.author: 鄭瑞軒 (zh_TW)
dc.contributor.author: Rui-Xuan Zheng (en)
dc.date.accessioned: 2021-07-11T15:18:11Z
dc.date.available: 2024-07-15
dc.date.copyright: 2019-07-16
dc.date.issued: 2019
dc.date.submitted: 2002-01-01
dc.identifier.citation[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Neural Information Processing Systems, vol. 25, 01 2012.
[2] V. Sze, Y. Chen, T. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, pp. 2295–2329, Dec 2017.
[3] Y. Chen, J. Emer, and V. Sze, “Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 367–379, June 2016.
[4] J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger, and A. Moshovos“Cnvlutin: Ineffectual-neuron-free deep neural network computing,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 1–13, June 2016.
[5] D. Lee, S. Kang, and K. Choi, “Compend: Computation pruning through early negative detection for ReLU in a deep neural network accelerator,” pp. 139–148, 06 2018.
[6] A. Krizhevsky, V. Nair, and G. Hinton, “The CIFAR-10 dataset.” https://www.cs.toronto.edu/~kriz/cifar.html.
[7] Y. LeCun, C. Cortes, and C. J. Burges, “THE MNIST DATABASE of handwritten digits.” http://yann.lecun.com/exdb/mnist/.
[8] “The Long Short-Term Memory (LSTM) cell can process data sequentially and keep its hidden state through time.” https://en.wikipedia.org/wiki/Long_short-term_memory#/media/File:The_LSTM_cell.png. Accessed: 2019-7-14.
[9] Y. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 52, pp. 127–138, Jan 2017.
[10] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv e-prints, p. arXiv:1409.1556, Sep 2014.
[11] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vision, vol. 115, pp. 211–252, Dec. 2015.
[12] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324, Nov 1998.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going Deeper with Convolutions,” arXiv e-prints, p. arXiv:1409.4842, Sep 2014.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” arXiv e-prints, p. arXiv:1512.03385, Dec 2015.
[15] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, pp. 1735–80, 12 1997.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78771
dc.description.abstract: In recent years, with the rapid development of deep learning, deep neural networks have been widely applied in many fields, including computer vision, natural language processing, and biomedical signal analysis. Numerous processor designs optimized for deep neural networks have been proposed; among them, zero skipping is a frequently used technique that exploits the sparsity induced by ReLU to save energy by skipping computations whose inputs are zero. Beyond zero skipping, the recently proposed early negative detection technique further exploits the properties of ReLU and the sparsity it induces: by detecting, at an early stage, unnecessary computations whose outputs will be negative, it achieves additional savings, but it must be performed on a bit-serial architecture, so its savings are limited by the bit width of the weights. This work proposes a general threshold-based early negative detection method that is not restricted to bit-serial architectures, together with a threshold optimization flow that minimizes computation while keeping accuracy within a user-specified range. At the software level, the proposed method saves 31.97% of computations with an accuracy variation of 0.09%; compared with the existing method, the computation reduction with 4-bit weights increases by 31.05%. At the hardware level, this work validates the proposed method by implementing a 40-nm CMOS deep neural network processor containing hardware designs for both zero skipping and early negative detection. With the proposed method, the processor saves 22.8% of its energy consumption with only a 0.96% drop in test accuracy, and achieves an energy efficiency of 1.04 TOPS/W at 0.81 V and 250 MHz. (zh_TW)
dc.description.abstract: Recently, deep neural networks (DNNs) have been widely used in fields including computer vision, natural language processing, and bio-signal analysis, and many processors have been proposed to improve the efficiency of DNN execution. In these works, the zero-skipping technique is commonly adopted to save energy by skipping zero-input operations that arise from ReLU. Beyond zero skipping, a recently introduced technique further exploits ReLU-induced sparsity through early negative detection, i.e., detecting and skipping, at an early stage, unnecessary computations whose results would be negative. However, that technique must be performed on a bit-serial architecture, so its savings are limited by the bit width of the weights. This work proposes a generic threshold-based approach that realizes early negative detection without requiring a bit-serial scheme, together with a systematic threshold-optimization procedure that minimizes computation while keeping the accuracy variation within a user-specified bound. At the software level, the proposed approach reduces operations by 31.97% with an accuracy variation of 0.09%, and improves the computation reduction rate over the prior work by 31.05% when 4-bit weights are used. Moreover, a 40-nm CMOS reconfigurable DNN processor implementing both conventional zero skipping and the proposed scheme was designed to evaluate effectiveness and overhead. The processor achieves a 22.8% reduction in energy consumption from the proposed approach alone, with a 0.96% loss in test accuracy, and reaches an energy efficiency of 1.04 TOPS/W at 0.81 V and 250 MHz. (en)
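To make the abstract's scheme concrete, the following is a minimal NumPy sketch of threshold-based early negative detection with input channel splitting and zero skipping, reconstructed from the description above rather than taken from the thesis: the function name, the per-output threshold granularity, and the fixed 50/50 channel split are illustrative assumptions, and the thesis's threshold optimization procedure is not shown.

    import numpy as np

    # Illustrative sketch (not the thesis implementation): a fully connected
    # layer with ReLU, zero skipping, and threshold-based early negative
    # detection via input channel splitting.
    def fc_relu_early_negative_detection(x, W, thresholds, split=0.5):
        # x:          (C_in,)  input activations (often sparse after ReLU)
        # W:          (C_out, C_in) weight matrix
        # thresholds: (C_out,) thresholds; this granularity is an assumption
        # split:      fraction of channels in the first ("probe") group
        c_in = x.shape[0]
        k = int(c_in * split)
        probe_nz = np.nonzero(x[:k])[0]       # zero skipping: probe group
        rest_nz = k + np.nonzero(x[k:])[0]    # zero skipping: remaining group
        y = np.zeros(W.shape[0])
        for o in range(W.shape[0]):
            # Partial sum over the first input-channel group only.
            partial = W[o, probe_nz] @ x[probe_nz]
            # Early negative detection: if the partial sum is already below
            # the threshold, predict a negative pre-activation and skip the
            # remaining channels; ReLU would zero the result anyway.
            if partial < thresholds[o]:
                continue                      # y[o] stays 0 (masked output)
            y[o] = max(0.0, partial + W[o, rest_nz] @ x[rest_nz])  # ReLU
        return y

Setting every threshold to negative infinity disables the masking and recovers the exact layer output, while raising the thresholds skips more work at the cost of accuracy; that trade-off is what the thesis's optimization procedure tunes under a user-specified accuracy constraint.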
dc.description.provenance: Made available in DSpace on 2021-07-11T15:18:11Z (GMT). No. of bitstreams: 1. ntu-108-R05943033-1.pdf: 4981904 bytes, checksum: a227e7b21f307a349a72a0965eaf758d (MD5). Previous issue date: 2019. (en)
dc.description.tableofcontents:
Oral Examination Committee Certification iii
Acknowledgements (Chinese) v
Acknowledgements vii
Abstract (Chinese) ix
Abstract xi
1 Introduction 1
1.1 Motivation 1
1.2 Contributions 2
1.3 Thesis Organization 2
2 Background 3
2.1 Deep Neural Network Architectures and Computation 3
2.1.1 Computation in Convolutional Neural Networks 4
2.2 Sparsity-Based Optimization of DNN Processors 6
2.2.1 Data Compression 7
2.2.2 Zero Skipping 7
3 Early Negative Detection and Related Work 11
3.1 Early Negative Detection 11
3.2 Related Work 11
3.2.1 Bit-Serial Computation 12
3.2.2 Inverted Two's Complement Representation 12
3.2.3 Hardware Design for Early Negative Detection 13
3.2.4 Simulation Results 15
4 The Proposed Early Negative Detection Approach 17
4.1 Input Channel Splitting 18
4.2 Threshold-Based Negative Masking 19
4.2.1 Overhead 21
4.3 Threshold Optimization 21
4.3.1 Objective and Basic Idea 21
4.3.2 Implementation 22
4.4 Adapting the Approach to Non-ReLU Activation Functions 25
4.5 Procedure, Experiments, and Comparison 28
4.5.1 Procedure 28
4.5.2 Experiment 1 28
4.5.3 Experiment 2 31
4.5.4 Experiment 3 33
4.5.5 Comparison 34
5 Hardware Implementation 37
5.1 Dataflow 38
5.1.1 Convolutional Layer Dataflow 38
5.1.2 Fully Connected Layer Dataflow 40
5.2 Bit-Serial Processing Element Design 41
5.3 Zero Skipping Design 42
5.4 Energy-Saving Design for Early Negative Detection 44
5.5 Experimental Results, Circuit Analysis, and Comparison 45
5.5.1 Experimental Results 45
5.5.2 Circuit Analysis and Comparison 48
6 Conclusion 51
References 53
dc.language.iso: zh_TW
dc.subject: 機器學習 [machine learning] (zh_TW)
dc.subject: 提前負值偵測 [early negative detection] (zh_TW)
dc.subject: 深度神經網路 [deep neural networks] (zh_TW)
dc.subject: 具能源效益加速器 [energy-efficient accelerator] (zh_TW)
dc.subject: 卷積神經網路 [convolutional neural networks] (zh_TW)
dc.subject: energy-efficient accelerator (en)
dc.subject: convolutional neural networks (en)
dc.subject: deep neural networks (en)
dc.subject: ReLU (en)
dc.subject: early negative detection (en)
dc.subject: machine learning (en)
dc.title: 應用於深度神經網路推理之節能方法 [An Energy-Efficient Approach for Deep-Neural-Network Inference] (zh_TW)
dc.title: An Energy-Efficient Approach for Deep-Neural-Network Inference (en)
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 闕志達;盧奕璋 (zh_TW)
dc.contributor.oralexamcommittee: Tzi-Dar Chiueh; Yi-Chang Lu (en)
dc.subject.keyword: 機器學習,提前負值偵測,深度神經網路,卷積神經網路,具能源效益加速器 [machine learning, early negative detection, deep neural networks, convolutional neural networks, energy-efficient accelerator] (zh_TW)
dc.subject.keyword: machine learning, early negative detection, ReLU, deep neural networks, convolutional neural networks, energy-efficient accelerator (en)
dc.relation.page: 54
dc.identifier.doi: 10.6342/NTU201901480
dc.rights.note: 未授權 [not authorized for public access]
dc.date.accepted: 2019-07-16
dc.contributor.author-college: 電機資訊學院 [College of Electrical Engineering and Computer Science]
dc.contributor.author-dept: 電子工程學研究所 [Graduate Institute of Electronics Engineering]
dc.date.embargo-lift: 2024-07-16
Appears in collections: Graduate Institute of Electronics Engineering (電子工程學研究所)

Files in this item:
File: ntu-107-2.pdf, 4.87 MB, Adobe PDF (restricted: not authorized for public access)


Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.

Contact information
No. 1, Sec. 4, Roosevelt Rd., Da'an Dist., Taipei 10617, Taiwan (R.O.C.)
Tel: (02) 33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved