Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68796
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 江介宏 (Jie-Hong Jiang) | |
dc.contributor.author | Hao-Yuan Kuo | en |
dc.contributor.author | 郭皓元 | zh_TW |
dc.date.accessioned | 2021-06-17T02:35:48Z | - |
dc.date.available | 2018-08-24 | |
dc.date.copyright | 2017-08-24 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-17 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68796 | - |
dc.description.abstract | With the rapid growth of machine-learning applications, implementing deep neural networks in hardware offers a low-power, high-performance, and low-area solution. In practice, binarizing the network is an important step toward reducing the use of costly arithmetic circuits. Existing binarization methods generally rely on either special training procedures or stochastic-computing techniques. Given a network topology, the former requires substantial implementation experience to find an appropriate retraining procedure, whereas the latter provides a general and systematic computation method that does not depend on retraining the network. In this thesis, we explore how to convert deep neural networks via the retraining and stochastic-computing approaches above and implement them on field-programmable gate arrays. As a case study, we implement two deep neural networks trained on the MNIST handwritten-digit dataset on a Xilinx Artix-7 FPGA and compare the strengths and weaknesses of each implementation. | zh_TW |
dc.description.abstract | Deploying deep neural networks (DNNs) in hardware is a common tactic for achieving energy, performance, and area efficiency across widespread applications. Binarizing a neural network is a key step toward eliminating costly arithmetic computation from the circuit implementation. Available neuron binarization methods are based on either special training or stochastic computation: the former requires expertise and experience to find a workable training procedure for a given network architecture, while the latter decouples binarization from the training process and applies generally and systematically to any well-trained neural network. In this thesis, we study how to map DNNs to efficient FPGA realizations through parameter retraining and stochastic computation (a minimal sketch of the stochastic-computing idea follows this metadata record). As a case study, a maxout DNN and a ReLU DNN trained for MNIST dataset classification are mapped onto a Xilinx Artix-7 FPGA for efficient implementation. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T02:35:48Z (GMT). No. of bitstreams: 1 ntu-106-R04943084-1.pdf: 1382219 bytes, checksum: 62f8e18ba4b75bf714f23a26f5eb7d11 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Acknowledgements ii
摘要 (Chinese Abstract) iii
Abstract iv
List of Figures x
List of Tables xii
1 Introduction 1
1.1 Related works 2
1.2 Our contributions 2
1.3 Thesis organization 3
2 Preliminary 4
2.1 Neural network 4
2.2 Activation function: Maxout 5
2.3 Activation function: ReLU 6
2.4 Neuron and neural unit 7
3 Problem Formulation 8
4 System Architecture 10
4.1 Design topview 10
4.2 Computational neural unit 11
4.3 Network controller 12
4.4 On-chip memory 13
4.5 Flash memory 13
5 Implementing using Fixed-Point Multipliers 14
5.1 Parameter definition 15
5.2 Block diagram 15
5.3 Performance analysis 17
6 Implementing using Bit-Shifting Multipliers 18
6.1 Motivation 19
6.2 Block diagram 19
6.3 Weights adjustment to negative power of 2 20
6.4 Performance analysis 23
7 Implementing using Stochastic Multipliers 24
7.1 Stochastic computing 25
7.2 Block diagram 26
7.3 Bit-stream sampling 27
7.4 Multiplexer as a weighted summer 28
7.5 Weight adjustment to be positive 30
7.6 Scaling factors and bias 31
7.7 Control signal generation 32
7.7.1 RNG based control signals 33
7.7.2 FSM based control signals 35
7.8 Parallel computation for throughput improvement 37
7.8.1 RNG based parallelization 38
7.8.2 FSM based parallelization 39
8 Experimental Results 41
8.1 The MNIST dataset 42
8.2 Environments 42
8.2.1 Training environment 42
8.2.2 Deploying environment 43
8.3 Network and circuit structure 43
8.3.1 Maxout 43
8.3.2 ReLU 45
8.4 Results 47
9 Conclusions 53
Bibliography 54
Appendices 57
A System controlling states 58
B Output range adjustment for neural units 59
B.1 Input value ∈ [−m, M], output value ∈ [−m, M] 60
B.2 Input value ∈ [0, 1], output value ∈ [−m, M] 60 | |
dc.language.iso | en | |
dc.title | Efficient Implementation and Mapping Methods for Deep Neural Networks on Field-Programmable Gate Arrays | zh_TW |
dc.title | Mapping Deep Neural Network for Efficient FPGA Implementation | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 鄭福炯 (Fu-Chiung Cheng), 楊家驤 (Chia-Hsiang Yang), 洪士灝 (Shih-Hao Hung) | |
dc.subject.keyword | machine learning, neural network, multiplier, stochastic computing, field-programmable gate array | zh_TW |
dc.subject.keyword | machine learning, neural network, stochastic computing, multipliers, FPGA | en |
dc.relation.page | 61 | |
dc.identifier.doi | 10.6342/NTU201702277 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2017-08-17 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
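
The stochastic computation named in the abstract rests on one principle: a value in [0, 1] is encoded as a random bit-stream whose fraction of 1s equals that value, so a single AND gate multiplies two independent streams and the hardware multiplier disappears entirely. The following is a minimal Python sketch of that general principle, not the thesis's circuit; the helper names `to_bitstream`/`from_bitstream`, the stream length, and the test values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def to_bitstream(x, length=4096):
    """Encode x in [0, 1] as a Bernoulli bit-stream: each bit is 1 with probability x."""
    return rng.random(length) < x

def from_bitstream(bits):
    """Decode a bit-stream back to a value in [0, 1]: the fraction of 1 bits."""
    return bits.mean()

# ANDing two independent streams yields a stream whose bits are 1 with
# probability x * w, so the product costs one logic gate per bit instead
# of a fixed-point multiplier.
x, w = 0.8, 0.5
product = from_bitstream(to_bitstream(x) & to_bitstream(w))
print(product)  # close to 0.4, up to sampling error
```

The decoded product sharpens as the stream lengthens, which is the usual accuracy-versus-latency trade-off of stochastic multipliers: very cheap logic per bit, paid for with many clock cycles per multiplication.
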
Appears in Collections: Graduate Institute of Electronics Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 1.35 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.