  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68991
Full metadata record
DC field: Value (Language)
dc.contributor.advisor: 楊佳玲
dc.contributor.author: Yi-Jou Lee (en)
dc.contributor.author: 李依柔 (zh_TW)
dc.date.accessioned: 2021-06-17T02:45:56Z
dc.date.available: 2019-08-25
dc.date.copyright: 2017-08-25
dc.date.issued: 2017
dc.date.submitted: 2017-08-15
dc.identifier.citation:
[1] M. Alwani, H. Chen, M. Ferdman, and P. Milder. Fused-layer CNN accelerators. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1–12. IEEE, 2016.
[2] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM SIGPLAN Notices, volume 49, pages 269–284. ACM, 2014.
[3] Z. Du, A. Lingamneni, Y. Chen, K. Palem, O. Temam, and C. Wu. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 201–206. IEEE, 2014.
[4] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. NeuFlow: A runtime reconfigurable dataflow processor for vision. In 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 109–116. IEEE, 2011.
[5] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: Efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), pages 243–254. IEEE Press, 2016.
[6] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135–1143, 2015.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[8] Y. Ma, Y. Cao, S. Vrudhula, and J.-s. Seo. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), pages 45–54. ACM, 2017.
[9] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[10] O. Temam. A defect-tolerant accelerator for emerging high-performance applications. In 2012 39th Annual International Symposium on Computer Architecture (ISCA), pages 356–367. IEEE, 2012.
[11] S. Thoziyoor, N. Muralimanohar, J. Ahn, and N. Jouppi. CACTI 5.3, rev. 174. HP Labs. [Online]. Available: http://quid.hpl.hp.com:9081, 2014.
[12] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen. Cambricon-X: An accelerator for sparse neural networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1–12. IEEE, 2016.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68991
dc.description.abstract: 隨著卷積神經網絡參數量逐漸增大,卷積神經網絡加速器的性能和能量效率成為一個重要問題。
從先前的設計中可以發現,由於資料量龐大,DRAM存取佔能耗的很大一部分。觀察卷積層的運算行為,可以發現計算中有許多參數可以被共用,但由於加速器上的儲存空間有限,這些參數可能被重複從DRAM讀取。
因此我們希望透過加速器中的儲存空間來重複使用參數,以減少計算中對DRAM的存取。
參數的重複使用可分為三種:重複使用輸入、重複使用濾波器,以及重複使用中間產物。
卷積神經網絡模型中的每一層,都可能根據其輸入、輸出和濾波器的大小,適合不同的資料重用策略。
但現有的卷積神經網絡加速器在整個處理過程中只採用單一種資料重用。
為了讓每一層都能彈性採用不同的資料重用策略,我們提出一種可重組的卷積神經網絡加速器設計,可以彈性配置以利用不同類型的資料重用,將DRAM存取的資料量最小化。
透過將卷積神經網絡的處理拆分為以不同輸入和濾波器為單位的計算單元,我們可以藉由安排這些計算單元的計算順序,在加速器中利用不同的資料重用。
加速器將根據離線分析模組依最佳重用策略與硬體限制所生成的指令來執行。
我們的結果顯示,可重組設計能減少DRAM存取量,並比較了不同資料重用策略下執行時間與能耗的差異。
我們也分析了不同硬體配置對執行結果的影響。
zh_TW
dc.description.abstract: With the growing size of convolutional neural networks (CNNs), the performance and energy efficiency of CNN accelerators have become an important problem.
From previous work, we find that DRAM accesses account for a large share of energy consumption. To reduce DRAM accesses, we observe the computation behavior of convolutional layers: many parameters are shared between computations, but with the limited on-chip buffer size of an accelerator, these data may be loaded on-chip repeatedly.
We would like to capture data reuse through the on-chip buffer to reduce the DRAM accesses of CNN computation.
Three kinds of data reuse can be captured; the reused data are kept in the on-chip buffer and evicted when no longer needed. The first kind is input feature map reuse, the second is filter reuse, and the third is intermediate feature map reuse.
Each layer in a CNN model may favor a different data reuse policy depending on the sizes of its input, output, and filters.
However, existing CNN accelerators exploit only one type of data reuse throughout CNN processing.
To gain the flexibility to use a different data reuse policy for each layer, we propose a reconfigurable CNN accelerator design that can be configured to capture different types of reuse, with the objective of minimizing off-chip memory accesses.
By separating CNN processing into computation primitives, that is, units of convolution over different inputs and filters, we can exploit different kinds of data reuse by arranging the computation order of these primitives in our accelerator. The accelerator executes instructions generated by an off-line generator that takes the optimal reuse policy and the hardware constraints into account.
Our results show that our reconfigurable design reduces DRAM accesses; we compare the execution time and energy under different data reuse policies, and we also analyze the effect of different configurations on our CNN accelerator design.
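The effect of choosing a reuse policy per layer can be sketched with a toy model: a small LRU "on-chip buffer" replays the memory-access sequences produced by two different loop orders over a 1-D convolution, and an off-line step picks the order that causes fewer DRAM loads. All names here (`count_dram_loads`, `pick_policy`) and the 1-D simplification are illustrative assumptions, not the thesis's actual architecture or instruction format.

```python
from collections import OrderedDict

def count_dram_loads(accesses, buffer_slots):
    """LRU model of an on-chip buffer: each miss counts as one DRAM load."""
    cache = OrderedDict()
    loads = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            loads += 1                     # miss: fetch from DRAM
            cache[addr] = True
            if len(cache) > buffer_slots:
                cache.popitem(last=False)  # evict least recently used
    return loads

# Toy 1-D convolution layer: M filters of K weights slide over N inputs.
# Addresses: ("in", i) for input elements, ("w", m, k) for filter weights.
def filter_reuse_order(M, K, N):
    # One filter stays resident while the whole input streams past it.
    for m in range(M):
        for i in range(N - K + 1):
            for k in range(K):
                yield ("w", m, k)
                yield ("in", i + k)

def input_reuse_order(M, K, N):
    # One input window stays resident while every filter is applied to it.
    for i in range(N - K + 1):
        for m in range(M):
            for k in range(K):
                yield ("w", m, k)
                yield ("in", i + k)

def pick_policy(M, K, N, buffer_slots):
    """Off-line step: choose, per layer, the ordering with fewer DRAM loads."""
    costs = {
        "filter_reuse": count_dram_loads(filter_reuse_order(M, K, N), buffer_slots),
        "input_reuse": count_dram_loads(input_reuse_order(M, K, N), buffer_slots),
    }
    return min(costs, key=costs.get), costs

policy, costs = pick_policy(M=4, K=3, N=16, buffer_slots=8)
print(policy, costs)
```

With this layer shape and an 8-slot buffer, the filter-resident order wins; shrinking `M` or growing the buffer shifts the balance, which mirrors why each layer may favor a different policy.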
en
dc.description.provenance: Made available in DSpace on 2021-06-17T02:45:56Z (GMT). No. of bitstreams: 1
ntu-106-R04922056-1.pdf: 1436680 bytes, checksum: 3230484f383de56ad42e7732491638f2 (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents:
Abstract iii
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Motivation 5
Chapter 3 Related Work 7
3.1 Sparse Neural Network 7
3.2 Data Reuse 8
Chapter 4 Methodology 9
4.1 Mechanism 9
4.2 Hardware Design 11
4.2.1 Overview 11
4.2.2 Processing Element 12
4.2.3 Buffer Controller 13
4.2.4 Global Buffer 14
4.2.5 Central Controller 15
4.2.6 Execution Model 16
4.2.7 Bus 17
Chapter 5 Experiment and Result 18
5.1 Experiment setup 18
5.2 Performance of reconfigurable design 19
5.3 Energy of reconfigurable design 23
5.4 Design factor 26
5.4.1 Global buffer size 26
5.4.2 Filter buffer size 27
5.4.3 PE number 29
Chapter 6 Discussion 31
Chapter 7 Conclusion 33
Bibliography 34
dc.language.iso: en
dc.subject: 彈性配置 (zh_TW)
dc.subject: 能源效率 (zh_TW)
dc.subject: 減少DRAM存取 (zh_TW)
dc.subject: 加速器 (zh_TW)
dc.subject: 卷積神經網絡 (zh_TW)
dc.subject: Convolutional Neural Network (en)
dc.subject: CNN accelerator (en)
dc.subject: Reconfigurable (en)
dc.subject: Reduce DRAM access (en)
dc.subject: energy efficiency (en)
dc.title: 可重組的卷積神經網絡加速器設計 (zh_TW)
dc.title: A Reconfigurable CNN Accelerator Design (en)
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 徐慰中,洪士灝
dc.subject.keyword: 卷積神經網絡,加速器,彈性配置,減少DRAM存取,能源效率 (zh_TW)
dc.subject.keyword: Convolutional Neural Network,CNN accelerator,Reconfigurable,Reduce DRAM access,energy efficiency (en)
dc.relation.page: 35
dc.identifier.doi: 10.6342/NTU201703483
dc.rights.note: 有償授權
dc.date.accepted: 2017-08-16
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-106-1.pdf (not authorized for public access)
Size: 1.4 MB
Format: Adobe PDF


Except where their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.
