  1. NTU Theses and Dissertations Repository
  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68991
Full metadata record
DC field: Value (Language)
dc.contributor.advisor: 楊佳玲
dc.contributor.author: Yi-Jou Lee (en)
dc.contributor.author: 李依柔 (zh_TW)
dc.date.accessioned: 2021-06-17T02:45:56Z
dc.date.available: 2019-08-25
dc.date.copyright: 2017-08-25
dc.date.issued: 2017
dc.date.submitted: 2017-08-15
dc.identifier.citation:
[1] M. Alwani, H. Chen, M. Ferdman, and P. Milder. Fused-layer CNN accelerators. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1–12. IEEE, 2016.
[2] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM SIGPLAN Notices, volume 49, pages 269–284. ACM, 2014.
[3] Z. Du, A. Lingamneni, Y. Chen, K. Palem, O. Temam, and C. Wu. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 201–206. IEEE, 2014.
[4] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. NeuFlow: A runtime reconfigurable dataflow processor for vision. In 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 109–116. IEEE, 2011.
[5] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: Efficient inference engine on compressed deep neural network. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), pages 243–254. IEEE Press, 2016.
[6] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135–1143, 2015.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[8] Y. Ma, Y. Cao, S. Vrudhula, and J.-s. Seo. Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), pages 45–54. ACM, 2017.
[9] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[10] O. Temam. A defect-tolerant accelerator for emerging high-performance applications. In 2012 39th Annual International Symposium on Computer Architecture (ISCA), pages 356–367. IEEE, 2012.
[11] S. Thoziyoor, N. Muralimanohar, J. Ahn, and N. Jouppi. CACTI 5.3, rev. 174. HP Labs. [Online]. Available: http://quid.hpl.hp.com:9081, 2014.
[12] S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen. Cambricon-X: An accelerator for sparse neural networks. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 1–12. IEEE, 2016.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/68991
dc.description.abstract: 隨著卷積神經網絡參數量逐漸增大,卷積神經網絡加速器的性能和能量效率成為一個重要問題。
從先前的設計中可以發現,由於資料量龐大,DRAM存取佔能耗的很大一部分。觀察卷積層的運算行為,可以發現計算中有許多參數可以被共用,但由於加速器上的儲存空間有限,這些參數可能被重複從DRAM讀取。
因此我們希望透過加速器中的儲存空間來重複使用參數,以減少計算中對DRAM的存取。
參數的重複使用可分為三種:重複使用輸入、重複使用濾波器,以及重複使用中間產物。
卷積神經網絡模型中的每一層,都可能根據其輸入、輸出和濾波器的大小,適合不同的資料重用策略。
但現有的卷積神經網絡加速器在整個處理過程中只採用單一種資料重用。
為了讓每一層都能彈性採用不同的資料重用策略,我們提出一種可重組的卷積神經網絡加速器設計,可以彈性配置以利用不同類型的資料重用,將DRAM存取的資料量最小化。
透過將卷積神經網絡的處理拆分為以不同輸入和濾波器為單位的計算單元,我們可以藉由安排這些計算單元的計算順序,在加速器中利用不同的資料重用。
加速器將根據離線分析模組依最佳重用策略與硬體限制所生成的指令來執行。
我們的結果顯示,可重組設計能減少DRAM存取量,並比較了不同資料重用策略下執行時間與能耗的差異。
我們也分析了不同硬體配置對執行結果的影響。
zh_TW
dc.description.abstract: With the growing size of convolutional neural networks (CNNs), the performance and energy efficiency of CNN accelerators have become an important problem.
From previous work, we find that DRAM accesses account for a large share of energy consumption. To reduce DRAM accesses, we observe the computation behavior of convolutional layers: many parameters are shared between computations, but with the limited on-chip buffer size of an accelerator, these data may be loaded on-chip repeatedly.
We would like to capture data reuse through the on-chip buffer to reduce the DRAM accesses of CNN computation.
Three kinds of data reuse can be captured; the reused data are kept in the on-chip buffer and evicted when no longer needed. The first kind is input feature map reuse, the second is filter reuse, and the third is intermediate feature map reuse.
Each layer in a CNN model may favor a different data reuse policy depending on the sizes of its input, output, and filters.
However, existing CNN accelerators exploit only one type of data reuse throughout CNN processing.
To gain the flexibility to use a different data reuse policy for each layer, we propose a reconfigurable CNN accelerator design that can be configured to capture different types of reuse, with the objective of minimizing off-chip memory accesses.
By separating CNN processing into computation primitives, that is, units of convolution over different inputs and filters, we can exploit different kinds of data reuse by arranging the computation order of these primitives in our accelerator. The accelerator executes instructions generated by an off-line generator that takes the optimal reuse policy and the hardware constraints into account.
Our results show that our reconfigurable design reduces DRAM accesses; we compare the execution time and energy under different data reuse policies, and we also analyze the effect of different configurations on our CNN accelerator design.
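The effect of choosing a reuse policy per layer can be sketched with a toy model: a small LRU "on-chip buffer" replays the memory-access sequences produced by two different loop orders over a 1-D convolution, and an off-line step picks the order that causes fewer DRAM loads. All names here (`count_dram_loads`, `pick_policy`) and the 1-D simplification are illustrative assumptions, not the thesis's actual architecture or instruction format.

```python
from collections import OrderedDict

def count_dram_loads(accesses, buffer_slots):
    """LRU model of an on-chip buffer: each miss counts as one DRAM load."""
    cache = OrderedDict()
    loads = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            loads += 1                     # miss: fetch from DRAM
            cache[addr] = True
            if len(cache) > buffer_slots:
                cache.popitem(last=False)  # evict least recently used
    return loads

# Toy 1-D convolution layer: M filters of K weights slide over N inputs.
# Addresses: ("in", i) for input elements, ("w", m, k) for filter weights.
def filter_reuse_order(M, K, N):
    # One filter stays resident while the whole input streams past it.
    for m in range(M):
        for i in range(N - K + 1):
            for k in range(K):
                yield ("w", m, k)
                yield ("in", i + k)

def input_reuse_order(M, K, N):
    # One input window stays resident while every filter is applied to it.
    for i in range(N - K + 1):
        for m in range(M):
            for k in range(K):
                yield ("w", m, k)
                yield ("in", i + k)

def pick_policy(M, K, N, buffer_slots):
    """Off-line step: choose, per layer, the ordering with fewer DRAM loads."""
    costs = {
        "filter_reuse": count_dram_loads(filter_reuse_order(M, K, N), buffer_slots),
        "input_reuse": count_dram_loads(input_reuse_order(M, K, N), buffer_slots),
    }
    return min(costs, key=costs.get), costs

policy, costs = pick_policy(M=4, K=3, N=16, buffer_slots=8)
print(policy, costs)
```

With this layer shape and an 8-slot buffer, the filter-resident order wins; shrinking `M` or growing the buffer shifts the balance, which mirrors why each layer may favor a different policy.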
en
dc.description.provenance: Made available in DSpace on 2021-06-17T02:45:56Z (GMT). No. of bitstreams: 1
ntu-106-R04922056-1.pdf: 1436680 bytes, checksum: 3230484f383de56ad42e7732491638f2 (MD5)
Previous issue date: 2017
en
dc.description.tableofcontents:
Abstract iii
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Motivation 5
Chapter 3 Related Work 7
3.1 Sparse Neural Network 7
3.2 Data Reuse 8
Chapter 4 Methodology 9
4.1 Mechanism 9
4.2 Hardware Design 11
4.2.1 Overview 11
4.2.2 Processing Element 12
4.2.3 Buffer Controller 13
4.2.4 Global Buffer 14
4.2.5 Central Controller 15
4.2.6 Execution Model 16
4.2.7 Bus 17
Chapter 5 Experiment and Result 18
5.1 Experiment setup 18
5.2 Performance of reconfigurable design 19
5.3 Energy of reconfigurable design 23
5.4 Design factor 26
5.4.1 Global buffer size 26
5.4.2 Filter buffer size 27
5.4.3 PE number 29
Chapter 6 Discussion 31
Chapter 7 Conclusion 33
Bibliography 34
dc.language.iso: en
dc.subject: 彈性配置 (zh_TW)
dc.subject: 能源效率 (zh_TW)
dc.subject: 減少DRAM存取 (zh_TW)
dc.subject: 加速器 (zh_TW)
dc.subject: 卷積神經網絡 (zh_TW)
dc.subject: Convolutional Neural Network (en)
dc.subject: CNN accelerator (en)
dc.subject: Reconfigurable (en)
dc.subject: Reduce DRAM access (en)
dc.subject: energy efficiency (en)
dc.title: 可重組的卷積神經網絡加速器設計 (zh_TW)
dc.title: A Reconfigurable CNN Accelerator Design (en)
dc.type: Thesis
dc.date.schoolyear: 105-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 徐慰中,洪士灝
dc.subject.keyword: 卷積神經網絡,加速器,彈性配置,減少DRAM存取,能源效率 (zh_TW)
dc.subject.keyword: Convolutional Neural Network,CNN accelerator,Reconfigurable,Reduce DRAM access,energy efficiency (en)
dc.relation.page: 35
dc.identifier.doi: 10.6342/NTU201703483
dc.rights.note: 有償授權
dc.date.accepted: 2017-08-16
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-106-1.pdf (not authorized for public access)
Size: 1.4 MB
Format: Adobe PDF


Except where their copyright terms are otherwise specified, all items in this system are protected by copyright, with all rights reserved.
