Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78852
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 楊家驤 | zh_TW
dc.contributor.advisor | Chia-Hsiang Yang | en
dc.contributor.author | 呂丞勛 | zh_TW
dc.contributor.author | Cheng-Hsun Lu | en
dc.date.accessioned | 2021-07-11T15:24:07Z | -
dc.date.available | 2024-01-01 | -
dc.date.copyright | 2019-01-24 | -
dc.date.issued | 2019 | -
dc.date.submitted | 2002-01-01 | -
dc.identifier.citation | [1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986.
[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” IEEE Computer Vision and Pattern Recognition (CVPR), 2009.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105, 2012.
[5] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770–778, Jun. 2016.
[7] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” CVPR, pp. 1–9, 2015.
[8] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2012, pp. 4277–4280.
[9] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3431–3440.
[10] J. Donahue et al., “Long-term recurrent convolutional networks for visual recognition and description,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 2625–2634.
[11] Y.-H. Chen, T. Krishna, J. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 262–263, Feb. 2016.
[12] B. Moons and M. Verhelst, “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets,” Proc. IEEE Symp. VLSI Circuits, pp. 1–2, Jun. 2016.
[13] B. Moons and M. Verhelst, “An Energy-Efficient Precision-Scalable ConvNet Processor in 40-nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 903–914, Apr. 2017.
[14] B. Moons, R. Uytterhoeven, W. Dehaene, and M. Verhelst, “Envision: A 0.26-to-10 TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28nm FDSOI,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 246–247, Feb. 2017.
[15] S. Yin, P. Ouyang, S. Tang, F. Tu, X. Li, L. Liu, S. Wei, “A 1.06-to-5.09 TOPS/W Reconfigurable Hybrid-Neural-Network Processor for Deep Learning Applications,” Symposium on VLSI Circuits Digest of Technical Papers, pp. 26–27, 2017.
[16] J. Lee, C. Kim, S. Kang, D. Shin, S. Kim, and H.-J. Yoo, “UNPU: A 50.6TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 218–219, Feb. 2018.
[17] S. Choi, J. Lee, K. Lee, and H.-J. Yoo, “A 9.02mW CNN-Stereo-Based Real-Time 3D Hand-Gesture Recognition Processor for Smart Mobile Devices,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 220–221, Feb. 2018.
[18] D. Bankman, L. Yang, B. Moons, M. Verhelst, and B. Murmann, “An Always-On 3.8μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28nm CMOS,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 222–223, Feb. 2018.
[19] S. Yin, P. Ouyang, J. Yang, T. Lu, X. Li, L. Liu, S. Wei, “An ultra-high energy-efficient reconfigurable processor for deep neural networks with binary/ternary weights in 28nm CMOS,” Symposium on VLSI Circuits Digest of Technical Papers, pp. 37–38, 2018.
[20] S. Yin, P. Ouyang, S. Zheng, D. Song, X. Li, L. Liu, S. Wei, “A 141 uW, 2.46 pJ/Neuron Binarized Convolutional Neural Network based Self-learning Speech Recognition Processor in 28nm CMOS,” Symposium on VLSI Circuits Digest of Technical Papers, pp. 139–140, 2018.
[21] D. Shin, J. Lee, J. Lee, and H.-J. Yoo, “DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 240–242, Feb. 2017.
[22] Z. Yuan, J. Yue, H. Yang, Z. Wang, J. Li, Y. Yang, Q. Guo, X. Li, M.-F. Chang, H. Yang, and Y. Liu, “STICKER: A 0.41-62.1 TOPS/W 8bit Neural Network Processor with Multi-Sparsity Compatible Convolution Arrays and Online Tuning Acceleration for Fully Connected Layers,” Symposium on VLSI Circuits Digest of Technical Papers, pp. 139–140, 2018.
[23] S. Bang, J. Wang, Z. Li, C. Gao, Y. Kim, Q. Dong, Y.-P. Chen, L. Fick, X. Sun, R. Dreslinski, T. Mudge, H.-S. Kim, D. Blaauw, and D. Sylvester, “A 288μW Programmable Deep-Learning Processor with 270KB On-Chip Weight Storage Using Non-Uniform Memory Hierarchy for Mobile Intelligence,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 250–252, Feb. 2017.
[24] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “EIE: Efficient Inference Engine on Compressed Deep Neural Network,” Proc. ACM/IEEE 43rd Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2016, pp. 267–278.
[25] P. N. Whatmough, S. K. Lee, H. Lee, S. Rama, D. Brooks, and G.-Y. Wei, “A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications,” International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pp. 242–243, Feb. 2017.
[26] J. Yosinski, J. Clune, Y. Bengio, H. Lipson, “How transferable are features in deep neural networks?,” Advances in Neural Information Processing System 27 (NIPS), pp 3320–3328, 2014.
[27] B. Fleischer et al., “A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference,” Symposium on VLSI Circuits Digest of Technical Papers, pp. 35–36, 2018.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78852 | -
dc.description.abstract | 深度學習已廣泛使用於各種領域,並且在某些應用上已達到超越人類的性能。為了滿足對於運算能力的需求,目前已有許多客製化的深度神經網路推論加速器。主動學習機制對於安全與隱私保護有顯著的提升,特別是用於醫療照護及身份認證;針對使用者特徵進行神經網路調整更可進一步提高辨識準確度。這些功能都仰賴晶片上訓練,但目前支援晶片上訓練的設計卻相當有限。考量訓練所需的運算複雜度比起推論高上許多,設計具高能量效率、能同時支援推論及訓練的深度學習處理器相當具有挑戰性。本設計提出文獻上第一顆可同時支援深度神經網路推論與訓練的客製化卷積式神經網路處理器,可支援各種神經網路維度與多種精準度需求。針對推論及訓練的卷積運算流程進行資料重新排序,並將卷積層及完全連接層的運算轉換成相同運算,以大幅提升處理器性能。最大池化層與線性整流器的共同設計可以降低近75%的記憶體需求。簡化後的歸一化函數可省下78%的硬體資源。浮點數與固定點數整合分別為乘法器與加法器省下56.8%與17.3%的硬體資源,合併兩者之乘加器更能進一步節省33%的硬體資源。透過資料閘控及時脈閘控,可在低精準度模式省下62%的功率消耗。處理器以40nm製程實現,推論階段能達到1.25 TOPS/W的能量效率,與文獻上之推論加速器效能相當。操作於訓練模式之能量效率可達327 GOPS/W,高於CPU 105倍。 | zh_TW
dc.description.abstract | Deep learning has been widely deployed in many areas and demonstrates beyond-human performance in some applications. To meet the required computing power, many dedicated accelerators for deep neural network inference have been proposed. Active learning is beneficial to security and privacy protection, especially for health care and ID verification. In addition, self-adaptation can further improve the classification performance by leveraging users’ specific features. Both capabilities are enabled by on-chip training, yet existing solutions support on-chip training only to a limited extent. Since the computational complexity of training is much higher than that of inference, designing an energy-efficient processor for both inference and training is very challenging. This work presents a deep learning processor that supports both inference and training for convolutional neural networks of arbitrary dimensions and variable precision. Data re-arrangement and a reformulation that maps convolutional and fully-connected layers onto the same operation are utilized to significantly improve the performance. The maxpooling and ReLU modules are co-designed to reduce the memory requirement by 75%. The softmax function is modified to reduce the hardware area by 78%. The integration of fixed-point and floating-point operators reduces the area of the multipliers and adders by 56.8% and 17.3%, respectively, and merging a multiplier and an adder into a unified MAC unit further reduces the area by 33%. In the low-precision mode, clock gating and data gating are employed to reduce the power consumption by 62%. Fabricated in 40-nm technology, the proposed deep learning processor achieves 1.25 TOPS/W in inference, which is competitive with state-of-the-art inference designs. The chip also delivers an energy efficiency of 327 GOPS/W in training, which is 105× higher than that of a high-end CPU. | en
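The "same operation" reformulation mentioned in the abstract (mapping convolutional and fully-connected layers onto one multiply-accumulate pattern) corresponds, in its general software form, to the well-known im2col transformation. The NumPy sketch below only illustrates that general technique under simple assumptions (single-channel input, unit stride, no padding); it is not the dataflow of the thesis's processor, and the function names are illustrative.

import numpy as np

def im2col(x, k):
    """Unfold an HxW input into a matrix whose rows are the k*k sliding patches."""
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1
    cols = np.empty((oh * ow, k * k))
    for i in range(oh):
        for j in range(ow):
            cols[i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols, (oh, ow)

def conv2d_as_matmul(x, kernel):
    """Valid 2-D convolution (correlation form) computed as one matrix multiply,
    i.e. the same MAC pattern a fully-connected layer uses."""
    k = kernel.shape[0]
    cols, (oh, ow) = im2col(x, k)
    return (cols @ kernel.ravel()).reshape(oh, ow)

if __name__ == "__main__":
    x = np.arange(25, dtype=float).reshape(5, 5)
    kern = np.full((3, 3), 1.0 / 9.0)  # simple 3x3 averaging kernel
    out = conv2d_as_matmul(x, kern)
    # Reference result: the mean of each 3x3 window, computed directly.
    ref = np.array([[x[i:i + 3, j:j + 3].mean() for j in range(3)]
                    for i in range(3)])
    assert np.allclose(out, ref)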
dc.description.provenance | Made available in DSpace on 2021-07-11T15:24:07Z (GMT). No. of bitstreams: 1
ntu-108-R05943005-1.pdf: 3814795 bytes, checksum: 48f5e1576cdf84c0e5d542bfce961434 (MD5)
Previous issue date: 2019 | en
dc.description.tableofcontents | 口試委員會審定書 ii
誌謝 iii
摘要 iv
ABSTRACT v
Contents vii
List of Figures ix
List of Tables x
1 INTRODUCTION 1
2 Convolutional Neural Network Algorithm 4
2.1 Inference Algorithm 4
2.2 Training Algorithm 6
2.2.1 Derivatives of convolutional layers 7
2.2.2 Derivatives of fully-connected layers 9
2.2.3 Derivatives of activation functions 9
3 System Architecture 10
3.1 Processing Unit (PU) 12
3.2 Processing Element Cluster (PEC) 15
3.3 Maxpooling and ReLU Module 18
3.4 Modified Softmax Module 20
4 Power-Area Optimization 21
4.1 Filter Rearrangement 21
4.2 Fixed-Floating-Point Hardware Integration 23
4.3 Multiplier with Variable Wordlengths 25
4.4 Modified Softmax 27
5 Chip Implementation 29
6 CONCLUSION 34
References 36
dc.language.iso | en | -
dc.subject | CMOS 數位積體電路 | zh_TW
dc.subject | 晶片上訓練 | zh_TW
dc.subject | 神經網路調整 | zh_TW
dc.subject | 主動學習 | zh_TW
dc.subject | 卷積式神經網路 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | CMOS digital integrated circuits | en
dc.subject | neural network adaptation | en
dc.subject | active learning | en
dc.subject | convolutional neural network | en
dc.subject | Deep learning | en
dc.subject | on-chip training | en
dc.title | 具適應智能之可程式化深度學習處理器 | zh_TW
dc.title | A Fully-Programmable Deep Learning Processor with Adaptable Intelligence | en
dc.type | Thesis | -
dc.date.schoolyear | 107-1 | -
dc.description.degree | 碩士 | -
dc.contributor.oralexamcommittee | 劉宗德;柳德政 | zh_TW
dc.contributor.oralexamcommittee | Tsung-Te Liu; | en
dc.subject.keyword | 深度學習, 卷積式神經網路, 主動學習, 神經網路調整, 晶片上訓練, CMOS 數位積體電路 | zh_TW
dc.subject.keyword | Deep learning, convolutional neural network, active learning, neural network adaptation, on-chip training, CMOS digital integrated circuits | en
dc.relation.page | 39 | -
dc.identifier.doi | 10.6342/NTU201900136 | -
dc.rights.note | 未授權 | -
dc.date.accepted | 2019-01-22 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電子工程學研究所 | -
dc.date.embargo-lift | 2024-01-24 | -
Appears in Collections: 電子工程學研究所

Files in This Item:
File | Size | Format
ntu-107-1.pdf (未授權公開取用, restricted access) | 3.73 MB | Adobe PDF