NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87567
Full metadata record:
dc.contributor.advisor (zh_TW): 吳安宇
dc.contributor.advisor (en): An-Yeu Wu
dc.contributor.author (zh_TW): 王則勛
dc.contributor.author (en): Tse-Hsun Wang
dc.date.accessioned: 2023-06-20T16:06:21Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-06-20
dc.date.issued: 2022
dc.date.submitted: 2022-10-28
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87567
dc.description.abstract (zh_TW): With the rapid development of deep learning, it has become an important approach to solving all kinds of problems. However, deep learning requires an enormous amount of computation, which in the past could only be performed on servers. In recent years, the big-data era has made the volume of data grow exponentially. If deep learning is run only on servers, we face long data-transmission times and data-privacy problems. To address these issues, most research points toward implementing deep learning at the edge, using deep learning accelerators to improve computational efficiency. Edge deep learning accelerators still have to overcome many difficulties, the most important being their high energy consumption. This consumption has two sources: computation and data transfer, and the latter is often overlooked. In a deep learning accelerator, each time a layer is computed, the activations usually have to be fetched from dynamic random-access memory (DRAM) and the results written back to DRAM afterward, which causes high energy consumption. To address this problem, this thesis compresses the output activations to reduce energy consumption. Exploiting the high sparsity of activations, we apply zero-value compression (ZVC) combined with block compression (BC) and a bypass mechanism, reaching a 2.39x compression ratio. We also propose K-lossy compression, which reaches a 3.73x compression ratio with only a 0.4% accuracy drop. Finally, we combine the above algorithmic optimizations into a data compression/decompression engine with a scalable architecture that improves throughput by 19% over the representative prior work with only an 8% area increase. DRAMSim2 simulation verifies that the engine reduces DRAM data-transfer energy by 56%.
dc.description.abstract (en): As deep learning (DL) has become more and more popular, it has become an important solution to many kinds of problems. However, DL requires a large amount of computation, which in the past could only be performed in the cloud. In recent years, the amount of data has grown exponentially, so cloud-based DL systems face the challenges of large data transmissions and data-privacy leakage. To address these issues, most research aims to move DL inference to edge devices, where deep learning accelerators (DLAs) are developed to improve the computational efficiency of inference. However, a DLA still consumes a lot of energy. There are two targets for reducing energy consumption: computation and data transmission. We focus on the energy consumed by data transmission, as it is the bottleneck in current DLAs. When a layer is computed on the DLA, the activations are fetched from DRAM, and after computation the output activations are stored back to DRAM; this data transmission between the DLA and DRAM causes high energy consumption. In this thesis, we use activation compression (AC) techniques to reduce the traffic between the DLA and DRAM and thus the overall energy consumption. We exploit the high sparsity of the activations generated by the ReLU function: zero-value compression (ZVC) is combined with block compression and a bypass mechanism to achieve a 2.39x compression ratio. We also propose two K-lossy compression techniques, mixed-K lossy compression and K-lossy-aware training, which achieve a 3.73x compression ratio with only a 0.4% accuracy drop. Finally, combining the above algorithms, we propose a scalable architecture and implement it in hardware. The proposed scalable architecture outperforms the state-of-the-art, increasing throughput by 19% with 8% hardware overhead. The overall system's energy consumption is verified with DRAMSim2, showing that our method reduces DRAM read and write energy by 56%.
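To make the zero-value compression idea in the abstract concrete, the sketch below is illustrative only and not the thesis's hardware design: it compresses one activation block with a 1-bit presence mask per element plus the packed non-zero values and reports the resulting compression ratio. The 64-element block size, 8-bit activations, and roughly 70% sparsity are assumptions made for this example.

import numpy as np

def zvc_compress(block, bits_per_value=8):
    # Zero-value compression of one activation block:
    # a 1-bit presence mask per element plus the packed non-zero values.
    mask = (block != 0).astype(np.uint8)      # 1 bit per element in hardware
    nonzeros = block[block != 0]              # only non-zero activations are stored
    size_bits = block.size * 1 + nonzeros.size * bits_per_value
    return mask, nonzeros, size_bits

def zvc_decompress(mask, nonzeros):
    # Lossless reconstruction: scatter the non-zero values back under the mask.
    block = np.zeros(mask.shape, dtype=nonzeros.dtype)
    block[mask.astype(bool)] = nonzeros
    return block

# Illustrative parameters (not taken from the thesis): an 8-bit activation
# block with ReLU-like sparsity of about 70%.
rng = np.random.default_rng(0)
block = rng.integers(0, 128, size=64, dtype=np.uint8)
block[rng.random(64) < 0.7] = 0

mask, nonzeros, size_bits = zvc_compress(block)
assert np.array_equal(zvc_decompress(mask, nonzeros), block)
print(f"raw {block.size * 8} bits -> compressed {size_bits} bits "
      f"(ratio x{block.size * 8 / size_bits:.2f})")

When a block contains few zeros, the mask overhead makes the "compressed" output larger than the raw block, which is why the thesis pairs ZVC with a bypass mechanism that keeps such blocks uncompressed; block compression and the K-lossy techniques described in the abstract are separate refinements on top of this basic scheme.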
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-06-20T16:06:21Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2023-06-20T16:06:21Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents: Acknowledgements vii
Abstract (Chinese) ix
ABSTRACT xi
CONTENTS xiii
LIST OF FIGURES xvi
LIST OF TABLES xx
Chapter 1 Introduction 1
1.1 Background 1
1.1.1 The Background of Deep Learning 1
1.1.2 Deep Learning from Cloud to Edge 3
1.1.3 Deep Learning Accelerator (DLA) 5
1.2 Motivation and Main Contributions 6
1.2.1 The architecture of DLA 6
1.2.2 The bottleneck of DLA 7
1.2.3 Thesis Target 8
1.3 Thesis Organization 9
Chapter 2 Review of Activation Compression 10
2.1 Related Works of Activation Compression 10
2.1.1 Entropy Coding 10
2.1.2 Zero-value compression (ZVC) 11
2.1.3 Zero Run-length Coding (Z-RLC) 13
2.1.4 Lane Compression 16
2.2 Challenges of the Prior Works 18
2.3 Summary 19
Chapter 3 Algorithm of Compression 20
3.1 Proposed Bit-level ZVC 20
3.1.1 Block Compression 20
3.1.2 Bypass Mechanism 23
3.2 K-Lossy Compression 26
3.2.1 Analysis of K-Lossy 26
3.2.2 Mixed K-lossy Compression 30
3.3 K-lossy-aware Training 35
3.3.1 Add K-lossy Noise 35
3.3.2 Simulation Result 36
3.4 Budget-aware Compression 39
3.4.1 Problem Formulation 39
3.4.2 Budget-aware Compression 42
3.4.3 Simulation Result 44
3.5 Summary 45
Chapter 4 Hardware IP and System Analysis 46
4.1 Hardware of Compressor 46
4.1.1 The Architecture of Compressor 46
4.1.2 Basic Function 48
4.1.3 ZVC and Block Compression 51
4.1.4 Proposed Scalable Bit-level Non-Zero Values Concatenation 53
4.1.5 Output Wrapper 55
4.2 Hardware of Decompressor 57
4.2.1 The Architecture of Decompressor 57
4.2.2 Input Unpacker 59
4.2.3 The Decompression Method 60
4.2.4 Proposed Scalable Bit-level Decompressor 61
4.3 Performance Results 63
4.3.1 Performance Analysis of Different Sizes of Sub-block 63
4.3.2 Performance Analysis of Related Work 64
4.4 System Analysis 65
4.4.1 Power analysis 65
4.4.2 DRAMSim2 66
4.4.3 Simulation Results 68
4.5 Summary 70
Chapter 5 Main Contribution and Future Directions 71
5.1 Main Contribution 71
5.2 Future Directions 72
REFERENCE 73
dc.language.iso: en
dc.subject (zh_TW): 可調整架構 (scalable architecture)
dc.subject (zh_TW): 有損壓縮 (lossy compression)
dc.subject (zh_TW): 塊狀壓縮 (block compression)
dc.subject (zh_TW): 激活值壓縮 (activation compression)
dc.subject (zh_TW): 零資料壓縮 (zero-value compression)
dc.subject (zh_TW): 繞過機制 (bypass mechanism)
dc.subject (en): Zero-value compression
dc.subject (en): Bypass Mechanism
dc.subject (en): K-lossy
dc.subject (en): Block Compression
dc.subject (en): Activation compression
dc.subject (en): scalable architecture
dc.title (zh_TW): 基於稀疏性之低記憶體使用量激活值壓縮引擎設計 (Sparsity-based Activation Compression Engine Design for Low Memory Usage)
dc.title (en): Sparsity-based Activation Compression Engine Design for Low-memory Access in DLA
dc.type: Thesis
dc.date.schoolyear: 111-1
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee (zh_TW): 盧奕璋;沈中安
dc.contributor.oralexamcommittee (en): Yi-Chang Lu;Chung-An Shen
dc.subject.keyword (zh_TW): 激活值壓縮, 零資料壓縮, 塊狀壓縮, 繞過機制, 有損壓縮, 可調整架構 (activation compression, zero-value compression, block compression, bypass mechanism, lossy compression, scalable architecture)
dc.subject.keyword (en): Activation compression, Zero-value compression, Block Compression, Bypass Mechanism, K-lossy, scalable architecture
dc.relation.page: 76
dc.identifier.doi: 10.6342/NTU202210009
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2022-10-31
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電子工程學研究所 (Graduate Institute of Electronics Engineering)
Appears in Collections: 電子工程學研究所 (Graduate Institute of Electronics Engineering)

Files in This Item:
File | Size | Format
ntu-111-1.pdf (not authorized for public access) | 4.11 MB | Adobe PDF

