Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93275

Full metadata record (DC field, value, language):
dc.contributor.advisor: 吳安宇 (zh_TW)
dc.contributor.advisor: An-Yeu Wu (en)
dc.contributor.author: 張承洋 (zh_TW)
dc.contributor.author: Cheng-Yang Chang (en)
dc.date.accessioned: 2024-07-23T16:37:58Z
dc.date.available: 2024-07-24
dc.date.copyright: 2024-07-23
dc.date.issued: 2024
dc.date.submitted: 2024-07-22
dc.identifier.citation: [1] “The Digitization of the World From Edge to Core” [Online]. Available: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
[2] Zhenhua Zhu et al., ASP-DAC 2024 Tutorial-7
[3] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.
[4] H.-S. Philip Wong, “Semiconductor Technology - A System Perspective,” Monday Keynote K1.
[5] Xu, Xiaowei, et al. “Scaling for edge inference of deep neural networks,” Nature Electronics 1.4, pp. 216-222, 2018.
[6] S. Yu, H. Jiang, S. Huang, X. Peng and A. Lu, “Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects,” in IEEE Circuits and Systems Magazine, vol. 21, no. 3, pp. 31-56, 2021.
[7] https://nicsefc.ee.tsinghua.edu.cn/projects/neural-network-accelerator.html
[8] Y. -H. Chen, T. Krishna, J. S. Emer and V. Sze, “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,” in IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127-138, 2017.
[9] J. Lee, C. Kim, S. Kang, D. Shin, S. Kim and H. -J. Yoo, “UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision,” in IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 173-185, 2019.
[10] Y. He et al., “7.3 A 28nm 38-to-102-TOPS/W 8b Multiply-Less Approximate Digital SRAM Compute-In-Memory Macro for Neural-Network Inference,” 2023 IEEE International Solid-State Circuits Conference (ISSCC), pp. 130-132, 2023.
[11] Peng, Xiaochen, et al. “Inference engine benchmarking across technological platforms from CMOS to RRAM,” in Proceedings of the International Symposium on Memory Systems, pp. 471-479, 2019.
[12] G. Yeap et al., “5nm CMOS production technology platform featuring full-fledged EUV, and high mobility channel FinFETs with densest 0.021µm2 SRAM cells for mobile SoC and high performance computing applications,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2019.
[13] C. Yu, T. Yoo, K. T. C. Chai, T. T. -H. Kim and B. Kim, “A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks,” in IEEE Journal of Solid-State Circuits, vol. 57, no. 11, pp. 3466-3476, Nov. 2022.
[14] J. -H. Yoon, M. Chang, W. -S. Khwa, Y. -D. Chih, M. -F. Chang and A. Raychowdhury, “A 40-nm 118.44-TOPS/W Voltage-Sensing Compute-in-Memory RRAM Macro With Write Verification and Multi-Bit Encoding,” in IEEE Journal of Solid-State Circuits, vol. 57, no. 3, pp. 845-857, 2022.
[15] J.-Y. Wu et al., “A 40nm low-power logic compatible phase change memory technology,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2018.
[16] T. Shimoi et al., “A 22-nm 32-Mb Embedded STT-MRAM Macro Achieving 5.9-ns Random Read Access and 7.4-MB/s Write Throughput at up to 150°C,” in IEEE Journal of Solid-State Circuits, 2023.
[17] K. Guo, W. Li, K. Zhong, Z. Zhu, S. Zeng, S. Han, Y. Xie, P. Debacker, M. Verhelst, Y. Wang. “Neural Network Accelerator Comparison” [Online]. Available: https://nicsefc.ee.tsinghua.edu.cn/projects/neural-network-accelerator/
[18] C. -Y. Chang, K. -C. Chou, Y. -C. Chuang and A. -Y. Wu, “E-UPQ: Energy-Aware Unified Pruning-Quantization Framework for CIM Architecture,” in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 13, no. 1, pp. 21-32, 2023.
[19] Peng, Xiaochen, et al. “DNN+ NeuroSim: An end-to-end benchmarking framework for compute-in-memory accelerators with versatile device technologies,” 2019 IEEE international electron devices meeting (IEDM), 2019.
[20] S. Mittal, “A survey of ReRAM-based architectures for processing-in-memory and neural networks,” Mach. Learn. Knowl. Extraction, vol. 1, no. 1, pp. 75–114, 2019.
[21] H. Ji, et al., “ReCom: An efficient resistive accelerator for compressed deep neural networks,” Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 237-240, 2018.
[22] J. Lin, et al., “Learning the sparsity for ReRAM: Mapping and pruning sparse neural network for reram based accelerator,” Proceedings of Asia and South Pacific Design Automation Conference, pp. 639-644, 2019.
[23] T. H. Yang, et al., “Sparse ReRAM engine: Joint exploration of activation and weight sparsity in compressed neural networks,” Proceedings of International Symposium on Computer Architecture, pp. 236-249, 2019.
[24] S. Yang, et al., “AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator,” Proceedings of the ACM International Conference on Supercomputing, pp. 304-315, 2021.
[25] S. H. Sie, et al., “MARS: Multimacro architecture SRAM CIM-based accelerator with co-designed compressed neural networks.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41.5, pp. 1550-1562, 2021.
[26] H. Sun, et al., “An energy-efficient quantized and regularized training framework for processing-in-memory accelerators,” Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 325-330, 2020.
[27] S. Huang, et al., “Mixed precision quantization for ReRAM-based DNN inference accelerators.” Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 372-377, 2021.
[28] B. Kang, et al., “Genetic algorithm-based energy-aware CNN quantization for processing-in-memory architecture,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 11.4, pp. 649-662, 2021.
[29] B. Li, et al., “An automated quantization framework for high-utilization RRAM-based PIM,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41.3, pp. 583-596, 2021.
[30] F. Liu, et al., “SoBS-X: Squeeze-out bit sparsity for ReRAM-crossbar-based neural network accelerator,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022.
[31] Y. Kim et al., “Extreme Partial-Sum Quantization for Analog Computing-In-Memory Neural Network Accelerators,” in ACM Journal on Emerging Technologies in Computing Syst. (JETC), pp. 1-19, 2022.
[32] J. Bai, W. Xue, Y. Fan, S. Sun, and W. Kang, “Partial Sum Quantization for Computing-In-Memory Based Neural Network Accelerator,” in IEEE Trans. Circuits and Syst. II: Express Briefs, 2023.
[33] Y. Cai et al., “Low Bit-Width Convolutional Neural Network on RRAM,” in IEEE Trans. Computer-Aided Design of Integrated Circuits and Syst. (TCAD), vol. 39.7, pp. 1414-1427, 2019.
[34] X. Sun et al., “XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks,” Design, Automation & Test in Europe Conf. & Exhibition (DATE), pp. 1423-1428, 2018.
[35] A. Azama et al., “Quarry: Quantization-based ADC reduction for ReRAM-based deep neural network accelerators,” in IEEE/ACM Inter. Conf. On Computer Aided Design (ICCAD), 2021.
[36] G. Yuan et al., “TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators,” in Design, Automation & Test in Europe Conf. & Exhibition (DATE), pp. 926-931, 2021.
[37] F. Tu et al., “A 28nm 29.2 TFLOPS/W BF16 and 36.5 TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration,” in IEEE Inter. Solid-State Circuits Conf. (ISSCC), vol. 65, pp. 1-3, 2022.
[38] G. Yuan et al., “FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator,” in ACM/IEEE 48th Annual Inter. Symp. Computer Architecture (ISCA), 2021.
[39] W.-S. Khwa et al., “A 40-nm, 2M-Cell, 8b-Precision, Hybrid SLC-MLC PCM Computing-in-Memory Macro with 20.5-65.0 TOPS/W for Tiny-AI Edge Devices,” in IEEE Inter. Solid-State Circuits Conf. (ISSCC), pp. 1-3, 2022.
[40] F. Karimzadeh, J. -H. Yoon and A. Raychowdhury, “BitS-Net: Bit-Sparse Deep Neural Network for Energy-Efficient RRAM-Based Compute-In-Memory,” in IEEE Trans. on Circuits and Syst. I: Regular Papers, vol. 69, no. 5, pp. 1952-1961, 2022.
[41] R. Guo et al., “TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization,” in IEEE Journal of Solid-State Circuits, vol. 58, no. 3, pp. 852-866, 2023.
[42] Y. Wang, et al., “Differentiable joint pruning and quantization for hardware efficiency,” European Conference on Computer Vision, pp. 259-277, 2020.
[43] T. Wang, et al., “APQ: Joint search for network architecture, pruning and quantization policy,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2075-2084, 2020.
[44] S. Zhou, et al., “DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients,” arXiv preprint arXiv:1606.06160, 2016.
[45] F. Liu, et al., “IVQ: In-memory acceleration of dnn inference exploiting varied quantization,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022.
[46] J. Yue, et al., “STICKER-IM: A 65 nm Computing-in-Memory NN Processor Using Block-Wise Sparsity Optimization and Inter/Intra-Macro Data Reuse,” IEEE Journal of Solid-State Circuits, 2022.
[47] Simonyan, Karen, and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[48] He, Kaiming, et al., “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
[49] B. Jacob et al., “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference,” in Proc. IEEE Conf on Computer Vision and Pattern Recognition (CVPR), 2018.
[50] G. Shomron, F. Gabbay, S. Kurzum, and U. Weiser, “Post-Training Sparsity-Aware Quantization,” Advances in Neural Information Processing Syst, pp. 17737-17748, 2021.
[51] J. H. Wilkinson, “Rounding errors in algebraic processes,” Courier Corporation, 1994.
[52] W. Hua et al., “Channel gating neural networks.,” in NeurIPS, pp. 1886-1896, 2019.
[53] Y. Zhang et al., “Precision gating: Improving neural network efficiency with dynamic dual-precision activations,” in ICLR, 2020.
[54] G. W. Wu et al., “DE-C3: Dynamic Energy-Aware Compression for Computing-In-Memory-Based Convolutional Neural Network Acceleration,” in SOCC, 2023.
[55] Y. C. Wu et al. “DEA-NIMC: Dynamic Energy-Aware Policy for Near/In-Memory Computing Hybrid Architecture,” in SOCC, 2023.
[56] A. Dosovitskiy et al. ,“An image is worth 16x16 words: Transformers for image recognition at scale,” in Inter. Conf. on Learning Representations, 2021.
[57] Y. Lin et al., “Demonstration of Generative Adversarial Network by Intrinsic Random Noises of Analog RRAM Devices,” in IEEE Inter. Electron Devices Meeting (IEDM), pp. 3.4.1-3.4.4, 2018.
[58] A. Shafiee et al., “ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars,” in Proc. 43rd Annu. Int. Symp. Computer Architecture (ISCA), pp. 14-26, 2016.
[59] F. Tu et al., “ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration,” in JSSC, vol. 58, no. 1, pp. 243-255, 2023.
[60] J. Yue et al., “A 28nm 16.9-300TOPS/W Computing-in-Memory Processor Supporting Floating-Point NN Inference/Training with Intensive-CIM Sparse-Digital Architecture,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 1-3, 2023.
[61] A. Guo et al., “A 28nm 64-kb 31.6-TFLOPS/W Digital-Domain Floating-Point-Computing-Unit and Double-Bit 6T-SRAM Computing-in-Memory Macro for Floating-Point CNNs,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 128-130, 2023.
[62] P. Chen et al., “7.8 A 22nm Delta-Sigma Computing-In-Memory (Δ∑CIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 140-142, 2023.
[63] F. Tu et al., “A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 466-468, 2022.
[64] H. Fujiwara et al., “A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations,” IEEE International Solid-State Circuits Conference (ISSCC), pp. 1-3, 2022.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93275
dc.description.abstract (zh_TW): 隨著資訊與電子產業的高速發展,全球產生的資料量也指數型上升,國際數據資訊 (International Data Corporation, IDC) 指出,在2025年時,全球的年資料產量將會到達175兆GB,因此新型態的資料處理框架需要被提出。同時,深度學習 (Deep Learning) 技術快速興起,在電腦視覺及自然語言領域打敗傳統演算法,並融入人們的日常生活中,許多應用逐漸從傳統只依賴雲端運算的解決方案,移向智慧邊緣裝置,將傳統以“運算”為中心的計算方式,拓展成以“資料”為中心的運算方式。但是隨著深度學習的快速發展,現有的邊緣裝置將面臨記憶體資料傳輸瓶頸,即使處理器的運算速度遠快於記憶體讀寫,資料處理速度仍會受記憶體傳輸頻寬局限,不足以支撐如此複雜的演算法以及如此龐大的模型!
基於記憶體內運算 (Computing-in-Memory) 之非馮紐曼架構 (non-von Neumann Architecture) 逐漸興起,將運算邏輯嵌入記憶體單元,解決資料傳輸的瓶頸問題,並具備低功耗以及高密度之優點,能有效提升運算能源效率,近年來,記憶體內運算與神經網路模型應用相結合,取得了卓越的推論性能。然而,隨著深度學習應用的複雜度提升,神經網路模型參數量的成長速度遠高於記憶體內運算所帶來之性能提升,因此眾多研究人員也紛紛轉向思考如何針對記憶體內運算技術優化模型架構,開發硬體友善之演算法。
在本論文中,我們的目標在於利用演算法-架構共同優化概念,引進神經網路之稀疏性 (Sparsity) 提升記憶體內運算的能源效率。儘管神經網路之推論受益於稀疏性,其稀疏粒度 (Granularity) 可根據記憶體內運算的模式進一步改善,神經網路各層的稀疏性大小也應採取系統化的設定方式,以確保能源效率的最大化。然而,過往文獻在解決運算效能時僅著重於壓縮模型大小,忽略記憶體內運算架構能耗分布特性,且針對兩種常見的模型壓縮方法: 剪枝 (Pruning) 及量化 (Quantization) 分開進行優化,導致最終神經網路模型之推論能耗仍高於預期; 另外,在記憶體內運算系統中使用類比-數位轉換器 (ADCs) 占據了能耗的重要部分,過去文獻雖已經探討使用低精度ADCs以節省能耗,或是利用稀疏偵測機制避免ADC資源的浪費,但這些方法須依賴訓練數據的調整以最小化模型的準確度損失,造成較高的部署前成本。
為了克服上述困難,本論文提出在記憶體內運算基礎上具備能耗覺察 (Energy-aware) 特性的模型壓縮技術,將壓縮所帶來的能耗下降量作為決定稀疏程度的依據,讓模型針對能耗較大的權重進行壓縮; 此外,我們亦提出可訓練化的參數實現位元層級的壓縮,將剪枝/量化技術統一視為混合精度量化 (Mixed-Precision) 的選項,在壓縮過程中進行可微分的共同搜索,以確保模型在準確率與能耗之間取得最佳平衡。此外,我們基於近似運算 (Approximate Computing) 想法,提出了一種即時資料位寬調整的數值範圍感知舍入 (Range-aware Rounding) 技術,避免部署前調整模型權重的成本,此技術可以使用動態塊浮點算法 (Dynamic Block-Floating-Point Arithmetic) 整合到記憶體內運算架構,降低高功耗的ADC存取次數,亦能配合動態推論提升推論能源效率及吞吐量。本論文提出適用於記憶體內運算之神經網路推論運算架構,在台積電28奈米製程環境下整合記憶體內運算模塊及數位模組,實現上述兩套演算法於晶片實作,藉此驗證此運算架構能達到較高的能源效率。
dc.description.abstract (en): With the rapid development of the information and electronics industries, the amount of data generated globally has been rising exponentially. The International Data Corporation (IDC) predicts that annual global data production will reach 175 zettabytes by 2025, necessitating new data processing frameworks. Concurrently, deep learning (DL) technologies have quickly emerged, surpassing traditional algorithms in fields such as computer vision and natural language processing and becoming part of daily life. Many applications are gradually shifting from cloud-only solutions to intelligent edge devices, transitioning from computation-centric to data-centric approaches. However, with the rapid advancement of DL, existing edge devices face a data-transfer bottleneck: even though processors compute far faster than memory can be read or written, overall processing speed is bounded by memory bandwidth and cannot support such complex algorithms and large models.
Computing-in-Memory (CIM), a non-von Neumann architecture, embeds computational logic within memory units to remove the data-transfer bottleneck while offering low power consumption and high density. Combining CIM with deep neural networks (DNNs) has recently achieved outstanding inference energy efficiency. However, because the complexity of DL applications and the size of DNN models grow much faster than the performance gains delivered by CIM, researchers are increasingly optimizing model architectures and developing hardware-friendly algorithms for CIM technology.
This dissertation aims to leverage sparsity in convolutional neural network (CNN) models to enhance the energy efficiency of CIM through algorithm-architecture co-optimization. While CNN inference benefits from sparsity, the sparsity granularity should be adapted to the computing scheme of CIM, and a systematic approach is needed to set the sparsity level of each layer. Prior work focused on reducing model size to enhance efficiency, overlooking the energy-consumption distribution of the CIM architecture, and optimized the standard compression techniques, pruning and quantization, separately, resulting in sub-optimal solutions. Meanwhile, analog-to-digital converters (ADCs) account for a significant share of the energy consumed in CIM systems. Although previous studies have explored low-precision ADCs or sparsity-detection mechanisms to avoid wasting ADC resources, these methods rely on adjusting model weights with calibration data, incurring additional pre-deployment costs.
To overcome these challenges, this dissertation proposes an energy-aware model compression technique for CIM. We set each sparsity level according to the energy reduction that compression delivers, preferentially compressing energy-intensive weight groups. Additionally, we introduce trainable parameters for bit-level compression, treating pruning and quantization as options of mixed-precision quantization and conducting a differentiable joint search during compression to balance accuracy and energy consumption. Meanwhile, based on approximate computing, we propose a Range-aware Rounding (RAR) technique for run-time bit-width adjustment that avoids pre-deployment costs. RAR can be integrated into the CIM architecture using Dynamic Block-Floating-Point (BFP) Arithmetic, enhancing inference performance by reducing ADC accesses, and dynamic inference mechanisms can further exploit input-specific redundancy to improve efficiency. Finally, this dissertation presents an architectural design for CNN inference that integrates analog CIM macros and digital modules in a TSMC 28-nm process. We implement the above algorithms to validate that the proposed CIM engine provides competitive energy efficiency.
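To make the compression idea above concrete, the sketch below shows, in PyTorch, how a trainable threshold can produce differentiable group-pruning masks regularized by a per-group energy cost, so that energy-hungry weight groups are compressed first. This is only a minimal illustration under assumed names and shapes (group_energy, tau, the L1 importance score, and the toy sizes are all assumptions made for this example); the dissertation's E-UPQ framework defines the groups, the energy model, and the joint pruning-quantization search in full.

    import torch

    def energy_aware_group_pruning(weight, tau, group_energy, lam=1e-3, temp=10.0):
        # weight: (G, N) tensor of G weight groups mapped onto one CIM subarray.
        # tau: trainable pruning threshold; group_energy: assumed per-group energy cost.
        score = weight.abs().sum(dim=1)                      # group importance (L1 norm)
        soft_mask = torch.sigmoid(temp * (score - tau))      # differentiable keep-probability
        hard_mask = (score > tau).float()                    # binary mask used in the forward pass
        mask = hard_mask + (soft_mask - soft_mask.detach())  # straight-through estimator
        pruned_weight = weight * mask.unsqueeze(1)
        # Energy-aware regularizer: energy expected to be spent on the kept groups,
        # so high-energy groups are pushed below the threshold first.
        energy_loss = lam * (soft_mask * group_energy).sum()
        return pruned_weight, energy_loss

    # Toy usage: 64 groups of 16 weights, random per-group energy costs.
    w = torch.randn(64, 16, requires_grad=True)
    tau = torch.nn.Parameter(torch.tensor(5.0))
    energy = torch.rand(64)
    pruned_w, reg = energy_aware_group_pruning(w, tau, energy)
    loss = pruned_w.pow(2).sum() + reg                       # stand-in for the task loss
    loss.backward()                                          # gradients reach both w and tau

The straight-through estimator keeps the forward mask binary while still letting gradients tune the threshold together with the weights, which is what makes the search differentiable.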
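As a companion illustration of run-time bit-width reduction, the NumPy sketch below implements generic block-floating-point quantization: each block of activations shares one exponent and keeps only a few mantissa bits, so a bit-serial CIM macro would need correspondingly fewer input cycles and ADC conversions. The block size and mantissa width here are assumed values, and the code is a textbook BFP example rather than the dissertation's Range-aware Rounding or dynamic block formation algorithm.

    import numpy as np

    def block_floating_point(x, block=16, mant_bits=4):
        # Quantize x into blocks that share one exponent; each value keeps a
        # signed `mant_bits`-bit mantissa. Returns the dequantized approximation.
        x = x.reshape(-1, block)
        max_abs = np.abs(x).max(axis=1, keepdims=True)
        exp = np.ceil(np.log2(max_abs + 1e-12))              # shared per-block exponent
        scale = 2.0 ** (exp - (mant_bits - 1))               # weight of the mantissa LSB
        qmax = 2 ** (mant_bits - 1) - 1
        mant = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return (mant * scale).reshape(-1)

    x = np.random.randn(256) * 3.0
    x_bfp = block_floating_point(x)
    print("mean |error|:", np.abs(x - x_bfp).mean())         # small compared with |x|

Values whose magnitude falls below the block's least significant bit round to zero, which is the kind of input bit-level sparsity a bit-serial CIM macro can skip to save ADC accesses.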
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-23T16:37:58Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-07-23T16:37:58Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents: 誌謝 (Acknowledgements) I
摘要 (Abstract in Chinese) II
Abstract IV
目次 (Table of Contents) VI
圖次 (List of Figures) IX
表次 (List of Tables) XIII
Chapter 1 Introduction 1
1.1 BACKGROUND 1
1.1.1 THE GROWING GAP BETWEEN COMPUTING SUPPLY AND DEMAND 1
1.1.2 MEMORY WALL BOTTLENECK 3
1.2 COMPUTING-IN-MEMORY (CIM) ARCHITECTURE 5
1.3 ENERGY-AWARE MODEL COMPRESSION 9
1.3.1 DESIGN CHALLENGES 10
1.3.2 RESEARCH CONTRIBUTIONS 12
1.4 INPUT-AWARE APPROXIMATE COMPUTING 14
1.4.1 DESIGN CHALLENGES 14
1.4.2 RESEARCH CONTRIBUTIONS 17
1.5 SPARSITY-SCALING CIM ARCHITECTURE 18
1.5.1 DESIGN CHALLENGES 18
1.5.2 RESEARCH CONTRIBUTIONS 20
1.6 DISSERTATION ORGANIZATION 20
Chapter 2 Review of Related Works 22
2.1 PRINCIPLES OF COMPUTING-IN-MEMORY (CIM) 22
2.1.1 BASIC OPERATION OF CIM 22
2.1.2 CIM-BASED ACCELERATION FOR NEURAL NETWORK (NN) WORKLOAD 23
2.2 MODEL COMPRESSION FOR CIM 26
2.2.1 STRUCTURED PRUNING APPROACH 26
2.2.2 MIXED-PRECISION QUANTIZATION APPROACH 27
2.3 MITIGATING ADC OVERHEAD OF CIM SYSTEM 28
2.3.1 ZERO-SKIPPING APPROACH 28
2.3.2 APPROXIMATE COMPUTING APPROACH 29
2.4 SUMMARY 29
Chapter 3 Energy-Aware Model Compression 31
3.1 PROPOSED TRAINABLE ENERGY-AWARE PRUNING (T-EAP) 31
3.1.1 CHALLENGES OF CIM-AWARE STRUCTURED PRUNING 31
3.1.2 OBSERVATION OF LAYER-WISE ENERGY CONSUMPTION OF CIM 32
3.1.3 PRUNING MASKS BASED ON TRAINABLE THRESHOLDS 34
3.1.4 T-EAP SUMMARY 35
3.2 PROPOSED ENERGY-AWARE UNIFIED PRUNING-QUANTIZATION (E-UPQ) FRAMEWORK 36
3.2.1 CHALLENGES OF ENERGY-AWARE PRUNING AND QUANTIZATION 36
3.2.2 OVERVIEW OF PROPOSED E-UPQ 38
3.2.3 GROUP-WISE UNIFIED PRUNING AND QUANTIZATION 41
3.2.4 REGULARIZATION WITH ENERGY-AWARE LOSS 43
3.3 ARCHITECTURAL SUPPORT FOR MIXED-PRECISION COMPUTATION BASED ON BIT-SLICE CIM MAPPING 44
3.3.1 SELECTIVE ADC POWER-ON/OFF MECHANISM BASED ON INTER-SUBARRAY BIT-SLICE MAPPING 45
3.3.2 OVERALL E-UPQ ARCHITECTURE 46
3.3.3 DESIGN OF BIT-WIDTH TABLE 48
3.3.4 SUMMARY OF E-UPQ 49
3.4 PERFORMANCE EVALUATION 50
3.4.1 SIMULATION SETUP 50
3.4.2 ENERGY-ACCURACY TRADE-OFF 50
3.4.3 ANALYSIS OF LAYER-WISE COMPRESSION POLICY 54
3.4.4 SENSITIVITY ANALYSIS WITH DIFFERENT BLOCK SIZES 57
3.4.5 ANALYSIS OF COMPRESSION RATIO 58
3.5 SUMMARY 58
Chapter 4 Input-aware Approximate Computing 60
4.1 PROPOSED RANGE-AWARE ROUNDING (RAR) 60
4.1.1 CHALLENGES OF APPROXIMATE COMPUTING FOR CIM 60
4.1.2 INFERENCE ACCURACIES WITH DIFFERENT WINDOW POSITIONS 62
4.1.3 THE PROCESSING FLOW OF RANGE-AWARE ROUNDING (RAR) 63
4.1.4 SUMMARY OF RAR 67
4.2 PROPOSED DYNAMIC BLOCK-FLOATING-POINT (BFP) ARITHMETIC FOR CIM ARCHITECTURE (BFP-CIM) 67
4.2.1 CHALLENGES OF INTEGRATING RAR INTO CIM ARCHITECTURE 67
4.2.2 EXPLOITING INPUT BIT-LEVEL SPARSITY 68
4.2.3 DYNAMIC BFP ARITHMETIC BASED ON DYNAMIC BLOCK FORMATION 72
4.2.4 DYNAMIC BFP SUMMARY 75
4.3 PROPOSED MAGNITUDE-AWARE EARLY TERMINATION (MET-CIM) 76
4.3.1 CHALLENGES OF DYNAMIC INFERENCE WITH CIM 76
4.3.2 MOTIVATIONAL EXPERIMENTS 78
4.3.3 THE PROCESSING FLOW OF MET-CIM 80
4.3.4 MET-CIM SUMMARY 82
4.4 PERFORMANCE EVALUATION 83
4.4.1 SIMULATION SETUP 83
4.4.2 PRECISION-SCALABLE QUANTIZATION 84
4.4.3 ENERGY EFFICIENCY AND LATENCY EVALUATION 86
4.4.4 ENERGY-ACCURACY SCALABILITY EVALUATION 92
4.4.5 VISUALIZATION OF THE EFFECTS OF DYNAMIC INFERENCE THRESHOLDS 93
4.5 SUMMARY 94
Chapter 5 Architecture Design and VLSI Implementation of Sparsity-Scaling CIM-based CNN Accelerator 95
5.1 ARCHITECTURE DESIGN OF CIM-BASED CNN ACCELERATOR 95
5.1.1 SYSTEM ARCHITECTURE 95
5.1.2 DATAFLOW 95
5.2 ALGORITHMIC MAPPING OF INPUT APPROXIMATION 97
5.2.1 RANGE-AWARE ROUNDING (RAR) 97
5.2.2 WORDLINE (WL) DECODER 98
5.3 ALGORITHMIC MAPPING OF MIXED-PRECISION WEIGHT MAPPING 102
5.3.1 COMPACT MAPPING WITH COLUMN GROUPING 102
5.3.2 HIERARCHICAL ACCUMULATOR (HA) 102
5.4 IMPLEMENTATION RESULT AND PERFORMANCE COMPARISONS 102
5.4.1 IMPLEMENTATION AND ARCHITECTURAL-LEVEL PERFORMANCE MODELING 102
5.4.2 COMPARISON WITH STATE-OF-THE-ART DESIGNS 106
5.4.3 EFFICIENCY-ACCURACY TRADE-OFF AND LAYER-WISE ANALYSIS 106
5.5 SUMMARY 108
Chapter 6 Conclusions and Future Works 109
6.1 DESIGN ACHIEVEMENTS 109
6.2 FUTURE WORKS 110
Bibliography 112
dc.language.iso: en
dc.title: 適用於記憶體內運算之神經網路演算法與架構共同設計 (zh_TW)
dc.title: Computing-in-Memory-based Neural Network Algorithm and Architecture Co-Design (en)
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 博士 (doctoral)
dc.contributor.oralexamcommittee: 劉宗德;李進福;鄭湘筠;沈中安;魏一勤;張恩瑞 (zh_TW)
dc.contributor.oralexamcommittee: Tsung-Te Liu;Jin-Fu Li;Hung-Sheng Chang;Chung-An Shen;I-Chyn Wey;En-Jui Chang (en)
dc.subject.keyword: 深度神經網路, 記憶體內運算, 模型壓縮, 近似運算, 動態推論 (zh_TW)
dc.subject.keyword: Deep neural network, Computing-in-memory, Model compression, Approximate computing, Dynamic inference (en)
dc.relation.page: 117
dc.identifier.doi: 10.6342/NTU202401980
dc.rights.note: 同意授權(限校園內公開) (authorized for release; campus access only)
dc.date.accepted: 2024-07-22
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電子工程學研究所 (Graduate Institute of Electronics Engineering)
Appears in Collections: 電子工程學研究所 (Graduate Institute of Electronics Engineering)

Files in This Item:
File: ntu-112-2.pdf (7.32 MB, Adobe PDF)
Access: restricted to NTU campus IP addresses (use the VPN service for off-campus access)