Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96186

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 楊佳玲 | zh_TW |
| dc.contributor.advisor | Chia-Lin Yang | en |
| dc.contributor.author | 陳炫均 | zh_TW |
| dc.contributor.author | Xuan-Jun Chen | en |
| dc.date.accessioned | 2024-11-20T16:09:04Z | - |
| dc.date.available | 2024-11-21 | - |
| dc.date.copyright | 2024-11-20 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-11-07 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96186 | - |
| dc.description.abstract | 記憶體內運算架構已證明其有效解決記憶體牆瓶頸的能力,神經網路結構搜索致力於自動化機器學習模型的設計。然而,若欲整合記憶體內運算架構至神經網路結構搜索中,將出現重大的挑戰。在記憶體內運算加速器上部署神經網路,會引入與硬體相關的因素,導致大量額外的模擬負擔。本論文透過結合量化和裝置感知的準確率預測器,介紹了一種超高效的記憶體內運算和神經網路的架構搜索框架。此外,我們邁出了第一步,在定量分析資訊如何於記憶體內運算的神經網路加速器中傳播,以及額外的記憶體內運算因素如何影響該訊息傳播。另一方面,本論文介紹了第一個利用記憶體內運算的最佳化機會,來解決記憶體效率低下問題的點雲深度學習分析加速器,而後我們也基於所提出的記憶體內運算架構,利用神經網路結構搜索的技術來探索最佳的點雲模型。 | zh_TW |
| dc.description.abstract | Computing-in-memory (CIM) architecture has demonstrated its ability to address the memory wall bottleneck effectively. Neural architecture search (NAS) endeavors to design machine learning models automatically. However, integrating CIM into NAS presents a significant challenge: deploying neural networks on CIM accelerators introduces hardware-related factors that incur substantial additional simulation overhead. This dissertation introduces an ultra-efficient CIM-NAS framework that incorporates a quantization- and device-aware accuracy predictor. In addition, we take the first step toward quantitatively analyzing how information propagates in CIM neural accelerators and how additional CIM factors influence that propagation. Furthermore, this dissertation introduces the first accelerator for deep point cloud (PC) analytics, an emerging machine learning application, that leverages CIM optimization opportunities to address memory inefficiency. Building on the proposed CIM architecture, we also explore optimal PC models using NAS techniques. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-11-20T16:09:04Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-11-20T16:09:04Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee . . . i
Acknowledgements . . . iii
摘要 . . . v
Abstract . . . vii
Contents . . . ix
List of Figures . . . xiii
List of Tables . . . xvii
Denotation . . . xix
Chapter 1 Introduction . . . 1
Chapter 2 Background . . . 5
2.1 NAS . . . 5
2.2 CIM DNN Accelerator . . . 6
2.3 LDI . . . 7
2.4 Deep PC Analytics . . . 8
Chapter 3 Joint Search for DNN and CIM Architecture . . . 11
3.1 Challenges for NAS on CIM Architecture . . . 12
3.2 Opportunity of Employing Predictor-based NAS . . . 14
3.3 CIMNet: Co-Search Framework . . . 16
3.3.1 Overview . . . 17
3.3.2 Search Space Reduction . . . 18
3.3.2.1 Cell Resolution Aware Quantization . . . 18
3.3.2.2 Convolution Layer Choice: Standard vs. Depthwise . . . 19
3.3.2.3 Layer-Customized Weight Mapping Policy . . . 21
3.3.3 Method . . . 22
3.3.3.1 Search Space . . . 22
3.3.3.2 Quantization and Device Aware Accuracy Predictor . . . 23
3.3.3.3 Latency/Energy Estimation . . . 25
3.3.3.4 Search Engine . . . 26
3.4 Experiments . . . 27
3.4.1 Experimental Setup . . . 27
3.4.2 Searched Results . . . 28
3.4.3 Analysis . . . 28
3.4.3.1 Accuracy . . . 28
3.4.3.2 Latency . . . 29
3.4.3.3 Energy Consumption . . . 30
3.4.4 Comparison against Other CIM-NAS Works . . . 31
3.4.4.1 Quality . . . 31
3.4.4.2 Efficiency . . . 32
Chapter 4 Unified Agile Accuracy Assessment in CIM Neural Accelerators . . . 33
4.1 Method . . . 37
4.1.1 Ideal Layerwise Jacobian . . . 38
4.1.2 Layerwise Jacobian for Error . . . 39
4.1.3 Factors in Error . . . 42
4.1.3.1 Weight Precision . . . 43
4.1.3.2 Analog-to-Digital Converter . . . 44
4.1.3.3 Cell Variation . . . 44
4.1.4 Overall Layerwise Jacobian . . . 46
4.2 Experiments . . . 49
4.2.1 Experimental Setup . . . 49
4.2.2 Results . . . 49
4.2.3 Discussion . . . 50
Chapter 5 A CIM Architecture for Accelerating Deep PC Analytics . . . 51
5.1 Computation Optimization of PointCIM . . . 55
5.1.1 Challenges for In-Memory PC Network Inference . . . 55
5.1.2 Base+Offset Mapping . . . 57
5.1.3 Early-Stopping Optimization for Bit-Serial Computation . . . 59
5.2 PointCIM Hardware Architecture . . . 62
5.2.1 Signed Arithmetic . . . 63
5.2.2 Margin Calculation . . . 64
5.2.3 Point Feature Read . . . 65
5.2.4 Morton Code-Based Crossbar Interleaving . . . 65
5.2.5 Pipelined Execution . . . 66
5.2.6 Max/Top-K Logic . . . 67
5.3 Evaluation . . . 68
5.3.1 Methodology . . . 68
5.3.1.1 Workloads . . . 68
5.3.1.2 Hardware Configuration . . . 68
5.3.1.3 Accelerator Synthesis and Simulation . . . 69
5.3.1.4 Comparison Baselines . . . 70
5.3.2 Overall Results . . . 71
5.3.2.1 Accuracy . . . 71
5.3.2.2 Speedup . . . 71
5.3.2.3 Energy Consumption . . . 72
5.3.2.4 Area . . . 72
5.3.3 Optimization Effects . . . 73
5.3.3.1 Early Stopping for Bit-Serial Computation . . . 73
5.3.3.2 Morton Code-Based Crossbar Interleaving . . . 73
5.3.4 Sensitivity Studies . . . 74
5.3.4.1 DAC Resolution . . . 74
5.3.4.2 Number of CUs . . . 75
5.3.5 Other Results . . . 76
5.3.5.1 Comparison with E2-MCAM . . . 76
5.3.5.2 End-to-End System Efficiency . . . 76
5.3.5.3 PC Model Exploration . . . 77
Chapter 6 Conclusion . . . 93
References . . . 95 | - |
| dc.language.iso | en | - |
| dc.subject | 點雲深度學習分析 | zh_TW |
| dc.subject | 量化 | zh_TW |
| dc.subject | 神經網路結構搜索 | zh_TW |
| dc.subject | 記憶體內運算加速器 | zh_TW |
| dc.subject | Deep Point Cloud Analytics | en |
| dc.subject | Computing-in-Memory Accelerator | en |
| dc.subject | Neural Architecture Search | en |
| dc.subject | Quantization | en |
| dc.title | 神經網路和記憶體內運算架構共同設計 | zh_TW |
| dc.title | Neural Network and Computing-in-Memory Architecture Co-Design | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | 博士 | - |
| dc.contributor.oralexamcommittee | 郭大維;胡璧合;張原豪;鄭湘筠 | zh_TW |
| dc.contributor.oralexamcommittee | Tei-Wei Kuo;Pi-Ho Hu;Yuan-Hao Chang;Hsiang-Yun Cheng | en |
| dc.subject.keyword | 記憶體內運算加速器, 神經網路結構搜索, 量化, 點雲深度學習分析 | zh_TW |
| dc.subject.keyword | Computing-in-Memory Accelerator, Neural Architecture Search, Quantization, Deep Point Cloud Analytics | en |
| dc.relation.page | 108 | - |
| dc.identifier.doi | 10.6342/NTU202404552 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-11-07 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| Appears in Collections: | 資訊工程學系 | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-1.pdf (restricted, not publicly accessible) | 56.49 MB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.