Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92095
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 方煒 | zh_TW |
dc.contributor.advisor | Wei Fang | en |
dc.contributor.author | 邵長威 | zh_TW |
dc.contributor.author | Chang-Wei Shao | en |
dc.date.accessioned | 2024-03-05T16:15:54Z | - |
dc.date.available | 2024-03-06 | - |
dc.date.copyright | 2024-03-05 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-02-16 | - |
dc.identifier.citation | Albert Chun Chen Liu, Oscar Ming Kin Law. Jul 2020. Deep Learning-Hardware Design, 3-2 to 3-8. Chuan Hwa Book Co., LTD.
Albert Chun Chen Liu, Oscar Ming Kin Law. Jul 2020. Deep Learning-Hardware Design, 3-35. Chuan Hwa Book Co., LTD.
Alex Krizhevsky et al. Jan 2012. ImageNet Classification with Deep Convolutional Neural Networks. NIPS.
Alfredo Canziani, Eugenio Culurciello, Adam Paszke. Apr 2017. An Analysis of Deep Neural Network Models for Practical Applications. arXiv:1605.07678v4.
Anantha P. Chandrakasan, S. Sheng, R. W. Brodersen. Apr 1992. Low-Power CMOS Digital Design. IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484.
Andres Rodriguez et al. Jan 2018. Lower Numerical Precision Deep Learning Inference and Training. p.6 and p.18. https://www.intel.com/content/dam/develop/external/us/en/documents/lower-numerical-precision-deep-learning-jan2018-754765.pdf. Accessed: Feb 2022.
Andrew G. Howard et al. Apr 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861.
Chia-Hsiang Yang. 2021. VLSI Signal Processing. Lecture. Graduate Institute of Electronics Engineering, National Taiwan University.
Christian Szegedy et al. Sep 2014. Going Deeper with Convolutions. arXiv:1409.4842v1.
Chuck Moore. Apr 27, 2011. Data Processing in Exascale-class Computing Systems. The Salishan Conference on High Speed Computing.
David L. Mulnix. 2017. Intel® Xeon® Processor Scalable Family Technical Overview. https://www.intel.com/content/www/us/en/developer/articles/technical/xeon-processor-scalable-family-technical-overview.html. Accessed: Feb 2022.
David Silver et al. Jan 2016. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484-489. https://doi.org/10.1038/nature16961.
Dejan Marković. 2006. A Power/Area Optimal Approach to VLSI Signal Processing. Ph.D. Thesis, University of California, Berkeley.
Dejan Marković et al. Aug 2004. Methods for True Energy-Performance Optimization. IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293.
Gao Huang et al. Jan 2018. Densely Connected Convolutional Networks. arXiv:1608.06993.
Google Cloud TPU. 2022. Introduction to Cloud TPU. https://cloud.google.com/tpu/docs/intro-to-tpu. Accessed: Sep 2022.
Google Cloud TPU. 2022. System Architecture. https://cloud.google.com/tpu/docs/system-architecture-tpu-vm. Accessed: Sep 2022.
Hung-Yi Lee. Machine Learning. Course (2020, Spring). Electrical Engineering, National Taiwan University.
Hung-Yi Lee. Machine Learning. Course, ML2021 Week 3 (3/12), Convolutional Neural Network (CNN). Electrical Engineering, National Taiwan University.
Jiin Lai. Sep 2021. Application Acceleration with High-Level Synthesis. Lecture. hls-for-software-define-hardware.pdf, p.8.
Joel Emer et al. 2017. Hardware Architectures for Deep Neural Networks. ISCA Tutorial, pp. 271-282. https://eems.mit.edu/wp-content/uploads/2017/06/ISCA-2017-Hardware-Architectures-for-DNN-Tutorial.pdf. Accessed: Oct 2022.
Jorge Albericio et al. Jun 2016. Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. ISCA 2016.
Kaiming He et al. Dec 2015. Deep Residual Learning for Image Recognition. arXiv:1512.03385.
Karen Simonyan, Andrew Zisserman. Apr 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.
Keras. 2023. Keras Applications. https://keras.io/api/applications/. Accessed: Jan 2023.
Lauro Rizzatti. Nov 23, 2018. A Breakthrough in FPGA-Based Deep Learning Inference.
Li Du, Yuan Du et al. Aug 2017. A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things. IEEE. DOI: 10.1109/TCSI.2017.2735490.
Liang-Gee Chen. Oct 2021. Course: Computing Architecture and System Design for AI Machine Learning, lectures Oct 14-28.
Lisa Su. Aug 2019. Delivering the Future of High-Performance Computing. Hot Chips: A Symposium on High Performance Chips, HC31-K1, 7:26-9:50.
Mark Horowitz. 2014. Computing's Energy Problem (and What We Can Do About It). IEEE ISSCC 2014.
Mark Sandler et al. Mar 2019. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv:1801.04381.
Microsoft Azure Machine Learning. Jun 2023. How to Deploy FPGA Web Service, figure "Silicon alternatives". https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-fpga-web-service#fpgas-vs-cpu-gpu-and-asic. Accessed: Jan 2022.
Mingxing Tan, Quoc V. Le. Sep 2020. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946.
Mingyu Gao et al. Apr 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. Stanford University. ASPLOS 2017.
Norman P. Jouppi et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. Google, Inc., CA, USA. 44th ISCA. arXiv:1704.04760.
NVIDIA Developer. 2018. Hardware Architectural Specification. http://nvdla.org/hw/v1/hwarch.html#hardware-architectural-specification. Accessed: Oct 2022.
NVIDIA Developer. 2018. Winograd Convolution Mode. http://nvdla.org/hw/v1/hwarch.html#winograd. Accessed: Oct 2022.
NVIDIA Whitepaper. 2016. NVIDIA Tesla P100. p.7, p.10, and p.11. https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf. Accessed: Oct 2022.
NVIDIA Whitepaper. 2018. NVIDIA Turing GPU Architecture. pp. 58-60. https://images.nvidia.com/aem-dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf. Accessed: Oct 2022.
NVIDIA Whitepaper. 2021. NVIDIA Ampere GA102 GPU Architecture: Second-Generation RTX. pp. 12-17. https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf. Accessed: Oct 2022.
Olga Russakovsky et al. Jan 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision.
Peter J. Denning. Jul 2005. The Locality Principle. Communications of the ACM, vol. 48, no. 7, pp. 19-24.
Shibo Wang, Pankaj Kanwar. Aug 2019. BFloat16: The Secret to High Performance on Cloud TPUs. Google TPU. https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus. Accessed: Feb 2021.
TensorFlow. 2023. Module: tf.keras.applications. https://www.tensorflow.org/api_docs/python/tf/keras/applications. Accessed: Jan 2023.
Tien-Ju Yang et al. 2017. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning. IEEE CVPR.
Ting-Ting Hwang. Feb 2011. Computer Architecture Lecture (15A-17B): Single-Cycle Processor - 21B Pipelining. Department of Computer Science, National Tsing Hua University.
Ting-Ting Hwang. Oct 2021. Computer Architecture Lecture (18AR-21BR): Pipelining. Department of Computer Science, National Tsing Hua University.
Ting-Ting Hwang. Oct 2021. Computer Architecture Lecture (22AR-26BR): Memory. Department of Computer Science, National Tsing Hua University.
Ting-Yang Chen. 2023. An 8.1-to-353 TOPS/W Energy-Aware Deep-Learning Accelerator Supporting Dynamic Neural Networks. Master Thesis. Graduate Institute of Electronics Engineering, National Taiwan University.
Vivienne Sze et al. Dec 2017. Efficient Processing of Deep Neural Networks: A Tutorial and Survey. p.2308.
Vivienne Sze. 2019. Efficient Processing of Deep Neural Networks: From Algorithms to Hardware Architectures. https://eyeriss.mit.edu/2019_neurips_tutorial.pdf.
Wikipedia. Oct 2020. MIPS architecture processors. https://en.wikipedia.org/wiki/MIPS_architecture_processors. Accessed: Feb 2021.
Wikipedia. Oct 2020. Systolic array. https://en.wikipedia.org/wiki/Systolic_array. This URL is provided by Google Cloud. Accessed: Feb 2021.
Yu-Hsin Chen et al. 2016. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE ISSCC 2016.
Yu-Hsin Chen. Aug 2018. Design for Highly Flexible and Energy-Efficient Deep Neural Network Accelerators. https://www.youtube.com/watch?v=brhOo-_7NS4. Accessed: Oct 2022.
Yu-Hsin Chen. Jun 2018. Architecture Design for Highly Flexible and Energy-Efficient Deep Neural Network Accelerators. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92095 | - |
dc.description.abstract | 由於深度卷積神經網路 (DCNN) 在人工智慧 (AI) 的應用中對計算力的需求指數成長,對於專用硬體的架構設計來說是一大挑戰。小型系統的功能支援上通常不齊全,而大型晶片變得過於複雜。在不同功能的運算上,同樣仰賴太多模組的切換,導致過多資料搬移,有高延遲與部分模組使用率不高的問題。
這項研究提出了一種基於統計方法的改進結構,包括參數分析、演算法劃分後映射到硬體,以在暫存器傳輸級 (RTL) 建構出一個管線的硬體架構。研究結果顯示,本論文的架構在單核處理元件 (PE) 上比起其他評比 (e.g., MIT Eyeriss, UCLA DCNN Acc., Google TPU) 的設計,有較佳的硬體重用率、硬體共用性且支援最多功能,也減少了激活函數時的延遲,並確保了所有功能的運算都在核心內完成,且擁有相同的週波時間。 | zh_TW
dc.description.abstract | The demand for computing power from deep convolutional neural networks (DCNNs) in artificial intelligence (AI) applications is growing exponentially, which poses a major challenge for the design of application-specific hardware architectures. Small systems typically lack complete functional support, while large-scale chips become overly complex. Operations of different types likewise rely on switching among too many modules, causing excessive data movement, high latency, and low utilization of some modules.
This research proposes a statistics-based improvement flow comprising parameter analysis, algorithm partitioning, and hardware mapping to construct a pipelined hardware architecture at the Register Transfer Level (RTL). The results show that, on a single processing element (PE), the proposed architecture achieves better hardware reuse, hardware sharing, and functional coverage than the benchmark designs (e.g., MIT Eyeriss, the UCLA DCNN accelerator, and the Google TPU). It also reduces activation-function latency and ensures that all functions are computed in-core with the same cycle time. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-05T16:15:54Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-03-05T16:15:54Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | 口試委員會審定書 i
Acknowledgements ii
中文摘要 iv
Abstract v
List of Figures viii
List of Tables x
Chapter 1 Introduction 1
1.1 Background 1
1.2 Research Scope 3
1.3 Research Objectives 4
1.4 Contributions of This Research 6
Chapter 2 Related Work 9
2.1 Exploration of AI and CNN Domains 9
2.2 Comparison of CPU, GPU, ASIC, FPGA in CNN 10
2.3 Optimizing ASIC Architectures in AI Computation 12
2.4 Benchmarks for Deep Learning Accelerator 16
Chapter 3 Research Method and Design 17
3.1 Research Gap 17
3.2 Deep Learning Accelerator Architecture Design Flow 17
3.3 Popular DCNN Models Analysis 19
3.4 Layers Parameters Statistics 25
3.5 Function and Specification Definition 26
3.6 Algorithm Design and Segmentation 30
3.7 Divide-and-conquer 35
3.8 System Flow Design 38
3.9 Instruction Set Design 41
Chapter 4 Implementation 45
4.1 Architecture Mapping to RTL 45
Chapter 5 Results and Discussion 49
5.1 Operations Cycle Time Analysis 49
5.2 Performance Analysis and Simulation 55
5.3 Data Reuse Rate 55
5.4 Design Comparison 57
5.5 Data Path Comparison 60
5.6 Benchmark 63
Chapter 6 Conclusions and Future Work 67
6.1 Pros and Cons 67
6.2 Research Contributions 69
6.3 Research Limitations 69
6.4 Future Work 70
Reference 73 | - |
dc.language.iso | en | - |
dc.title | 卷積神經網路之深度學習加速器架構設計 | zh_TW |
dc.title | Architecture Design of Deep Learning Accelerator for CNN | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-1 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 陳倩瑜;曾國師 | zh_TW |
dc.contributor.oralexamcommittee | Chien-yu Chen;Kuo-Shih Tseng | en |
dc.subject.keyword | 計算機架構, 深度學習加速器, 深度卷積神經網路, 人工智慧, 演算法 | zh_TW
dc.subject.keyword | Computer Architecture, Deep Learning Accelerator (DLA), Deep Convolutional Neural Networks (DCNN), Artificial Intelligence (AI), Algorithms | en
dc.relation.page | 79 | - |
dc.identifier.doi | 10.6342/NTU202400510 | - |
dc.rights.note | 同意授權(全球公開) | - |
dc.date.accepted | 2024-02-17 | - |
dc.contributor.author-college | 生物資源暨農學院 | - |
dc.contributor.author-dept | 生物機電工程學系 | - |
Appears in Collections: | 生物機電工程學系
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf | 3.76 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.