Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87289

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 楊佳玲 | zh_TW |
| dc.contributor.advisor | Chia-Lin Yang | en |
| dc.contributor.author | 洪崗竣 | zh_TW |
| dc.contributor.author | Kang-Chun Hung | en |
| dc.date.accessioned | 2023-05-18T16:51:47Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-05-11 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-02-18 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87289 | - |
| dc.description.abstract | 近年來,人工智慧在科技發展中扮演重要的角色,而在這其中最為重要的是神經網路。許多網站以及機器都會倚賴神經網路來增進其功能,為人類帶來更多的方便。然而,因為神經網路的深度,造成劇增的計算量以及過多的能量耗損。
在神經網路推論的階段,GPU計算以及DRAM資料搬移在Nvidia Jetson AGX Xavier上佔據了約莫67%的總能源耗損。因此,這篇論文將以調節GPU以及DRAM的頻率來達到能源效率。此篇論文中,我們使用多層感知器來當作我們「頻率預測器」的模型架構去預測最節能的頻率設定。再者,我們在推論目標神經網路前(offline),對其做層層分析。依據不同種的網路層,我們提供訓練完成的「頻率預測器」,頻率預測器會根據網路層不同的設定預測出最有能源效率的頻率。之後,我們將所有網路層中預測的結果,取最高的頻率當作上限值,取最低的當作下限值,最後作為目標神經網路推論時,GPU以及DRAM可以浮動的區間。 在Nvidia Jetson AGX Xavier上,此機制在總能量耗損中達到平均20.0%的減少,其中,GPU達到平均29.8%,DRAM達到平均23.7%的能量耗損減少。 | zh_TW |
| dc.description.abstract | Artificial intelligence has played an important role in technological development in recent years, and neural networks are at the core of this progress. More and more devices and websites rely on neural networks to improve their functionality, bringing greater convenience to people. However, because recent neural networks have grown ever deeper, model inference involves massive computation and excessive energy consumption.
During the inference stage, GPU computation and DRAM memory access account for approximately 67% of the overall inference energy consumption on the Nvidia Jetson AGX Xavier. Therefore, this work focuses on modulating the GPU and DRAM frequencies. We use multi-layer perceptron (MLP) models as frequency setting predictors that predict the most energy-efficient frequency setting for each kind of layer. First, we analyze the target model layer by layer offline, and the trained predictors determine, for each layer, the frequency setting that maximizes energy efficiency. Then, among all predicted frequency settings, we take the maximum as the upper bound and the minimum as the lower bound, and use this range as the interval within which the GPU and DRAM frequencies may fluctuate during the target model's inference stage (a minimal sketch of this range-selection step follows the metadata record below). With this mechanism, this work achieves an average 20.0% reduction in overall energy consumption, and average reductions of 29.8% and 23.7% in GPU and DRAM energy consumption, respectively, on the Nvidia Jetson AGX Xavier during the target model inference stage. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-18T16:51:47Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-05-18T16:51:47Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements I
摘要 III
Abstract V
Contents VII
List of Figures XI
List of Tables XIII
Chapter 1 Introduction 1
Chapter 2 Background 5
2.1 Convolution Neural Network 5
2.2 Training and Inference 9
2.3 Nvidia Jetson AGX Xavier 10
2.3.1 Volta GPU 12
2.3.2 Dynamic Voltage and Frequency Scaling (DVFS) Policy 12
2.3.3 JetPack SDK 12
2.3.4 Power Monitor 15
Chapter 3 Related Works 17
3.1 Model Inference Considering Power or Energy 17
3.2 Dynamic Voltage and Frequency Scaling Techniques 18
Chapter 4 Motivation 21
4.1 Why Should We Layer-Wisely Analyze the Target Model? 21
4.2 Why Not Fix the Frequency? 22
4.3 Shrink the Frequency Range 24
4.4 Feature Selection 25
4.4.1 Convolution Layer 28
4.4.2 Fully-Connected Layer 33
4.4.3 Pooling Layer 35
4.4.4 Hyperparameters 39
Chapter 5 Methodology 41
5.1 Overview 41
5.2 Workflow 41
5.3 Frequency Setting Predictor 42
5.3.1 Predictor Methodology 42
5.4 Predictor Models Performance Discussion 44
5.5 Dataset 44
5.6 Train 47
5.7 Layer-level Framework 48
Chapter 6 Evaluations 49
6.1 Environment Setup 49
6.2 Experiments 50
6.2.1 Energy Consumption 50
6.2.2 Latency 51
6.3 Distance from the Optimal Solution 54
6.4 Predicted Frequency Settings 58
6.5 Compare to Related Work 61
6.6 Overall Performance 61
Chapter 7 Conclusion 65
References 67 | - |
| dc.language.iso | en | - |
| dc.subject | 頻率調節 | zh_TW |
| dc.subject | 神經網絡模型 | zh_TW |
| dc.subject | Nvidia Jetson AGX Xavier | zh_TW |
| dc.subject | 能量耗損 | zh_TW |
| dc.subject | Energy Consumption | en |
| dc.subject | Nvidia Jetson AGX Xavier | en |
| dc.subject | Neural Network Model | en |
| dc.subject | Frequency Modulation | en |
| dc.title | 分析神經網路推論於Nvidia Jetson AGX Xavier之能源並以自適應頻率調節優化能源效率 | zh_TW |
| dc.title | Analyzing the Inference of Neural Network on Nvidia Jetson AGX Xavier and Optimizing the Energy Efficiency through Self-Adaptive Frequency Scaling | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 陳依蓉;鄭湘筠 | zh_TW |
| dc.contributor.oralexamcommittee | Yi-Jung Chen;Hsiang-Yun Cheng | en |
| dc.subject.keyword | Nvidia Jetson AGX Xavier,能量耗損,神經網絡模型,頻率調節, | zh_TW |
| dc.subject.keyword | Nvidia Jetson AGX Xavier,Energy Consumption,Neural Network Model,Frequency Modulation, | en |
| dc.relation.page | 73 | - |
| dc.identifier.doi | 10.6342/NTU202300620 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2023-02-18 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
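The abstract above describes collapsing the per-layer frequency predictions into a single [minimum, maximum] range and then letting the GPU and DRAM clocks fluctuate only within that range during inference. The sketch below illustrates only that aggregation-and-clamping step; it is not the thesis implementation, and the per-layer predictions, frequency values, and devfreq sysfs paths are placeholder assumptions.

```python
# Minimal sketch of the range-selection step described in the abstract, NOT the
# thesis implementation. The per-layer predictions, frequency values, and the
# devfreq sysfs paths below are illustrative placeholders only.
from pathlib import Path

# Hypothetical predictor output: one (gpu_freq_hz, dram_freq_hz) pair per layer.
predicted_settings = [
    (829_500_000, 1_331_200_000),    # e.g. a convolution layer
    (675_750_000, 1_065_600_000),    # e.g. a pooling layer
    (1_109_250_000, 1_600_000_000),  # e.g. a fully-connected layer
]

def frequency_range(settings):
    """Upper bound = highest predicted setting, lower bound = lowest one."""
    gpu, dram = zip(*settings)
    return (min(gpu), max(gpu)), (min(dram), max(dram))

def apply_devfreq_range(devfreq_dir, lo_hz, hi_hz):
    """Constrain a devfreq device so its governor only moves within [lo, hi]."""
    node = Path(devfreq_dir)
    (node / "max_freq").write_text(str(hi_hz))  # raise the ceiling first
    (node / "min_freq").write_text(str(lo_hz))

if __name__ == "__main__":
    (gpu_lo, gpu_hi), (dram_lo, dram_hi) = frequency_range(predicted_settings)
    # Placeholder device paths; the real GPU/EMC devfreq nodes depend on the
    # board and JetPack release (see NVIDIA's clock-configuration documentation).
    apply_devfreq_range("/sys/devices/gpu.0/devfreq/17000000.gv11b", gpu_lo, gpu_hi)
    apply_devfreq_range("/sys/class/devfreq/emc", dram_lo, dram_hi)
```

Constraining the devfreq governor's min_freq/max_freq, rather than pinning one fixed clock, matches the thesis's idea of a self-adaptive range: the default governor still reacts to load, but only inside the predicted energy-efficient window.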
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-1.pdf (Restricted Access) | 2.58 MB | Adobe PDF |
