Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89006
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉邦鋒 | zh_TW |
dc.contributor.advisor | Pangfeng Liu | en |
dc.contributor.author | 吳政鴻 | zh_TW |
dc.contributor.author | Cheng-Hung Wu | en |
dc.date.accessioned | 2023-08-16T16:44:18Z | - |
dc.date.available | 2024-05-03 | - |
dc.date.copyright | 2023-08-16 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-08 | - |
dc.identifier.citation | L. Bossard, M. Guillaumin, and L. Van Gool. Food-101 – mining discriminative components with random forests. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 446–461, Cham, 2014. Springer International Publishing.
T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, OSDI '18, pages 579–594, USA, 2018. USENIX Association.
T.-W. Chen, P. Liu, and J.-J. Wu. Exploiting data entropy for neural network compression. In 2020 IEEE International Conference on Big Data (Big Data), pages 5007–5016, Los Alamitos, CA, USA, 2020. IEEE Computer Society.
Y. Gong, L. Liu, M. Yang, and L. D. Bourdev. Compressing deep convolutional networks using vector quantization. CoRR, abs/1412.6115, 2014.
C. Guo, B. Y. Hsueh, J. Leng, Y. Qiu, Y. Guan, Z. Wang, X. Jia, X. Li, M. Guo, and Y. Zhu. Accelerating sparse DNN models without hardware support via tile-wise sparsity. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC '20, pages 1–15, 2020. IEEE Press.
S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR), 2016.
S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems, volume 28, pages 1135–1143, 2015.
S. H. HasanPour, M. Rouhani, M. Fayyaz, and M. Sabokrou. Let's keep it simple, using simple architectures to outperform deeper and more complex architectures. CoRR, abs/1608.06037, 2016.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. IEEE, 2016.
Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1398–1406. IEEE, 2017.
G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. NIPS 2014 Deep Learning Workshop, 2015.
H. Hu, R. Peng, Y.-W. Tai, and C.-K. Tang. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, July 2016.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. In International Conference on Learning Representations, 2017.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431–3440, Los Alamitos, CA, USA, June 2015. IEEE Computer Society.
J.-H. Luo and J. Wu. An entropy-based pruning method for CNN compression. arXiv preprint arXiv:1706.05791, June 2017.
F. Meng, H. Cheng, K. Li, H. Luo, X. Guo, G. Lu, and X. Sun. Pruning filter in filter. Advances in Neural Information Processing Systems, 33:17629–17640, 2020.
NVIDIA. NVIDIA A100 Tensor Core GPU architecture. https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/nvidia-ampere-architecture-whitepaper.pdf, 2020.
NVIDIA. NVIDIA Ampere GA102 GPU architecture. https://images.nvidia.com/aem-dam/en-zz/Solutions/geforce/ampere/pdf/NVIDIA-ampere-GA102-GPU-Architecture-Whitepaper-V1.pdf, 2020.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Z. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32 (NeurIPS 2019), December 8–14, 2019, Vancouver, BC, Canada, pages 8024–8035, 2019.
J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, July 21–26, 2017, pages 6517–6525. IEEE Computer Society, 2017.
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations (ICLR 2015), 2015.
W. Sun, A. Zhou, S. Stuijk, R. Wijnhoven, A. O. Nelson, H. Corporaal, et al. DominoSearch: Find layer-wise fine-grained N:M sparse schemes from dense neural networks. Advances in Neural Information Processing Systems, 34:20721–20732, 2021.
J. Yu, Z. Wang, V. Vasudevan, L. Yeung, M. Seyedhosseini, and Y. Wu. CoCa: Contrastive captioners are image-text foundation models. Transactions on Machine Learning Research, 2022. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89006 | - |
dc.description.abstract | 卷積神經網絡是一種革命性的深度學習技術,改變了計算機視覺領域。在現代的CNN模型中,卷積通常占據了大部分的計算時間。模型壓縮是一種在深度學習中用於減小神經網絡尺寸並同時保持其準確性的方法。權重修剪是一種從網絡中移除冗餘或不重要權重的方法。這些方法有助於減小神經網絡的尺寸和計算成本,同時保持其準確性。在本篇論文中,我們提出了一種動態規劃演算法,根據卷積層的執行時間和L1範數,在總時間預算內為每一層卷積層找到一個適當的稀疏比例。在確定每一層的稀疏比例後,我們修改了TVM並用它來生成使用掩碼指示要加載進行處理的數據的代碼。此外,我們提出了CHWN佈局,將數據批次的維度移到最內層維度,以消除最內層維度的變化大小,並使內存訪問模式連續。實驗結果顯示我們的方法相比於密集模型在ImageNet數據集上對VGG-16模型提升了0.35%的準確性和1.55倍的加速。 | zh_TW |
dc.description.abstract | A convolutional neural network (CNN) is a deep learning technique that has revolutionized the field of computer vision. In modern CNN models, convolution typically accounts for the majority of the computation time. Model compression is a method used in deep learning to reduce the size of a neural network while preserving its accuracy. Weight pruning removes redundant or unimportant weights from the network.
These methods can reduce the size and computational cost of neural networks while preserving their accuracy. In this work, we propose a dynamic programming algorithm that finds a good sparsity ratio for every layer individually, under a total time budget, based on the execution times and L1 norms of the layers. After deciding the sparsity ratio for every layer, we modify TVM to generate code that uses a mask to indicate which data to load for processing. Furthermore, we propose the CHWN layout, which moves the batch dimension (N) of the data to the innermost position, eliminating the varying size of the innermost dimension and making the memory access pattern contiguous. Experimental results show that our scheme achieves a 0.35% accuracy improvement and a 1.55x speedup over the dense model on VGG-16 with the ImageNet dataset. | en |
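The abstract's layer-wise sparsity selection can be pictured as a small knapsack-style dynamic program: each layer offers a few (execution time, L1-norm loss) candidates, and we pick one per layer so that total time stays within the budget while the accumulated L1 loss of pruned weights is minimized. The sketch below is an illustrative reconstruction under assumed interfaces, not the thesis's actual implementation; the function name `choose_sparsity`, the candidate option lists, and the integer time quantization are all assumptions.

```python
def choose_sparsity(layers, budget):
    """Pick one sparsity option per layer under a total time budget.

    layers: list, one entry per conv layer; each entry is a list of
            (time_cost, l1_loss) tuples for the candidate sparsity patterns
            (times assumed pre-quantized to small integers).
    budget: integer total time budget.
    Returns (min_total_l1_loss, list of chosen option indices), or None
    if no combination fits the budget.
    """
    # dp maps accumulated time -> (best accumulated L1 loss, chosen indices)
    dp = {0: (0.0, [])}
    for options in layers:
        nxt = {}
        for t, (loss, picks) in dp.items():
            for i, (cost, l1) in enumerate(options):
                nt = t + cost
                if nt > budget:          # prune states over the time budget
                    continue
                cand = (loss + l1, picks + [i])
                if nt not in nxt or cand[0] < nxt[nt][0]:
                    nxt[nt] = cand       # keep the lowest-loss state per time
        dp = nxt
    if not dp:
        return None
    return min(dp.values(), key=lambda v: v[0])


# Hypothetical example: two layers, each with dense / 50% / 75% sparse options.
layers = [
    [(4, 0.0), (2, 1.5), (1, 4.0)],
    [(6, 0.0), (3, 0.8), (2, 2.5)],
]
loss, picks = choose_sparsity(layers, budget=5)
```

With these made-up numbers the cheapest feasible plan takes the 50%-sparse option in both layers (total time 5, total L1 loss 2.3). The state space is bounded by the budget, so the run time is O(layers × budget × options), matching the polynomial cost one would expect from such a DP.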
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T16:44:18Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-16T16:44:18Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgement i
摘要 ii
Abstract iii
Contents v
List of Figures vii
List of Tables viii
Denotation ix
Chapter 1 Introduction 1
Chapter 2 Related Works 5
2.1 Unstructured Pruning 5
2.2 Structured Pruning 6
2.3 Sparse Tensor Core 7
2.4 DominoSearch 8
Chapter 3 Pruning Algorithm 10
3.1 Problem Definition 10
3.2 Dynamic Programming 11
3.2.1 Sparsity Pattern Choosing Algorithm 11
3.2.2 Time Complexity 11
Chapter 4 Speedup of Pruned Models 13
4.1 Mask Loading 13
4.2 CHWN Layout 16
Chapter 5 Evaluation 18
5.1 Experiment Settings 18
5.1.1 Models 18
5.1.2 Benchmarks 19
5.1.3 Implementation 19
5.2 Performance of Mask-loading Instructions 19
5.3 Performance of Different Layouts 21
5.4 Performance of Different Batch Sizes 23
5.5 Performance of Different Sparsity Patterns 24
5.6 Performance of Pruned Models 25
5.6.1 VGG-16 25
5.6.2 ResNet-18 27
5.6.3 ResNet-50 29
Chapter 6 Conclusion 31
References 32 | - |
dc.language.iso | en | - |
dc.title | 利用細粒度結構化修剪對 CNN 模型進行有效推理 | zh_TW |
dc.title | Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master | - |
dc.contributor.oralexamcommittee | 洪鼎詠;吳真貞 | zh_TW |
dc.contributor.oralexamcommittee | Ding-Yong Hong;Jan-Jan Wu | en |
dc.subject.keyword | 機器學習,深度學習,卷積神經網路,模型壓縮,模型剪枝,動態規劃,細粒度,結構化修剪,TVM, | zh_TW |
dc.subject.keyword | Machine learning,Deep learning,Model compression,Model pruning,Dynamic programming,Fine-grained,Structured Pruning,TVM, | en |
dc.relation.page | 36 | - |
dc.identifier.doi | 10.6342/NTU202303337 | - |
dc.rights.note | Authorized for release (campus access only) | - |
dc.date.accepted | 2023-08-09 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | - |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 353.2 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.