Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉邦鋒 | zh_TW |
dc.contributor.advisor | Pangfeng Liu | en |
dc.contributor.author | 林建宏 | zh_TW |
dc.contributor.author | Chien-Hung Lin | en |
dc.date.accessioned | 2023-07-24T16:09:52Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-07-24 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-06-19 | - |
dc.identifier.citation | T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, pages 579–594, 2018.
T.-W. Chen, P. Liu, and J.-J. Wu. Exploiting data entropy for neural network compression. In 2020 IEEE International Conference on Big Data (Big Data), pages 5007–5016. IEEE, 2020.
S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.
Y. Gong, L. Liu, M. Yang, and L. Bourdev. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
C. Guo, B. Y. Hsueh, J. Leng, Y. Qiu, Y. Guan, Z. Wang, X. Jia, X. Li, M. Guo, and Y. Zhu. Accelerating sparse DNN models without hardware support via tile-wise sparsity. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–15. IEEE, 2020.
S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems, 28, 2015.
S. H. Hasanpour, M. Rouhani, M. Fayyaz, and M. Sabokrou. Lets keep it simple, using simple architectures to outperform deeper and more complex architectures. arXiv preprint arXiv:1608.06037, 2016.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1389–1397, 2017.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
H. Kung, B. McDanel, and S. Q. Zhang. Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 821–834, 2019.
H. T. Kung and C. E. Leiserson. Systolic arrays (for VLSI). In Sparse Matrix Proceedings 1978, volume 1, pages 256–282. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1979.
Y. Le and X. Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
N. Lee, T. Ajanthan, and P. H. Torr. SNIP: Single-shot network pruning based on connection sensitivity. arXiv preprint arXiv:1810.02340, 2018.
H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710, 2016.
Z. Liu, M. Sun, T. Zhou, G. Huang, and T. Darrell. Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270, 2018.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
F. Meng, H. Cheng, K. Li, H. Luo, X. Guo, G. Lu, and X. Sun. Pruning filter in filter. Advances in Neural Information Processing Systems, 33:17629–17640, 2020.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
R. Rao and S. Iyengar. Bin-packing by simulated annealing. Computers & Mathematics with Applications, 27(5):71–82, 1994.
J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
P. J. Van Laarhoven and E. H. Aarts. Simulated annealing. Springer, 1987.
C. Wang, G. Zhang, and R. Grosse. Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376, 2020.
Z. Wang, X. Geng, and Z. Shao. An effective simulated annealing algorithm for solving the traveling salesman problem. Journal of Computational and Theoretical Nanoscience, 6(7):1680–1686, 2009.
J. Yu, Z. Wang, V. Vasudevan, L. Yeung, M. Seyedhosseini, and Y. Wu. CoCa: Contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917, 2022. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863 | - |
dc.description.abstract | 卷積神經網絡在各種電腦視覺問題中取得了相當好的成果。然而,最先進的卷積神經網絡模型的規模越來越巨大,這導致推論時間非常長和內存使用量很高。非結構化剪枝等模型壓縮技術可以在不影響準確性的情況下剪枝掉很大一部分參數,但有效利用其稀疏性仍然是一個挑戰。行組合通過將捲積濾波器矩陣中的多個稀疏行組合成單個密集行來壓縮非結構化剪枝後的卷積神經網絡模型。此外,在組合後行的每一列中剪枝除了最大幅度權重之外的所有權重,可以有效地進一步壓縮矩陣。然而,之前的論文沒有解決分割稀疏行的細節,以盡量減少額外剪枝對模型性能的負面影響。在這項工作中,我們首先證明行分區問題是一個 NP-Complete 問題。接下來,我們提出了一種基於模擬退火和全局非結構化剪枝的行組合方案,以最大限度地減少額外剪枝對模型性能的不利影響。我們在沒有特殊硬件支持的情況下使用 TVM AI 編譯器實現了行組合卷積神經網絡模型的加速。我們所提出的方案實現了更有效的模型壓縮,在 TinyImageNet 數據集稀疏度為 88% 的 VGG19 模型上精度提高 0.65%,推理時間加快 1.24 倍。 | zh_TW |
dc.description.abstract | Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models tend to be enormous, which leads to long inference times and high memory usage. Model compression techniques such as unstructured pruning can remove a large proportion of parameters without affecting accuracy, but exploiting the resulting sparsity efficiently remains a challenge. Column combining compresses unstructured-pruned CNN models by merging multiple sparse columns of a convolutional filter matrix into a single dense column. In addition, pruning all but the largest-magnitude weight in each row of a combined column compresses the matrix further. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. Next, we propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We accelerate the resulting column-combined CNN models with the TVM compiler, without special hardware support. The proposed scheme achieves more efficient model compression, yielding a 0.65% accuracy improvement and 1.24x faster inference on VGG19 at 88% sparsity on the TinyImageNet dataset. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-07-24T16:09:52Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-07-24T16:09:52Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgments i
Abstract (Chinese) ii
Abstract iii
Contents v
List of Figures vii
List of Tables ix
Denotation x
Chapter 1 Introduction 1
Chapter 2 Background 5
2.1 Unstructured Pruning 5
2.2 Structured Pruning 6
2.3 Column Combining 6
2.4 GEMM-based Convolution 9
Chapter 3 Proof of Column Partition Problem 10
Chapter 4 Scheme 12
4.1 Column Combining based on Simulated Annealing 12
4.2 Simulated Annealing-based Column Partition 14
4.3 Matrix Multiplication for Combined Matrix 17
Chapter 5 Evaluation 20
5.1 Experiment Settings 20
5.1.1 Models 20
5.1.2 Benchmarks 20
5.1.3 Implementation 21
5.2 Performance of Different α Percentage 21
5.3 Performance of Column Combining Methods 23
5.4 Acceleration of Column Combining Methods 24
5.5 Performance of Row Rearrangement 28
Chapter 6 Conclusion 30
References 31
Appendix A: The experimental parameters 36
A.1 The experimental parameters for dense-column-first combining 36 | - |
dc.language.iso | en | - |
dc.title | 使用基於模擬退火的列組合加速 CNN 模型在 CPU 上的推論時間 | zh_TW |
dc.title | Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 吳真貞;洪鼎詠 | zh_TW |
dc.contributor.oralexamcommittee | Jan-Jan Wu;Ding-Yong Hong | en |
dc.subject.keyword | 機器學習,深度學習,模型壓縮,模型剪枝,行組合,模擬退火,TVM | zh_TW |
dc.subject.keyword | Machine learning, Deep learning, Model compression, Model pruning, Column combining, Simulated annealing, TVM | en |
dc.relation.page | 36 | - |
dc.identifier.doi | 10.6342/NTU202301025 | - |
dc.rights.note | Authorization granted (campus access only) | - |
dc.date.accepted | 2023-06-21 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in collections: | Department of Computer Science and Information Engineering
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 363.99 kB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
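
The abstract above describes three concrete steps: score a column partition by the weight magnitude that the extra per-row pruning would discard, search for a good partition with simulated annealing, and pack each group of sparse columns into one dense column with an index map so that the multiply can run on the dense form. The NumPy sketch below illustrates those steps under stated assumptions; it is not the thesis implementation, and every function name, the move set, and the cooling schedule are invented for this sketch (the thesis additionally applies global unstructured pruning and generates optimized kernels with TVM).

```python
import numpy as np

def pruned_magnitude(W, groups):
    """Partition cost: total |weight| the extra pruning would discard,
    since only the largest-magnitude entry per row survives in a group."""
    cost = 0.0
    for cols in groups:
        if len(cols) == 0:
            continue
        block = np.abs(W[:, cols])
        cost += block.sum() - block.max(axis=1).sum()
    return cost

def anneal_partition(W, n_groups, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Toy simulated annealing: a move reassigns one column to a random
    group; worse partitions are accepted with probability exp(-delta/T)."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(n_groups, size=W.shape[1])
    to_groups = lambda a: [np.flatnonzero(a == g) for g in range(n_groups)]
    cost, temp = pruned_magnitude(W, to_groups(assign)), t0
    for _ in range(steps):
        cand = assign.copy()
        cand[rng.integers(W.shape[1])] = rng.integers(n_groups)
        c = pruned_magnitude(W, to_groups(cand))
        if c <= cost or rng.random() < np.exp((cost - c) / temp):
            assign, cost = cand, c
        temp *= cooling
    return to_groups(assign)

def combine_columns(W, groups):
    """Pack each group into one dense column, keeping the max-|w| weight
    per row and remembering which original column it came from."""
    rows = np.arange(W.shape[0])
    values = np.zeros((W.shape[0], len(groups)))
    col_idx = np.zeros((W.shape[0], len(groups)), dtype=np.intp)
    for g, cols in enumerate(groups):
        if len(cols) == 0:
            continue
        block = W[:, cols]
        keep = np.abs(block).argmax(axis=1)   # per-row survivor within the group
        values[:, g] = block[rows, keep]
        col_idx[:, g] = cols[keep]
    return values, col_idx

def combined_matvec(values, col_idx, x):
    """y[r] = sum_g values[r, g] * x[col_idx[r, g]]: a dense multiply plus a
    gather replaces the sparse W @ x (exact when no weight was discarded)."""
    return (values * x[col_idx]).sum(axis=1)

# Usage: an 88%-sparse 64x64 filter matrix packed into 16 dense columns.
rng = np.random.default_rng(0)
W = np.where(rng.random((64, 64)) < 0.12, rng.standard_normal((64, 64)), 0.0)
groups = anneal_partition(W, n_groups=16)
values, col_idx = combine_columns(W, groups)
y = combined_matvec(values, col_idx, rng.standard_normal(64))  # approximates W @ x
```

The point of the transformation, as the abstract states, is that the packed form is dense: inference becomes ordinary dense arithmetic plus an indexed load, which a compiler such as TVM can optimize without special hardware support for sparsity.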