Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉邦鋒 | zh_TW |
dc.contributor.advisor | Pangfeng Liu | en |
dc.contributor.author | 林建宏 | zh_TW |
dc.contributor.author | Chien-Hung Lin | en |
dc.date.accessioned | 2023-07-24T16:09:52Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-07-24 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-06-19 | - |
dc.identifier.citation | T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation, pages 579–594, 2018.
T.-W. Chen, P. Liu, and J.-J. Wu. Exploiting data entropy for neural network compression. In 2020 IEEE International Conference on Big Data (Big Data), pages 5007–5016. IEEE, 2020.
S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014.
Y. Gong, L. Liu, M. Yang, and L. Bourdev. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
C. Guo, B. Y. Hsueh, J. Leng, Y. Qiu, Y. Guan, Z. Wang, X. Jia, X. Li, M. Guo, and Y. Zhu. Accelerating sparse DNN models without hardware support via tile-wise sparsity. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–15. IEEE, 2020.
S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems, 28, 2015.
S. H. Hasanpour, M. Rouhani, M. Fayyaz, and M. Sabokrou. Lets keep it simple, using simple architectures to outperform deeper and more complex architectures. arXiv preprint arXiv:1608.06037, 2016.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
Y. He, X. Zhang, and J. Sun. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1389–1397, 2017.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
H. Kung, B. McDanel, and S. Q. Zhang. Packing sparse convolutional neural networks for efficient systolic array implementations: Column combining under joint optimization. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 821–834, 2019.
H. T. Kung and C. E. Leiserson. Systolic arrays (for VLSI). In Sparse Matrix Proceedings 1978, volume 1, pages 256–282. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1979.
Y. Le and X. Yang. Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
N. Lee, T. Ajanthan, and P. H. Torr. SNIP: Single-shot network pruning based on connection sensitivity. arXiv preprint arXiv:1810.02340, 2018.
H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710, 2016.
Z. Liu, M. Sun, T. Zhou, G. Huang, and T. Darrell. Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270, 2018.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
F. Meng, H. Cheng, K. Li, H. Luo, X. Guo, G. Lu, and X. Sun. Pruning filter in filter. Advances in Neural Information Processing Systems, 33:17629–17640, 2020.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
R. Rao and S. Iyengar. Bin-packing by simulated annealing. Computers & Mathematics with Applications, 27(5):71–82, 1994.
J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 2015.
O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115:211–252, 2015.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
P. J. Van Laarhoven and E. H. Aarts. Simulated annealing. Springer, 1987.
C. Wang, G. Zhang, and R. Grosse. Picking winning tickets before training by preserving gradient flow. arXiv preprint arXiv:2002.07376, 2020.
Z. Wang, X. Geng, and Z. Shao. An effective simulated annealing algorithm for solving the traveling salesman problem. Journal of Computational and Theoretical Nanoscience, 6(7):1680–1686, 2009.
J. Yu, Z. Wang, V. Vasudevan, L. Yeung, M. Seyedhosseini, and Y. Wu. CoCa: Contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917, 2022. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863 | - |
dc.description.abstract | 卷積神經網絡在各種電腦視覺問題中取得了相當好的成果。然而,最先進的卷積神經網絡模型的規模越來越巨大,這導致推論時間非常長和內存使用量很高。非結構化剪枝等模型壓縮技術可以在不影響準確性的情況下剪枝掉很大一部分參數,但有效利用其稀疏性仍然是一個挑戰。行組合通過將捲積濾波器矩陣中的多個稀疏行組合成單個密集行來壓縮非結構化剪枝後的卷積神經網絡模型。此外,在組合後行的每一列中剪枝除了最大幅度權重之外的所有權重,可以有效地進一步壓縮矩陣。然而,之前的論文沒有解決分割稀疏行的細節,以盡量減少額外剪枝對模型性能的負面影響。在這項工作中,我們首先證明行分區問題是一個 NP-Complete 問題。接下來,我們提出了一種基於模擬退火和全局非結構化剪枝的行組合方案,以最大限度地減少額外剪枝對模型性能的不利影響。我們在沒有特殊硬件支持的情況下使用 TVM AI 編譯器實現了行組合卷積神經網絡模型的加速。我們所提出的方案實現了更有效的模型壓縮,在 TinyImageNet 數據集稀疏度為 88% 的 VGG19 模型上精度提高 0.65%,推理時間加快 1.24 倍。 | zh_TW |
dc.description.abstract | Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models tend to be enormous, which leads to long inference times and high memory usage. Model compression techniques such as unstructured pruning can remove a large proportion of parameters without affecting accuracy, but exploiting the resulting sparsity efficiently remains a challenge. Column combining compresses unstructured-pruned CNN models by merging multiple sparse columns of a convolutional filter matrix into a single dense column. In addition, pruning all but the largest-magnitude weight in each row of a combined column compresses the matrix further. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. Next, we propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We accelerate the resulting column-combined CNN models with the TVM compiler, without special hardware support. The proposed scheme achieves more efficient model compression, yielding a 0.65% accuracy improvement and 1.24x faster inference on VGG19 at 88% sparsity on the TinyImageNet dataset. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-07-24T16:09:52Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-07-24T16:09:52Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgments i
Abstract (Chinese) ii
Abstract iii
Contents v
List of Figures vii
List of Tables ix
Denotation x
Chapter 1 Introduction 1
Chapter 2 Background 5
2.1 Unstructured Pruning 5
2.2 Structured Pruning 6
2.3 Column Combining 6
2.4 GEMM-based Convolution 9
Chapter 3 Proof of Column Partition Problem 10
Chapter 4 Scheme 12
4.1 Column Combining based on Simulated Annealing 12
4.2 Simulated Annealing-based Column Partition 14
4.3 Matrix Multiplication for Combined Matrix 17
Chapter 5 Evaluation 20
5.1 Experiment Settings 20
5.1.1 Models 20
5.1.2 Benchmarks 20
5.1.3 Implementation 21
5.2 Performance of Different α Percentage 21
5.3 Performance of Column Combining Methods 23
5.4 Acceleration of Column Combining Methods 24
5.5 Performance of Row Rearrangement 28
Chapter 6 Conclusion 30
References 31
Appendix A: The experimental parameters 36
A.1 The experimental parameters for dense-column-first combining 36 | - |
dc.language.iso | en | - |
dc.title | 使用基於模擬退火的列組合加速 CNN 模型在 CPU 上的推論時間 | zh_TW |
dc.title | Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 吳真貞;洪鼎詠 | zh_TW |
dc.contributor.oralexamcommittee | Jan-Jan Wu;Ding-Yong Hong | en |
dc.subject.keyword | 機器學習,深度學習,模型壓縮,模型剪枝,行組合,模擬退火,TVM | zh_TW |
dc.subject.keyword | Machine learning, Deep learning, Model compression, Model pruning, Column combining, Simulated annealing, TVM | en |
dc.relation.page | 36 | - |
dc.identifier.doi | 10.6342/NTU202301025 | - |
dc.rights.note | Authorization granted (campus access only) | - |
dc.date.accepted | 2023-06-21 | - |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in collections: | Department of Computer Science and Information Engineering
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 363.99 kB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
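
The abstract above describes three concrete steps: score a column partition by the weight magnitude that the extra per-row pruning would discard, search for a good partition with simulated annealing, and pack each group of sparse columns into one dense column with an index map so that the multiply can run on the dense form. The NumPy sketch below illustrates those steps under stated assumptions; it is not the thesis implementation, and every function name, the move set, and the cooling schedule are invented for this sketch (the thesis additionally applies global unstructured pruning and generates optimized kernels with TVM).

```python
import numpy as np

def pruned_magnitude(W, groups):
    """Partition cost: total |weight| the extra pruning would discard,
    since only the largest-magnitude entry per row survives in a group."""
    cost = 0.0
    for cols in groups:
        if len(cols) == 0:
            continue
        block = np.abs(W[:, cols])
        cost += block.sum() - block.max(axis=1).sum()
    return cost

def anneal_partition(W, n_groups, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Toy simulated annealing: a move reassigns one column to a random
    group; worse partitions are accepted with probability exp(-delta/T)."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(n_groups, size=W.shape[1])
    to_groups = lambda a: [np.flatnonzero(a == g) for g in range(n_groups)]
    cost, temp = pruned_magnitude(W, to_groups(assign)), t0
    for _ in range(steps):
        cand = assign.copy()
        cand[rng.integers(W.shape[1])] = rng.integers(n_groups)
        c = pruned_magnitude(W, to_groups(cand))
        if c <= cost or rng.random() < np.exp((cost - c) / temp):
            assign, cost = cand, c
        temp *= cooling
    return to_groups(assign)

def combine_columns(W, groups):
    """Pack each group into one dense column, keeping the max-|w| weight
    per row and remembering which original column it came from."""
    rows = np.arange(W.shape[0])
    values = np.zeros((W.shape[0], len(groups)))
    col_idx = np.zeros((W.shape[0], len(groups)), dtype=np.intp)
    for g, cols in enumerate(groups):
        if len(cols) == 0:
            continue
        block = W[:, cols]
        keep = np.abs(block).argmax(axis=1)   # per-row survivor within the group
        values[:, g] = block[rows, keep]
        col_idx[:, g] = cols[keep]
    return values, col_idx

def combined_matvec(values, col_idx, x):
    """y[r] = sum_g values[r, g] * x[col_idx[r, g]]: a dense multiply plus a
    gather replaces the sparse W @ x (exact when no weight was discarded)."""
    return (values * x[col_idx]).sum(axis=1)

# Usage: an 88%-sparse 64x64 filter matrix packed into 16 dense columns.
rng = np.random.default_rng(0)
W = np.where(rng.random((64, 64)) < 0.12, rng.standard_normal((64, 64)), 0.0)
groups = anneal_partition(W, n_groups=16)
values, col_idx = combine_columns(W, groups)
y = combined_matvec(values, col_idx, rng.standard_normal(64))  # approximates W @ x
```

The point of the transformation, as the abstract states, is that the packed form is dense: inference becomes ordinary dense arithmetic plus an indexed load, which a compiler such as TVM can optimize without special hardware support for sparsity.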