NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 劉邦鋒 | zh_TW
dc.contributor.advisor | Pangfeng Liu | en
dc.contributor.author | 林建宏 | zh_TW
dc.contributor.author | Chien-Hung Lin | en
dc.date.accessioned | 2023-07-24T16:09:52Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-07-24 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-06-19 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863 | -
dc.description.abstract | 卷積神經網絡在各種電腦視覺問題中取得了相當好的成果。然而,最先進的卷積神經網絡模型的規模越來越巨大,這導致推論時間非常長和內存使用量很高。非結構化剪枝等模型壓縮技術可以在不影響準確性的情況下剪枝掉很大一部分參數,但有效利用其稀疏性仍然是一個挑戰。行組合通過將捲積濾波器矩陣中的多個稀疏行組合成單個密集行來壓縮非結構化剪枝後的卷積神經網絡模型。此外,在組合後行的每一列中剪枝除了最大幅度權重之外的所有權重,可以有效地進一步壓縮矩陣。然而,之前的論文沒有解決分割稀疏行的細節,以盡量減少額外剪枝對模型性能的負面影響。在這項工作中,我們首先證明行分區問題是一個 NP-Complete 問題。接下來,我們提出了一種基於模擬退火和全局非結構化剪枝的行組合方案,以最大限度地減少額外剪枝對模型性能的不利影響。我們在沒有特殊硬件支持的情況下使用 TVM AI 編譯器實現了行組合卷積神經網絡模型的加速。我們所提出的方案實現了更有效的模型壓縮,在 TinyImageNet 數據集稀疏度為 88% 的 VGG19 模型上精度提高 0.65%,推理時間加快 1.24 倍。 | zh_TW
dc.description.abstract | Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models tend to be tremendous in size, which results in very long inference times and high memory usage. Model compression techniques such as unstructured pruning can prune a significant proportion of parameters without affecting accuracy, but efficiently exploiting the resulting sparsity remains a challenge. Column combining compresses unstructured-pruned CNN models by combining multiple sparse columns of a convolutional filter matrix into a single dense column. In addition, pruning all but the largest-magnitude weight in each row of the combined column compresses the matrix even further. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. Next, we propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We accelerate column-combined CNN models with the TVM deep learning compiler, without special hardware support. The proposed scheme achieves more effective model compression, yielding a 0.65% accuracy improvement and 1.24x faster inference on VGG19 at 88% sparsity on the TinyImageNet dataset. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-07-24T16:09:52Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-07-24T16:09:52Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:
Acknowledgments i
摘要 ii
Abstract iii
Contents v
List of Figures vii
List of Tables ix
Denotation x
Chapter 1 Introduction 1
Chapter 2 Background 5
Chapter 2.1 Unstructured Pruning 5
Chapter 2.2 Structured Pruning 6
Chapter 2.3 Column Combining 6
Chapter 2.4 GEMM-based Convolution 9
Chapter 3 Proof of Column Partition Problem 10
Chapter 4 Scheme 12
Chapter 4.1 Column Combining based on Simulated Annealing 12
Chapter 4.2 Simulated Annealing-based Column Partition 14
Chapter 4.3 Matrix Multiplication for Combined Matrix 17
Chapter 5 Evaluation 20
Chapter 5.1 Experiment Settings 20
Chapter 5.1.1 Models 20
Chapter 5.1.2 Benchmarks 20
Chapter 5.1.3 Implementation 21
Chapter 5.2 Performance of Different α Percentage 21
Chapter 5.3 Performance of Column Combining Methods 23
Chapter 5.4 Acceleration of Column Combining Methods 24
Chapter 5.5 Performance of Row Rearrangement 28
Chapter 6 Conclusion 30
References 31
Appendix A — The experimental parameters 36
Appendix A.1 The experimental parameters for dense-column-first combining 36
dc.language.iso | en | -
dc.subject | 模擬退火 | zh_TW
dc.subject | TVM | zh_TW
dc.subject | 行組合 | zh_TW
dc.subject | 模型剪枝 | zh_TW
dc.subject | 模型壓縮 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | 機器學習 | zh_TW
dc.subject | Model compression | en
dc.subject | TVM | en
dc.subject | Machine learning | en
dc.subject | Model pruning | en
dc.subject | Column combining | en
dc.subject | Simulated annealing | en
dc.subject | Deep learning | en
dc.title | 使用基於模擬退火的列組合加速 CNN 模型在 CPU 上的推論時間 | zh_TW
dc.title | Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | 碩士 (Master's) | -
dc.contributor.oralexamcommittee | 吳真貞;洪鼎詠 | zh_TW
dc.contributor.oralexamcommittee | Jan-Jan Wu;Ding-Yong Hong | en
dc.subject.keyword | 機器學習,深度學習,模型壓縮,模型剪枝,行組合,模擬退火,TVM | zh_TW
dc.subject.keyword | Machine learning,Deep learning,Model compression,Model pruning,Column combining,Simulated annealing,TVM | en
dc.relation.page | 36 | -
dc.identifier.doi | 10.6342/NTU202301025 | -
dc.rights.note | 同意授權(限校園內公開) (authorization granted; campus-only access) | -
dc.date.accepted | 2023-06-21 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 資訊工程學系 (Department of Computer Science and Information Engineering) | -
dc.date.embargo-lift | 2026-08-31 | -
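
The abstract above hinges on two ideas: column combining, which packs groups of sparse columns of the GEMM-form filter matrix into dense columns by keeping only the per-row largest-magnitude weight, and a simulated-annealing search for the column partition that minimizes the damage of that extra pruning. The following is a minimal NumPy/Python sketch of both steps, not the thesis's implementation: the toy initial partition, the pruned_magnitude cost, and the single-column relocation move are illustrative assumptions.

```python
import math
import random

import numpy as np


def combine_columns(W, groups):
    """Pack each group of columns of W (rows x cols) into one dense column.

    Within every row of a group, only the largest-magnitude weight survives;
    the other weights in that row are pruned away by the packing. Returns the
    packed matrix and the original column index each kept weight came from.
    """
    rows = W.shape[0]
    packed = np.zeros((rows, len(groups)), dtype=W.dtype)
    origin = np.zeros((rows, len(groups)), dtype=np.int64)
    for g, cols in enumerate(groups):
        block = W[:, cols]                            # rows x len(cols)
        winner = np.abs(block).argmax(axis=1)         # per-row survivor
        packed[:, g] = block[np.arange(rows), winner]
        origin[:, g] = np.asarray(cols)[winner]
    return packed, origin


def pruned_magnitude(W, groups):
    """Cost of a partition: total |weight| that column combining discards."""
    cost = 0.0
    for cols in groups:
        block = np.abs(W[:, cols])
        cost += block.sum() - block.max(axis=1).sum()
    return cost


def anneal_partition(W, groups, steps=5000, t0=1.0, alpha=0.999):
    """Simulated annealing over column partitions (Metropolis acceptance)."""
    current = [list(c) for c in groups]
    cur_cost = pruned_magnitude(W, current)
    best, best_cost, t = [list(c) for c in current], cur_cost, t0
    for _ in range(steps):
        # Neighbour move: relocate one random column into another group.
        cand = [list(c) for c in current]
        src, dst = random.randrange(len(cand)), random.randrange(len(cand))
        if src == dst or len(cand[src]) <= 1:
            continue
        cand[dst].append(cand[src].pop(random.randrange(len(cand[src]))))
        cand_cost = pruned_magnitude(W, cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_cost < cur_cost or random.random() < math.exp((cur_cost - cand_cost) / t):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = [list(c) for c in current], cur_cost
        t *= alpha
    return best


# Toy run: a highly sparse 8x8 filter matrix packed into 4 dense columns.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) * (rng.random((8, 8)) < 0.15)
initial = [[0, 1], [2, 3], [4, 5], [6, 7]]
partition = anneal_partition(W, initial)
packed, origin = combine_columns(W, partition)
print(packed.shape, pruned_magnitude(W, partition))
```

In the thesis, the cost and move set are tied to the accuracy impact of the extra pruning and to constraints on group sizes; the sketch above only conveys the general shape of the search.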
Appears in collections: 資訊工程學系

Files in this item:
File | Size | Format
ntu-111-2.pdf (restricted; not publicly accessible) | 363.99 kB | Adobe PDF