Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863
Title: | Accelerate Inference of CNN Models on CPU via Column Combining Based on Simulated Annealing |
Author: | Chien-Hung Lin |
Advisor: | Pangfeng Liu |
Keywords: | Machine learning, Deep learning, Model compression, Model pruning, Column combining, Simulated annealing, TVM |
Publication Year: | 2023 |
Degree: | Master |
Abstract: | Convolutional Neural Networks (CNNs) have been successful in various computer vision tasks. However, state-of-the-art CNN models have grown tremendously large, which results in very long inference times and high memory usage. Model compression techniques such as unstructured pruning can remove a significant proportion of parameters without affecting accuracy, but exploiting the resulting sparsity efficiently remains a challenge. Column combining compresses unstructured-pruned CNN models by merging multiple sparse columns of a convolutional filter matrix into a single dense column. In addition, pruning all but the largest-magnitude weight in each row of the combined column compresses the matrix further. However, previous work did not address how to partition the sparse columns so as to minimize the negative impact of this additional pruning on model performance. In this work, we first prove that the column partition problem is NP-complete. Next, we propose a column combining scheme based on simulated annealing and global unstructured pruning that minimizes the adverse effects of the additional pruning on model performance. We implement the acceleration of column-combined CNN models with the TVM AI compiler, without special hardware support. The proposed scheme achieves more efficient model compression, yielding a 0.65% accuracy improvement and a 1.24x inference speedup on VGG19 at 88% sparsity on the TinyImageNet dataset. |
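The two ideas in the abstract, combining groups of sparse columns while keeping only the largest-magnitude weight per row, and searching for a good column partition with simulated annealing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names (`combine_columns`, `anneal_partition`), the cost function (count of extra weights pruned by combining), and all annealing parameters are this sketch's assumptions; only the combining rule and the use of simulated annealing over column-to-group assignments come from the abstract.

```python
import math
import random

import numpy as np


def combine_columns(weight, groups):
    """Merge each group of sparse columns into one dense column.

    Within a group, every row keeps only the largest-magnitude entry
    among that group's columns; any other nonzero entries in the row
    are pruned. Returns the combined matrix and the number of nonzero
    weights the combining step pruned away.
    """
    combined = np.zeros((weight.shape[0], len(groups)))
    extra_pruned = 0
    for g, cols in enumerate(groups):
        if not cols:  # an empty group contributes an all-zero column
            continue
        block = weight[:, cols]
        keep = np.argmax(np.abs(block), axis=1)
        combined[:, g] = block[np.arange(block.shape[0]), keep]
        extra_pruned += int((block != 0).sum() - (combined[:, g] != 0).sum())
    return combined, extra_pruned


def anneal_partition(weight, n_groups, steps=2000, t0=1.0, alpha=0.995, seed=0):
    """Search for a column-to-group assignment that minimizes the number
    of weights lost to combining, via simulated annealing."""
    rng = random.Random(seed)
    n_cols = weight.shape[1]
    assign = [c % n_groups for c in range(n_cols)]  # round-robin start

    def cost(a):
        groups = [[c for c in range(n_cols) if a[c] == g]
                  for g in range(n_groups)]
        return combine_columns(weight, groups)[1]

    cur = cost(assign)
    best, best_assign = cur, list(assign)
    t = t0
    for _ in range(steps):
        c = rng.randrange(n_cols)            # move one column to a group
        old = assign[c]
        assign[c] = rng.randrange(n_groups)
        new = cost(assign)
        # accept improvements always, worse moves with Boltzmann probability
        if new <= cur or rng.random() < math.exp((cur - new) / max(t, 1e-12)):
            cur = new
            if cur < best:
                best, best_assign = cur, list(assign)
        else:
            assign[c] = old                  # revert the rejected move
        t *= alpha                           # geometric cooling
    return best_assign, best


# Toy 4x4 sparse filter matrix; combine its columns into 2 groups.
W = np.array([[0.0, 0.9, 0.0, -0.2],
              [0.5, 0.0, 0.3,  0.0],
              [0.0, 0.0, 0.0,  0.7],
              [-0.4, 0.1, 0.0, 0.0]])
C, lost = combine_columns(W, [[0, 1], [2, 3]])       # fixed pairing: loses 1 weight
assign, best_lost = anneal_partition(W, n_groups=2)  # annealed pairing
```

For this toy matrix the pairing {0, 3} with {1, 2} loses no weights at all, so the annealed cost is never worse than the round-robin starting partition; the thesis applies the same idea at the scale of full convolutional filter matrices.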
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87863 |
DOI: | 10.6342/NTU202301025 |
Full-text license: | Access granted (restricted to campus) |
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (currently not authorized for public access) | 363.99 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.