Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71450
Title: | Constraint-Aware Neural Network Compression and Acceleration (可感知資源限制之神經網路壓縮與加速) |
Author: | Yu-Cheng Wu (吳禹澄) |
Advisor: | Shao-Yi Chien (簡韶逸) |
Keywords: | network compression, network pruning |
Publication Year: | 2020 |
Degree: | Master's |
Abstract: | Convolutional neural networks (CNNs) have achieved notable success in many computer vision tasks. However, most CNNs have a massive number of parameters and high computational complexity, which makes them suitable only for platforms with abundant computational resources. To deploy these networks on resource-limited platforms in real-world applications, researchers study how to compress and accelerate them while maintaining their performance (accuracy). The goal of network compression is to produce the compressed model with the best performance under the given resource constraints.

Filter pruning is an efficient way to structurally remove redundant parameters from CNNs, reducing both computation cost and memory storage at the same time. Current state-of-the-art methods globally estimate the importance of each filter based on its impact on the loss and iteratively remove the filters with the smallest importance values until the pruned network meets some resource constraint, such as the number (or ratio) of remaining filters. However, when facing a more practical constraint such as the total number of FLOPs, these methods do not relate the resource constraint to the estimation of filter importance. We therefore propose a novel method called Constraint-Aware Importance Estimation (CAIE), which integrates each filter's impact on the given resource into the original importance estimation that considers only the filter's impact on the loss. Moreover, CAIE generalizes to pruning under multiple resource constraints simultaneously. Extensive experiments show that, under the same multiple resource constraints, models pruned with our CAIE method accurately meet all constraints and achieve better performance than state-of-the-art methods.

Beyond filter pruning, quantization and knowledge distillation (KD) are also effective compression approaches. Quantization reduces the storage size and inference latency of a network by lowering the bit-width of its weights or activations. Knowledge distillation improves the performance of a small or compressed model by transferring useful information from a large, well-trained one. Combining multiple compression methods can compress a network further, and the typical procedure for such integrated compression is to manually assign an objective to each technique based on the given resource constraints and apply the techniques sequentially. However, because the different methods exchange no information with one another, the result may not be optimal under the constraints. We therefore also propose a compound, constraint-aware compression framework that integrates quantization and knowledge distillation into our CAIE pruning method. By taking quantization-related information into account during the importance estimation of pruning and applying KD during fine-tuning, our framework achieves better results under the given overall resource constraints. |
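To make the pruning loop described in the abstract concrete, the following is a minimal sketch of greedy, constraint-aware filter pruning. It is not the thesis's CAIE implementation: the scoring rule, the single FLOP budget, and all names (`Filter`, `loss_impact`, `flop_cost`, `constraint_aware_prune`) are illustrative assumptions; the actual method estimates loss impact from the trained network and handles multiple resource constraints at once.

```python
# Minimal sketch of constraint-aware iterative filter pruning (illustrative,
# not the thesis implementation). Each filter carries a loss-impact score and
# a resource cost (e.g. FLOPs); filters are removed greedily until the total
# remaining cost fits the budget.
from dataclasses import dataclass

@dataclass
class Filter:
    layer: str
    index: int
    loss_impact: float   # estimated increase in loss if this filter is removed
    flop_cost: float     # FLOPs attributable to this filter

def constraint_aware_prune(filters, flop_budget, cost_weight=1.0):
    """Greedily prune filters until the total FLOPs fit within flop_budget.

    The score mixes loss impact with resource cost, so filters that cost many
    FLOPs but contribute little to the loss are removed first -- a rough
    stand-in for the constraint-aware importance described in the abstract.
    """
    remaining = list(filters)
    pruned = []
    while remaining and sum(f.flop_cost for f in remaining) > flop_budget:
        # Lower score = less important per unit of resource saved.
        victim = min(remaining,
                     key=lambda f: f.loss_impact - cost_weight * f.flop_cost)
        remaining.remove(victim)
        pruned.append(victim)
        # In the real method, importance would be re-estimated (and the network
        # periodically fine-tuned) after each pruning step.
    return remaining, pruned

if __name__ == "__main__":
    filters = [Filter("conv1", i, loss_impact=0.1 * i, flop_cost=1.0 + 0.2 * i)
               for i in range(8)]
    kept, removed = constraint_aware_prune(filters, flop_budget=6.0)
    print(f"kept {len(kept)} filters, removed {len(removed)}")
```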
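The quantization step mentioned in the abstract can be illustrated with simple symmetric uniform quantization of a weight tensor. This is a generic sketch assuming PyTorch and a single per-tensor scale; the bit-width handling and quantization scheme used in the thesis may differ.

```python
import torch

def uniform_quantize(weights, bits=8):
    """Symmetric uniform quantization of a weight tensor (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 127 for 8-bit
    scale = weights.abs().max() / qmax                # per-tensor scale factor
    q = torch.clamp(torch.round(weights / scale), -qmax - 1, qmax)
    return q * scale                                  # dequantized ("fake-quantized") weights

if __name__ == "__main__":
    w = torch.randn(64, 3, 3, 3)                      # e.g. one conv layer's weights
    w_q = uniform_quantize(w, bits=8)
    print((w - w_q).abs().max().item())               # worst-case quantization error
```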
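Likewise, the knowledge-distillation fine-tuning can be illustrated with the classic soft-target/hard-target combined loss. The sketch below shows that standard formulation in PyTorch; the temperature `T`, the weight `alpha`, and the function name are illustrative choices, not the thesis's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard soft/hard-target KD loss (illustrative, not the thesis's exact setup)."""
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

if __name__ == "__main__":
    student = torch.randn(4, 10)            # batch of 4 samples, 10 classes
    teacher = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    print(distillation_loss(student, teacher, labels).item())
```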
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71450 |
DOI: | 10.6342/NTU202004352 |
Full-Text License: | Paid authorization |
Appears in Collections: | Graduate Institute of Electronics Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
U0001-2411202022485300.pdf (currently not authorized for public access) | 2.85 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.