A Method of Mixture-Distributed Regularization for Model Compression

Chang-Ti Huang; 黃昌第

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340

Title:	基於混合分布正規化之模型壓縮方法 A Method of Mixture-Distributed Regularization for Model Compression
Authors:	Chang-Ti Huang 黃昌第
Advisor:	吳家麟 (Ja-Ling Wu)
Keyword:	Deep Learning, Model Compression, Network Reduction, Regularization, Optimization
Publication Year:	2019
Degree:	Master's
Abstract:	Efficient deep learning computing has recently received considerable attention: it saves computational costs and can bring model inference to on-chip and mobile devices. Regularizing the parameters is a common approach to model compression, and $L_0$ regularization is an effective one. Because the $L_0$ norm is defined as the number of non-zero parameters, it penalizes non-zero parameters without shrinking larger values, which makes it well suited as a sparsity constraint on neural network parameters. However, the combinatorial (discrete) nature of the $L_0$ norm makes it intractable to optimize directly. A previous work approximated the $L_0$ norm with binary gates emulated by the Concrete distribution and used these gates to determine collectively which weights should be pruned. This thesis proposes a more general framework that relaxes binary gates through mixture distributions and uses them to build a regularization term. Under this framework, any symmetric pair of distributions converging to $\delta(0)$ and $\delta(1)$ can be combined into smoothed binary gates that estimate the $L_0$ regularizer, achieving model compression and network reduction. We further introduce a reparameterization method for mixture distributions to model compression, so that smoothed binary gates drawn from mixture distributions can be trained efficiently with stochastic gradient descent. Extensive experiments on MNIST and CIFAR-10/CIFAR-100 show that the proposed method achieves state-of-the-art results in terms of pruned architectures, structured sparsity, and the reduced number of floating-point operations (FLOPs).
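The abstract contrasts the exact $L_0$ norm, a non-differentiable count, with smoothed binary gates whose samples are reparameterized (written as a deterministic function of the parameters plus parameter-free noise). The sketch below illustrates both ideas in plain Python under stated assumptions; it uses only the standard library, so no gradients are actually computed. The hard concrete gate and its constants $\beta = 2/3$, $\gamma = -0.1$, $\zeta = 1.1$ are assumed to correspond to the Concrete-based "previous work" (Louizos et al., 2018). The mixture gate is a hypothetical example and not the thesis's construction: it pairs a Kumaraswamy distribution concentrated near 1 with its mirror image near 0 (a symmetric pair approaching $\delta(1)$ and $\delta(0)$ as `a` grows) and relaxes the discrete component choice with a binary Concrete sample. All function names are illustrative.

```python
import math
import random

# Illustrative sketch only; every name here is hypothetical, not from the thesis.

def l0_norm(weights):
    """Exact L0 "norm": the number of non-zero entries. It is piecewise
    constant, so its gradient is zero almost everywhere."""
    return sum(1 for w in weights if w != 0.0)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_noise(rng):
    """log(u) - log(1 - u) with u ~ U(0, 1): parameter-free noise."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)  # keep u away from 0 and 1
    return math.log(u) - math.log(1.0 - u)


# Hard concrete gate; constants assumed from Louizos et al. (2018).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


def hard_concrete_gate(log_alpha, rng):
    """Sample z in [0, 1] as a deterministic function of log_alpha and noise."""
    s = sigmoid((logistic_noise(rng) + log_alpha) / BETA)
    s_bar = s * (ZETA - GAMMA) + GAMMA  # stretch (0, 1) to (GAMMA, ZETA)
    return min(1.0, max(0.0, s_bar))    # clamp: z = 0 and z = 1 get positive mass


def expected_l0(log_alphas):
    """Smooth surrogate for the L0 penalty: sum of P(z_j != 0)."""
    return sum(sigmoid(la - BETA * math.log(-GAMMA / ZETA)) for la in log_alphas)


# Hypothetical mixture gate (NOT the thesis's construction).
def kumaraswamy(a, b, rng):
    """Inverse-CDF sample of Kumaraswamy(a, b); differentiable in (a, b)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def mixture_gate(logit_pi, a, b, rng, temperature=0.5):
    """Relaxed sample from pi * D1 + (1 - pi) * D0 with pi = sigmoid(logit_pi),
    D1 = Kumaraswamy(a, b) (near 1 for large a) and D0 its mirror 1 - D1."""
    # Binary Concrete relaxation of the discrete component choice.
    c = sigmoid((logit_pi + logistic_noise(rng)) / temperature)
    x = kumaraswamy(a, b, rng)  # draw from D1; 1 - x is then a draw from D0
    return c * x + (1.0 - c) * (1.0 - x)


if __name__ == "__main__":
    rng = random.Random(0)
    print(l0_norm([0.0, 1.5, 0.0, -2.0]))  # 2
    print(round(expected_l0([0.0, 10.0, -10.0]), 3))
    print([round(mixture_gate(3.0, 20.0, 1.0, rng), 3) for _ in range(5)])
```

In an actual training loop these functions would be written in an autodiff framework, so that gradients flow from the sampled gates back into `log_alpha` or `logit_pi`; drawing the noise independently of the parameters is what makes that possible. `expected_l0` is the quantity one would add to the task loss as the sparsity penalty.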
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340
DOI:	10.6342/NTU201900792
Fulltext Rights:	Paid authorization
Appears in Collections:	Graduate Institute of Networking and Multimedia

Files in This Item:

File	Size	Format
ntu-108-1.pdf (Restricted Access)	12.95 MB	Adobe PDF
