A Method of Mixture-Distributed Regularization for Model Compression

Chang-Ti Huang (黃昌第)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340

Title:	A Method of Mixture-Distributed Regularization for Model Compression (基於混合分布正規化之模型壓縮方法)
Author:	Chang-Ti Huang (黃昌第)
Advisor:	Ja-Ling Wu (吳家麟)
Keywords:	Deep Learning, Model Compression, Network Reduction, Regularization, Optimization
Year of publication:	2019
Degree:	Master's
Abstract:	Efficient deep learning computation is an important topic: it reduces computational cost and makes model inference feasible on mobile and on-chip devices. Regularization of parameters is a common approach to model compression, and $L_0$ regularization is one of the effective regularizers. Because the $L_0$ norm is defined as the number of non-zero parameters, it penalizes non-zero parameters without shrinking larger values, which makes it well suited as a sparsity constraint on neural network parameters. However, the same definition makes the $L_0$ norm discrete, and its combinatorial nature makes it mathematically intractable. A previous work approximated the $L_0$ norm by emulating binary gates with the Concrete distribution, and used these gates to decide which weights should be pruned. This thesis proposes a more general and more reliable framework for relaxing binary gates: a regularization term built on mixture distributions. Under the proposed framework, any symmetric pair of distributions converging to $\delta(0)$ and $\delta(1)$ can be used to construct smoothed, approximately binary gates, which in turn estimate the $L_0$ regularization and achieve model compression and network reduction. We further derive a reparameterization method for mixture distributions and introduce it to model compression, so that the smoothed binary gates drawn from mixture distributions can be optimized efficiently with stochastic gradient descent under the proposed deep learning algorithm. Experiments on the MNIST and CIFAR-10/CIFAR-100 datasets show that the proposed method is highly competitive, achieving state-of-the-art results in terms of pruned architectures, structured sparsity, and the reduced number of floating-point operations (FLOPs).
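As a rough illustration of the gate idea described in the abstract, the following Python sketch draws smoothed binary gates from a two-component mixture and computes a simple surrogate for the $L_0$ penalty. It is a minimal sketch under stated assumptions, not the thesis's algorithm: the Gaussian pair N(0, sigma^2) and N(1, sigma^2), the clipping to [0, 1], and the names `sample_mixture_gates`, `expected_l0`, `pi`, and `sigma` are all hypothetical choices made for this example. It only shows forward sampling and does not implement the reparameterization for mixture distributions that the thesis derives.

```python
import numpy as np


def sample_mixture_gates(pi, sigma, rng):
    """Draw one smoothed binary gate per entry of `pi`.

    Assumption for this sketch: each gate comes from a symmetric pair of
    Gaussians, N(0, sigma^2) with probability 1 - pi and N(1, sigma^2)
    with probability pi, clipped to [0, 1]. As sigma -> 0 the two
    components approach point masses at 0 and 1.
    """
    pi = np.asarray(pi, dtype=float)
    # Choose the mixture component: 1.0 selects the component near 1.
    component = (rng.random(pi.shape) < pi).astype(float)
    # Add component noise, then clip so every gate lies in [0, 1].
    noise = rng.standard_normal(pi.shape)
    return np.clip(component + sigma * noise, 0.0, 1.0)


def expected_l0(pi):
    """Surrogate L0 penalty: the expected number of gates drawn from the
    component near 1 (an assumption of this sketch, not the thesis)."""
    return float(np.sum(np.asarray(pi, dtype=float)))


rng = np.random.default_rng(0)
pi = np.array([0.9, 0.5, 0.1])        # hypothetical per-weight gate probabilities
weights = np.array([1.5, -0.7, 0.2])  # hypothetical weights to be gated
gates = sample_mixture_gates(pi, sigma=0.05, rng=rng)
gated_weights = weights * gates       # gates near 0 effectively prune a weight
penalty = expected_l0(pi)
```

In this sketch, a weight whose gate probability is driven toward 0 is almost always multiplied by a gate at or near 0, which is how the penalty on `pi` would encourage pruning.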
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340
DOI:	10.6342/NTU201900792
Full-text authorization:	Paid license
Appears in collections:	Graduate Institute of Networking and Multimedia

Files in this item:

File	Size	Format
ntu-108-1.pdf (currently not authorized for public access)	12.95 MB	Adobe PDF

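Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.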
