Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340
Title: | 基於混合分布正規化之模型壓縮方法 A Method of Mixture-Distributed Regularization for Model Compression |
Authors: | Chang-Ti Huang 黃昌第 |
Advisor: | 吳家麟(Ja-Ling Wu) |
Keyword: | 深度學習,模型壓縮,網路縮減,正規化,最佳化, Deep Learning,Model Compression,Network Reduction,Regularization,Optimization, |
Publication Year : | 2019 |
Degree: | 碩士 |
Abstract: | 高效能深度學習計算是個重要的課題,它不僅可以節省計算成本,更能將人工智慧實現於行動裝置之中。正規化 (regularization) 是一種常見的模型壓縮方法,而 $L_0$ 範數 (norm) 的正規化是其中一種有效的方法。由於此範數之定義為非零參數的個數,因此相當適合作為神經網路參數稀疏化的約束條件 (sparsity constraints)。然而也因為 $L_0$ 範數的定義使它具離散性並成為數學上棘手的問題。一個早先的研究方法利用 Concrete distribution 來模擬二元邏輯閘,並利用這個邏輯閘的概念來決定哪些參數應該進行剪枝 (pruning)。本論文提出一種更可靠的框架來模擬二元邏輯閘。此框架是一種基於混合分布 (mixture distributions) 建構而成的的正規化項。任何一對對稱且收斂於 $delta(0)$ 與 $delta(1)$ 的分布皆可以在我們提出的框架下成為近似二元的邏輯閘,進而估算 $L_0$ 正規化,達成模型壓縮及網路縮減的目標。此外,我們也推演出一種對混合分布重新參數化 (reparameterization) 的方法至前述的模型壓縮中,使得我們提出的深度學習演算法可以利用隨機梯度下降法進行優化。在 MNIST 與 CIFAR-10/CIFAR-100 資料集訓練下的實驗結果均顯示,我們所提出的方法是非常具有競爭力的。 Efficient deep learning computing has recently received considerable attention. It saves computational costs, and potentially realizes model inference using on-chip devices. Regularization of parameters is a common approach to compress the model. $L_0$ regularization is one of the efficient regularizers since it penalizes the non-zero parameters without any shrinkage of larger values. However, the combinatorial nature of the $L_0$ norm makes it an intractable term. A previous work approximated the $L_0$ norm using the Concrete distribution with emulated binary gates, and collectively determined which weights should be pruned. In this thesis, a more general framework for relaxing binary gates through mixture distributions is proposed. With the proposed method, any mixture pair of distributions converging to $delta(0)$ and $delta(1)$ can be applied to construct smoothed binary gates. We further introduce a reparameterization method for mixture distributions to the field of model compression. Reparameterized smoothed-binary gates drawn from mixture distributions are capable of conducting efficient gradient-based optimization under the proposed deep learning algorithm. Extensive experiments show that we achieve the state-of-the-art in terms of pruned architectures, structured sparsity and the reduced number of floating point operations (FLOPs). |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340 |
DOI: | 10.6342/NTU201900792 |
Fulltext Rights: | 有償授權 |
Appears in Collections: | 資訊網路與多媒體研究所 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf Restricted Access | 12.95 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.