Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340
Full metadata record (DC field: value [language])
dc.contributor.advisor: 吳家麟 (Ja-Ling Wu)
dc.contributor.author: Chang-Ti Huang [en]
dc.contributor.author: 黃昌第 [zh_TW]
dc.date.accessioned: 2021-06-17T07:29:21Z
dc.date.available: 2022-07-01
dc.date.copyright: 2019-07-01
dc.date.issued: 2019
dc.date.submitted: 2019-06-18
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73340
dc.description.abstract: Efficient deep learning computation is an important topic: it not only reduces computational cost but also makes it possible to bring artificial intelligence to mobile devices. Regularization is a common approach to model compression, and regularization with the $L_0$ norm is one of the effective methods. Because this norm is defined as the number of non-zero parameters, it is well suited as a sparsity constraint on neural network parameters. That same definition, however, makes the $L_0$ norm discrete and mathematically intractable. An earlier study used the Concrete distribution to emulate binary logic gates and used these gates to decide which parameters should be pruned. This thesis proposes a more reliable framework for emulating binary gates: a regularization term built on mixture distributions. Under the proposed framework, any pair of symmetric distributions converging to $\delta(0)$ and $\delta(1)$ can serve as an approximately binary gate, which then estimates the $L_0$ regularizer and achieves model compression and network reduction. In addition, we derive a reparameterization method for these mixture distributions in this compression setting, so that the proposed deep learning algorithm can be optimized by stochastic gradient descent. Experimental results on the MNIST and CIFAR-10/CIFAR-100 datasets show that the proposed method is highly competitive. [zh_TW]
dc.description.abstract: Efficient deep learning computing has recently received considerable attention: it reduces computational cost and makes model inference feasible on on-chip devices. Regularizing the parameters is a common approach to model compression, and $L_0$ regularization is an effective regularizer because it penalizes non-zero parameters without shrinking large values. However, the combinatorial nature of the $L_0$ norm makes it intractable to optimize directly. A previous work approximated the $L_0$ norm using the Concrete distribution to emulate binary gates, which collectively determine which weights should be pruned. This thesis proposes a more general framework for relaxing binary gates through mixture distributions. Under the proposed framework, any mixture pair of distributions converging to $\delta(0)$ and $\delta(1)$ can be used to construct smoothed binary gates. We further introduce a reparameterization method for mixture distributions to the field of model compression, so that the reparameterized smoothed binary gates support efficient gradient-based optimization within the proposed deep learning algorithm. Extensive experiments show that the method achieves state-of-the-art results in terms of pruned architectures, structured sparsity, and the reduction in floating-point operations (FLOPs). [en]
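As background for the relaxation the abstract refers to, the sketch below illustrates the prior approach it builds on: a binary-Concrete ("hard concrete") gate, reparameterized through uniform noise so that an expected $L_0$ penalty becomes differentiable (Louizos et al., ICLR 2018). This is a minimal illustration only; the hyperparameters (beta, gamma, zeta), the per-unit gating, and the toy loss are assumptions made for the example, not the thesis's mixture-distributed regularizer (MDR).

import torch

def sample_hard_concrete(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    # Reparameterized draw of a smoothed binary gate z in [0, 1].
    u = torch.rand_like(log_alpha)                      # u ~ Uniform(0, 1)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    s_bar = s * (zeta - gamma) + gamma                  # stretch to (gamma, zeta)
    return torch.clamp(s_bar, 0.0, 1.0)                 # clip so the gate can reach exact 0 and 1

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    # Differentiable surrogate for the expected number of non-zero gates.
    return torch.sigmoid(log_alpha - beta * torch.log(torch.tensor(-gamma / zeta))).sum()

# Toy usage: gate the output units of a weight matrix and penalize the expected L0 norm.
log_alpha = torch.zeros(256, requires_grad=True)        # one learnable gate per output unit
weight = torch.randn(256, 128, requires_grad=True)
z = sample_hard_concrete(log_alpha)
sparse_weight = z.unsqueeze(1) * weight                 # units whose gate is 0 are pruned
loss = sparse_weight.pow(2).mean() + 1e-3 * expected_l0(log_alpha)
loss.backward()                                         # gradients reach log_alpha through the reparameterization

The thesis generalizes this idea: any mixture pair of distributions converging to $\delta(0)$ and $\delta(1)$ can replace the Concrete gate, with its own reparameterization derived via inverse transform sampling (Chapter 3 in the table of contents below).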
dc.description.provenance: Made available in DSpace on 2021-06-17T07:29:21Z (GMT). No. of bitstreams: 1
ntu-108-R06944017-1.pdf: 13261977 bytes, checksum: 148e43a65bf77c7345d6042dd9c59152 (MD5)
Previous issue date: 2019 [en]
dc.description.tableofcontents: Contents
Thesis Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
1 Introduction 1
2 Constructing the MDR 4
2.1 Gradient estimators 5
2.1.1 The REINFORCE 6
2.1.2 Control variates 7
2.1.3 The reparameterization trick 8
2.1.4 The Concrete relaxation 8
2.2 Relaxing L0 norms through the binary Concrete 9
2.2.1 The Gumbel-max trick 9
2.2.2 The Concrete distribution 10
2.2.3 The binary Concrete 11
2.3 Mixture distributions 13
2.3.1 Mixtures of exponential distributions 15
2.3.2 Mixtures of exponential-uniform distributions 15
2.3.3 Mixtures of power-law function distributions 16
2.4 The Mixture-Distributed Regularization (MDR) 16
3 Optimizing the MDR 19
3.1 Inverse transform sampling 19
3.2 Reparameterization for the MDR 21
3.3 Estimating ζ∗ for testing 25
3.4 Combining the MDR with other regularizers 26
3.5 Group sparsity constraints 27
4 Related Work 29
5 Experiments 32
5.1 The gradient variance of the MDR 32
5.2 Experimental setup 34
5.2.1 Datasets 34
5.2.2 Architectures 35
5.2.3 Implementation details 35
5.3 LeNet-300-100 on MNIST 39
5.4 LeNet-5-Caffe on MNIST 40
5.5 Wide-ResNet on CIFAR 41
6 Conclusions 45
Bibliography 47
dc.language.iso: en
dc.subject: 深度學習 (Deep Learning) [zh_TW]
dc.subject: 模型壓縮 (Model Compression) [zh_TW]
dc.subject: 網路縮減 (Network Reduction) [zh_TW]
dc.subject: 正規化 (Regularization) [zh_TW]
dc.subject: 最佳化 (Optimization) [zh_TW]
dc.subject: Model Compression [en]
dc.subject: Deep Learning [en]
dc.subject: Optimization [en]
dc.subject: Regularization [en]
dc.subject: Network Reduction [en]
dc.title: 基於混合分布正規化之模型壓縮方法 [zh_TW]
dc.title: A Method of Mixture-Distributed Regularization for Model Compression [en]
dc.type: Thesis
dc.date.schoolyear: 107-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 朱威達, 鄭文皇, 胡敏君
dc.subject.keyword: 深度學習, 模型壓縮, 正規化, 網路縮減, 最佳化 (Deep Learning, Model Compression, Network Reduction, Regularization, Optimization) [zh_TW]
dc.subject.keyword: Deep Learning, Model Compression, Network Reduction, Regularization, Optimization [en]
dc.relation.page: 51
dc.identifier.doi: 10.6342/NTU201900792
dc.rights.note: 有償授權 (paid licensing)
dc.date.accepted: 2019-06-19
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) [zh_TW]
dc.contributor.author-dept: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia) [zh_TW]
Appears in collections: 資訊網路與多媒體研究所 (Graduate Institute of Networking and Multimedia)

Files in this item:
File: ntu-108-1.pdf (access restricted, not publicly available)
Size: 12.95 MB
Format: Adobe PDF

