NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59139

Full metadata record
DC field: value (language)
dc.contributor.advisor: 劉邦鋒 (Pang-Feng Liu)
dc.contributor.author: Tse-Wen Chen (en)
dc.contributor.author: 陳則彣 (zh_TW)
dc.date.accessioned: 2021-06-16T09:16:37Z
dc.date.available: 2020-08-24
dc.date.copyright: 2020-08-24
dc.date.issued: 2020
dc.date.submitted: 2020-08-18
dc.identifier.citation:
1. Z. Cai and N. Vasconcelos. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6154–6162, 2018.
2. A. Chaddad, B. Naisiri, M. Pedersoli, E. Granger, C. Desrosiers, and M. Toews. Modeling information flow through deep neural networks. arXiv preprint arXiv:1712.00003, 2017.
3. Y. Gong, L. Liu, M. Yang, and L. Bourdev. Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115, 2014.
4. S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135–1143, 2015.
5. S. H. Hasanpour, M. Rouhani, M. Fayyaz, and M. Sabokrou. Lets keep it simple, using simple architectures to outperform deeper and more complex architectures. arXiv preprint arXiv:1608.06037, 2016.
6. H. Hu, R. Peng, Y.-W. Tai, and C.-K. Tang. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
7. M. Jaderberg, A. Vedaldi, and A. Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.
8. B. Kayalibay, G. Jensen, and P. van der Smagt. CNN-based segmentation of medical imaging data. arXiv preprint arXiv:1701.03056, 2017.
9. H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf. Pruning filters for efficient ConvNets. arXiv preprint arXiv:1608.08710, 2016.
10. Y. Li, S. Lin, B. Zhang, J. Liu, D. Doermann, Y. Wu, F. Huang, and R. Ji. Exploiting kernel sparsity and entropy for interpretable CNN compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2800–2809, 2019.
11. S. Lin, R. Ji, C. Chen, D. Tao, and J. Luo. Holistic CNN compression via low-rank decomposition with knowledge transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(12):2889–2905, 2018.
12. J.-H. Luo and J. Wu. An entropy-based pruning method for CNN compression. arXiv preprint arXiv:1706.05791, 2017.
13. F. Milletari, N. Navab, and S.-A. Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
14. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pages 8026–8037, 2019.
15. S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
16. A. Salama, O. Ostapenko, T. Klein, and M. Nabi. Pruning at a glance: Global neural pruning for model compression. arXiv preprint arXiv:1912.00200, 2019.
17. R. Shwartz-Ziv and N. Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017.
18. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
19. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
20. M. Tan and Q. V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946, 2019.
21. N. Tishby and N. Zaslavsky. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2015.
22. H. Touvron, A. Vedaldi, M. Douze, and H. Jégou. Fixing the train-test resolution discrepancy: FixEfficientNet. arXiv preprint arXiv:2003.08237, 2020.
23. Wikipedia. Mutual information. Wikipedia, The Free Encyclopedia, 2020. Accessed 5 August 2020.
24. T.-J. Yang, Y.-H. Chen, and V. Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5687–5695, 2017.
25. A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen. Incremental network quantization: Towards lossless CNNs with low-precision weights. arXiv preprint arXiv:1702.03044, 2017.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/59139
dc.description.abstract: Convolutional neural networks have achieved great success across many computer vision tasks. However, because of hardware and software resource constraints, network compression is a technique for making CNNs smaller and for speeding up both training and inference. We focus on channel pruning, a form of network compression that first evaluates the importance of each channel in a convolution layer and then prunes away the less important channels.
In this thesis, we propose the weighted mutual information method, which outperforms the traditional L1-norm method and other entropy-based methods. We first compute the mutual information between the feature maps and the labels, and use it to remove the part of the entropy that is irrelevant to classification. We then consider the effect of passing a continuous random variable through a set of filter weights and compute the total entropy of the output.
We perform channel pruning with the Simplenet model on three datasets: SVHN, CIFAR-10, and CIFAR-100. When the parameter percentage is set to 0.3 for all convolution layers, our weighted mutual information method achieves 1.52%, 13.24%, and 7.90% higher accuracy than the output L1 method on SVHN, CIFAR-10, and CIFAR-100, respectively.
In the global pruning experiments, our weighted mutual information method achieves about 2% higher accuracy than the output L1 method on SVHN at a parameter percentage of 0.45, and about 1.5% higher accuracy on CIFAR-100 at a parameter percentage of 0.53. The only exception is CIFAR-10, where at a parameter percentage of 0.40 our method is about 5% less accurate than the output L1 method; under the same setting, however, the entropy method estimated with a Gaussian distribution unexpectedly outperforms the output L1 method by about 51% in accuracy.
zh_TW
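The abstract compares an L1-norm metric against entropy-based metrics for ranking the channels of a convolution layer before pruning. The following is an illustrative sketch only, not the thesis implementation: function names, tensor shapes, and the histogram-based entropy estimate are all assumptions made for the example.

```python
# Sketch of two channel-importance metrics discussed in the abstract,
# using NumPy only. Shapes and names are illustrative assumptions.
import numpy as np

def l1_scores(weights):
    """L1-norm metric: score each output channel by the L1 norm of its
    filter. `weights` has shape (out_channels, in_channels, k, k)."""
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def entropy_scores(feature_maps, bins=16):
    """Entropy metric: score each channel by the Shannon entropy of a
    histogram of its activations. `feature_maps` has shape
    (batch, channels, h, w)."""
    scores = []
    for c in range(feature_maps.shape[1]):
        hist, _ = np.histogram(feature_maps[:, c], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]                       # drop empty bins (0 log 0 = 0)
        scores.append(-(p * np.log2(p)).sum())
    return np.array(scores)

def prune_mask(scores, keep_ratio):
    """Keep the top `keep_ratio` fraction of channels by score."""
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:k]    # indices of highest scores
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))          # 8 filters of a toy conv layer
fmap = rng.normal(size=(32, 8, 14, 14))    # toy activations, 32 samples
mask = prune_mask(l1_scores(w), keep_ratio=0.5)
print(mask.sum())  # 4 channels kept out of 8
```

In a real pipeline the mask would be used to drop the corresponding filters (and the matching input channels of the next layer) before fine-tuning.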
dc.description.abstract: Convolutional neural networks (CNNs) achieve great success, especially in computer vision tasks. However, due to the limitations of hardware and software resources, model compression is an important technique for making CNNs smaller and faster to train and run inference on. We focus on channel pruning, a form of model compression that evaluates the importance of each channel in a convolution layer and prunes away the less important channels.
In this thesis, we propose the weighted mutual information metric, which outperforms the L1-norm pruning metric and other entropy metrics. We first compute the mutual information between feature maps and labels in order to remove information that is not relevant to the classification task. We then consider the effect of passing a continuous random variable through the filter weights and estimate the output entropy.
We perform channel pruning on three datasets, SVHN, CIFAR-10, and CIFAR-100, using the model Simplenet. When the parameter percentage is 0.3 for all convolution layers, our weighted mutual information method achieves 1.52%, 13.24%, and 7.90% higher accuracy than the output L1 metric on SVHN, CIFAR-10, and CIFAR-100, respectively.
In the global pruning experiment, our weighted mutual information metric achieves about 2% higher accuracy than the output L1 metric at a parameter ratio of about 0.45 on SVHN. On CIFAR-100, our metric achieves about 1.5% higher accuracy than the output L1 metric at a parameter ratio of about 0.53. The only exception is CIFAR-10, where our metric performs about 5% worse than the output L1 metric at a parameter ratio of about 0.40; under the same conditions, the entropy metric estimated from a Gaussian distribution unexpectedly outperforms the output L1 metric by about 51%.
en
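The abstract's key quantity is the mutual information between a feature map and the labels. A minimal sketch of how such a quantity can be estimated from samples is shown below; this is an assumed, histogram-based estimator for illustration, not the weighted variant proposed in the thesis.

```python
# Hedged sketch: estimating I(F; Y) in bits between a quantized
# per-sample feature-map statistic F and integer class labels Y.
import numpy as np

def mutual_information(f, y, bins=8):
    """Plug-in estimate of I(F;Y) from samples. `f` is a 1-D array of
    per-sample channel statistics (e.g. mean activation); `y` holds
    integer labels 0..C-1."""
    # Quantize the continuous statistic into `bins` buckets.
    edges = np.histogram_bin_edges(f, bins=bins)[1:-1]
    f_bin = np.digitize(f, edges)                  # values in 0..bins-1
    # Empirical joint distribution over (feature bin, label).
    joint = np.zeros((bins, y.max() + 1))
    for fb, yb in zip(f_bin, y):
        joint[fb, yb] += 1
    p = joint / joint.sum()
    pf = p.sum(axis=1, keepdims=True)              # marginal of F
    py = p.sum(axis=0, keepdims=True)              # marginal of Y
    nz = p > 0                                     # avoid log(0)
    return float((p[nz] * np.log2(p[nz] / (pf @ py)[nz])).sum())

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
f_informative = y + 0.3 * rng.normal(size=500)     # correlates with label
f_noise = rng.normal(size=500)                     # independent of label
print(mutual_information(f_informative, y), mutual_information(f_noise, y))
```

A channel whose statistic carries information about the labels receives a high score, while a label-independent channel scores near zero, which is the intuition behind pruning by mutual information.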
dc.description.provenance: Made available in DSpace on 2021-06-16T09:16:37Z (GMT). No. of bitstreams: 1
U0001-1408202015505700.pdf: 1063076 bytes, checksum: 96cbe7ff0a591f93fb616ba4dd1434e7 (MD5)
Previous issue date: 2020
en
dc.description.tableofcontents:
摘要 1
Abstract 2
Contents 4
List of Figures 6
Chapter 1 Introduction 1
Chapter 2 Related works 4
  2.1 Information theory on neural network 4
  2.2 Unstructured pruning 4
  2.3 Structured pruning 5
Chapter 3 Background 6
  3.1 Information Theory 6
  3.2 Convolution Layer 7
  3.3 Channel Pruning 7
Chapter 4 Methodology 9
  4.1 L1-norm of filter weight 9
  4.2 Entropy of feature map 10
  4.3 Mutual information between feature map and label 10
  4.4 Weighted Mutual Information 11
Chapter 5 Evaluation 14
  5.1 Experiment Settings 14
  5.2 Local Pruning 14
  5.3 Global Pruning 19
Chapter 6 Conclusion 26
References 28
dc.language.iso: zh-TW
dc.subject: 機器學習 (zh_TW)
dc.subject: 網路壓縮 (zh_TW)
dc.subject: 濾波器剪枝 (zh_TW)
dc.subject: 熵 (zh_TW)
dc.subject: 捲積神經網路 (zh_TW)
dc.subject: entropy (en)
dc.subject: CNN (en)
dc.subject: model compression (en)
dc.subject: filter pruning (en)
dc.subject: machine learning (en)
dc.title: 利用資料熵進行神經網路之壓縮 (zh_TW)
dc.title: Exploiting Data Entropy for Neural Network Compression (en)
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee: 吳真貞 (Jan-Jan Wu), 許俊琛 (Jun-Chen Xu)
dc.subject.keyword: 網路壓縮, 濾波器剪枝, 熵, 捲積神經網路, 機器學習 (zh_TW)
dc.subject.keyword: model compression, filter pruning, entropy, CNN, machine learning (en)
dc.relation.page: 30
dc.identifier.doi: 10.6342/NTU202003440
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2020-08-19
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) (zh_TW)
Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
U0001-1408202015505700.pdf, 1.04 MB, Adobe PDF, not authorized for public access

