Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56594
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 吳家麟(Ja-Ling Wu) | |
dc.contributor.author | Shou-Chun Kao | en |
dc.contributor.author | 高紹鈞 | zh_TW |
dc.date.accessioned | 2021-06-16T05:36:44Z | - |
dc.date.available | 2020-08-03 | |
dc.date.copyright | 2020-08-03 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-07-27 | |
dc.identifier.citation | 1. Cover, T. M.; Thomas, J. A., Elements of Information Theory. John Wiley & Sons: 2012. 2. Elad, A.; Haviv, D.; Blau, Y.; Michaeli, T., The effectiveness of layer-by-layer training using the information bottleneck principle. 2018. 3. Wu, T.; Fischer, I.; Chuang, I.; Tegmark, M., Learnability for the Information Bottleneck. 2019. 4. Tishby, N.; Pereira, F. C.; Bialek, W., The information bottleneck method. arXiv preprint, 2000. 5. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D., Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp 1625-1634. 6. Andrychowicz, M.; Denil, M.; Gomez, S.; Hoffman, M. W.; Pfau, D.; Schaul, T.; Shillingford, B.; De Freitas, N., Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, 2016; pp 3981-3989. 7. Shwartz-Ziv, R.; Tishby, N., Opening the black box of deep neural networks via information. arXiv preprint, 2017. 8. Shamir, O.; Sabato, S.; Tishby, N., Learning and generalization with the information bottleneck. Theoretical Computer Science 2010, 411 (29-30), 2696-2711. 9. Cheng, H.; Lian, D.; Gao, S.; Geng, Y., Utilizing Information Bottleneck to Evaluate the Capability of Deep Neural Networks for Image Classification. Entropy 2019, 21 (5), 456. 10. Tishby, N.; Zaslavsky, N., Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW); IEEE, 2015; pp 1-5. 11. Saxe, A. M.; Bansal, Y.; Dapello, J.; Advani, M.; Kolchinsky, A.; Tracey, B. D.; Cox, D. D., On the information bottleneck theory of deep learning. 2018. 12. Chaudhari, P.; Soatto, S., Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. In 2018 Information Theory and Applications Workshop (ITA); IEEE, 2018; pp 1-10. 13. Goldfeld, Z.; Van Den Berg, E.; Greenewald, K.; Melnyk, I.; Nguyen, N.; Kingsbury, B.; Polyanskiy, Y., Estimating information flow in deep neural networks. In Proceedings of the 36th International Conference on Machine Learning, 2019; pp 2299-2308. 14. Amjad, R. A.; Geiger, B. C., Learning representations for neural network-based classification using the information bottleneck principle. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019. 15. Belghazi, M. I.; Baratin, A.; Rajeswar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, R. D., MINE: Mutual information neural estimation. arXiv preprint arXiv:1801.04062, 2018. 16. Fang, H.; Wang, V.; Yamaguchi, M., Dissecting Deep Learning Networks—Visualizing Mutual Information. Entropy 2018, 20 (11), 823. 17. Schulz, K.; Sixt, L.; Tombari, F.; Landgraf, T., Restricting the flow: Information bottlenecks for attribution. arXiv preprint arXiv:2001.00396, 2020. 18. Ver Steeg, G.; Galstyan, A., Maximally informative hierarchical representations of high-dimensional data. In Artificial Intelligence and Statistics, 2015; pp 1004-1012. 19. Han, S.; Mao, H.; Dally, W. J., Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint, 2015. 20. Dai, X.; Yin, H.; Jha, N. K., NeST: A neural network synthesis tool based on a grow-and-prune paradigm. arXiv preprint, 2017. 21. Yu, R.; Li, A.; Chen, C.-F.; Lai, J.-H.; Morariu, V. I.; Han, X.; Gao, M.; Lin, C.-Y.; Davis, L. S., NISP: Pruning networks using neuron importance score propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp 9194-9203. 22. Dai, B.; Zhu, C.; Wipf, D., Compressing neural networks using the variational information bottleneck. arXiv preprint arXiv:1802.10399, 2018. 23. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R., Intriguing properties of neural networks. arXiv preprint, 2013. 24. Goodfellow, I. J.; Shlens, J.; Szegedy, C., Explaining and harnessing adversarial examples. arXiv preprint, 2014. 25. Alemi, A. A.; Fischer, I.; Dillon, J. V.; Murphy, K., Deep variational information bottleneck. arXiv preprint, 2016. 26. Chechik, G.; Globerson, A.; Tishby, N.; Weiss, Y., Information bottleneck for Gaussian variables. Journal of Machine Learning Research 2005, 6 (Jan), 165-188. 27. Doersch, C., Tutorial on variational autoencoders. arXiv preprint, 2016. 28. Kingma, D. P.; Welling, M., Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013. 29. Kolchinsky, A.; Tracey, B. D.; Wolpert, D. H., Nonlinear information bottleneck. Entropy 2019, 21 (12), 1181. 30. Kolchinsky, A.; Tracey, B. D., Estimating mixture entropy with pairwise distances. Entropy 2017, 19 (7), 361. 31. Achille, A.; Soatto, S., Information dropout: learning optimal representations through noise. 2016. 32. Sinha, S.; Bharadhwaj, H.; Goyal, A.; Larochelle, H.; Garg, A.; Shkurti, F., DIBS: Diversity inducing Information Bottleneck in Model Ensembles. arXiv preprint, 2020. 33. Strouse, D.; Schwab, D. J., The deterministic information bottleneck. Neural Computation 2017, 29 (6), 1611-1630. 34. Achille, A.; Soatto, S., Information dropout: Learning optimal representations through noisy computation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, 40 (12), 2897-2905. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56594 | - |
dc.description.abstract | Deep neural networks (Deep Neural Network: DNN) can be regarded as the core technology behind the rise of modern artificial intelligence. Through learning, a DNN can turn a very complex problem into a non-linear functional relation between inputs and outputs. It has repeatedly achieved excellent results in fields such as computer vision, language processing, and image processing, sometimes handling problems with an accuracy that even surpasses human ability.
However, we still know almost nothing about how a DNN model works internally. For example, we cannot design a metric that helps us construct the most suitable model architecture for different tasks, we cannot explain how a DNN makes judgments from the knowledge it has learned, and we cannot give stability guarantees against the known adversarial attacks on DNNs. For these reasons, DNNs have been criticized as 'black boxes' whose decisions cannot be trusted with absolute confidence. In recent years, more and more attempts have been made to explain the behavior of DNNs; one of the more theoretical directions is based on information theory, which has long been applied in digital communication, data compression, and related fields. Researchers try to build connections between DNNs and information theory in order to analyze, and even further optimize, DNNs; among these accounts, the Information Bottleneck (IB) of Tishby et al. is the best known. This thesis introduces several connections and applications between information theory and DNNs, focusing on IB-based analyses of how DNNs work and on the supporting and opposing views in the related literature. It concludes that the information variation observed under different conditions does not necessarily reflect the amount of information the network has actually learned, which affects the effectiveness or feasibility of information-theoretic applications to DNNs. | zh_TW |
dc.description.abstract | Deep neural networks (DNNs) have become the technical core of modern artificial intelligence research in recent years. By learning, they can turn a complex problem into a non-linear relation between inputs and outputs, and they have achieved practical success in many tasks, such as computer vision, natural language processing, and image processing. Surprisingly, there are many scenarios in which DNNs even outperform humans.
Despite this great success, very little is known about the inner organization or theoretical principles of DNNs. For instance, we have no standard for constructing the most appropriate model architecture for a specific task, no explanation of how DNNs learn from structured knowledge (training data), and no guarantee of robustness against adversarial attacks. DNNs are therefore often called a “black box”, which makes it difficult to have confidence in the decisions they make. In recent years, however, more and more research has attempted to offer reasonable explanations of learning with DNNs. One of these directions is based on information theory, which has been applied in digital communication and data compression for many years; researchers have attempted to connect DNNs with information theory in order to analyze, or further optimize, DNN performance. Among these viewpoints, the “information bottleneck” proposed by Tishby et al. is widely used across different applications. The aim of this thesis is to investigate the different perspectives on the information bottleneck and to introduce applications and analytical methods for DNN behavior. In conclusion, we discuss whether the information variation observed while training DNNs carries genuine physical meaning. [Editor's note: the IB objective and the MINE estimator mentioned here are sketched after this metadata table.] | en |
dc.description.provenance | Made available in DSpace on 2021-06-16T05:36:44Z (GMT). No. of bitstreams: 1 U0001-2407202021543000.pdf: 6293066 bytes, checksum: 031b946a7f7b3c2ef8534c3f18d341e5 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Chinese Abstract 1; Abstract 2; Contents 3; List of Tables 5; List of Figures 6; 1. Background 9; 1.1 Review of Information Theory 9; 1.2 Rate-Distortion Theory 12; 1.3 Information Bottleneck 14; 1.4 Relevance between IB and DNNs 15; 2. Contradictory Accounts of the Information Bottleneck 23; 2.1 Compression and Double-Saturating Nonlinearity 23; 2.2 Gradient Descent Schemes: Batch and Stochastic 25; 2.3 Fitting and Compression Phases 25; 2.4 Debate on IB Theory 26; 3. Problems Caused by Misuse of Information Theory 32; 3.1 Estimation with Noisy DNNs 32; 3.2 Noise in Bin-based Estimators 33; 4. Mutual Information Estimators 35; 4.1 Bin-based MI Estimator 35; 4.2 Mutual Information Neural Estimator (MINE) 36; 5. Applications of IB 40; 5.1 Performance Analysis 40; 5.1.1 Model Selection 40; 5.1.2 Optimizer Analysis 42; 5.1.3 Effect of Unbalanced Data 43; 5.1.4 Effect of Different Sample Dimensions 46; 5.2 Model Improvement 47; 5.2.1 Training with IB 47; 5.2.2 Model Compression 50; 5.2.3 Adversarial Attacks 51; 5.2.4 Variational Versions of IB 53; 6. Experiments 59; 6.1 Mutual Information Neural Estimator 59; 6.2 Different Activation Functions 63; 6.2.1 Hyperbolic Tangent (tanh) 63; 6.2.2 Rectified Linear Unit (ReLU) 65; 6.2.3 Sigmoid 65; 6.3 Stochasticity of the Model Optimizer 66; 6.3.1 tanh with SGD 67; 6.3.2 ReLU with SGD 67; 6.4 Representation Distribution 68; 6.4.1 Representation Distribution of tanh Networks 69; 6.4.2 Representation Distribution of ReLU Networks 70; 6.4.3 Representation Distribution of Sigmoid Networks 72; 6.5 Weight Distribution 74; 7. Conclusion 80; References 83 | |
dc.language.iso | zh-TW | |
dc.title | Information Bottleneck應用於深度學習網路之行為特徵分析 | zh_TW |
dc.title | Information Bottleneck in DNN Behavior Analysis and its Applications | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-2 | |
dc.description.degree | Master's | |
dc.contributor.advisor-orcid | 吳家麟(0000-0002-3631-1551) | |
dc.contributor.oralexamcommittee | 曾維新(Wei-Xin Zeng),陳祝嵩(Chu-Song Chen),黃俊翔(Chun-Hsiang Huang),陳駿丞(Jun-Cheng Chen) | |
dc.subject.keyword | Machine Learning, Neural Networks, Information Theory, Information Bottleneck, Model Generalization, Model Explainability | zh_TW |
dc.subject.keyword | Machine Learning, Neural Networks, Information Theory, Information Bottleneck, Model Generalization, Model Explainability | en |
dc.relation.page | 85 | |
dc.identifier.doi | 10.6342/NTU202001845 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2020-07-28 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
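
For readers who want the formal statement behind the abstracts above, the following is a minimal sketch of the information bottleneck objective from Tishby, Pereira, and Bialek (reference 4 in the citation field); it is an editorial aid, not an excerpt from the thesis.

```latex
% Information bottleneck: compress the input X into a representation T
% while keeping T informative about the label Y; the trade-off between
% compression I(X;T) and relevance I(T;Y) is set by a multiplier \beta > 0.
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```

In the deep-learning reading of this objective (references 7 and 10), each hidden layer plays the role of T, and training is interpreted as a trajectory in the "information plane" spanned by I(X;T) and I(T;Y).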
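Chapter 4.2 and Section 6.1 of the table of contents concern the Mutual Information Neural Estimator (MINE, reference 15), which estimates the mutual information terms above when they cannot be computed in closed form. Below is a minimal PyTorch sketch of the Donsker-Varadhan lower bound that MINE maximizes; the network width, learning rate, and toy data are illustrative assumptions, not the configuration used in the thesis.

```python
# A minimal sketch of MINE's Donsker-Varadhan bound (Belghazi et al., ref. 15):
#   I(X;Z) >= E_{P(X,Z)}[T_theta(x,z)] - log E_{P(X)P(Z)}[exp(T_theta(x,z))]
# Network width, learning rate, and the toy data below are illustrative only.
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T_theta(x, z): scores joint pairs against product-of-marginal pairs."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, z], dim=1)).squeeze(-1)

def dv_lower_bound(T: nn.Module, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan estimate of I(X;Z) from one minibatch."""
    joint = T(x, z).mean()                        # expectation over joint samples
    z_perm = z[torch.randperm(z.size(0))]         # shuffling approximates P(X)P(Z)
    marginal = torch.logsumexp(T(x, z_perm), dim=0) - math.log(z.size(0))
    return joint - marginal

# Toy usage: Z is a noisy copy of part of X, so I(X;Z) > 0.
x = torch.randn(512, 8)
z = x[:, :4] + 0.2 * torch.randn(512, 4)
T = StatisticsNetwork(x_dim=8, z_dim=4)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for step in range(200):                           # ascend the lower bound in theta
    loss = -dv_lower_bound(T, x, z)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"DV lower bound on I(X;Z): {dv_lower_bound(T, x, z).item():.3f} nats")
```

One caveat: the minibatch estimate of the log-partition term makes the gradient biased, so practical MINE implementations usually add a bias correction (for example, an exponential moving average of the denominator), which this sketch omits for brevity.
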
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-2407202021543000.pdf (currently not authorized for public access) | 6.15 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.