Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7266
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林宗男(Tsung-Nan Lin) | |
dc.contributor.author | Wen-Yu Chang | en |
dc.contributor.author | 張文于 | zh_TW |
dc.date.accessioned | 2021-05-19T17:40:46Z | - |
dc.date.available | 2024-08-05 | |
dc.date.available | 2021-05-19T17:40:46Z | - |
dc.date.copyright | 2019-08-05 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-07-31 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7266 | - |
dc.description.abstract | 梯度爆炸/消失,一直被認為是訓練深層神經網路的一大挑戰。在這篇論文裡,我們發現一種被稱為「神經元消失 (Vanishing Nodes)」的新現象同樣也會使訓練更加困難。當神經網路的深度增加,神經元彼此之間會呈現高度相關。這種行為會導致神經元之間的相似程度提高。也就是隨著神經網路變深,網路內的神經元冗餘程度會提高。我們把這個問題稱為「神經元消失 (Vanishing Nodes)」。可以藉由神經網路的相關參數來對神經元消失的程度做推算;結果可以得出神經元消失的程度與網路深度成正比、與網路寬度成反比。從數值分析的結果呈現出:在反向傳播算法的訓練下,神經元消失的現象會變得更明顯。我們也提出:神經元消失是除了梯度爆炸/消失以外,訓練深層神經網路的另一道難關。 | zh_TW |
dc.description.abstract | It is well known that vanishing/exploding gradients pose a challenge when training deep networks. In this thesis, we show that another phenomenon, called vanishing nodes, also increases the difficulty of training deep neural networks. As the depth of a neural network increases, its hidden nodes exhibit increasingly correlated behavior, which makes these nodes highly similar to one another; the redundancy of hidden nodes therefore grows as the network becomes deeper. We call this problem 'Vanishing Nodes.' The degree of vanishing nodes can be characterized quantitatively from the network parameters and is shown analytically to be proportional to the network depth and inversely proportional to the network width. Numerical results further suggest that vanishing nodes become more pronounced during back-propagation training. Finally, we show that vanishing/exploding gradients and vanishing nodes are two distinct challenges that increase the difficulty of training deep neural networks. (An illustrative numerical sketch of this node-correlation effect is given after the metadata record below.) | en |
dc.description.provenance | Made available in DSpace on 2021-05-19T17:40:46Z (GMT). No. of bitstreams: 1 ntu-108-R06942064-1.pdf: 17746185 bytes, checksum: 20cb8912b917d7d7e3810b4ac5063af0 (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | 摘要 iii
Abstract v
1 Introduction 1
2 Related Work 3
2.1 Difficulties in training deep neural networks 3
2.2 Representation power of deep neural network 3
3 Vanishing Nodes: correlation between hidden nodes 5
3.1 Vanishing Node Indicator 6
3.2 Impacts of back-propagation 14
3.3 Relationship between the VNI and the redundancy of nodes 15
3.4 The vanishing of the representation power 18
3.5 The effect of the orthogonal weight matrices to the representation power 21
3.6 Representation power of residual-like architectures 23
4 Variance propagation of deep neural networks 39
4.1 Comparison of exploding/vanishing gradients and vanishing nodes 39
4.2 Norm-preserving weight initialization 41
4.3 The two obstacles for training deep neural networks 42
5 Experiments 45
5.1 Probability of failed training caused by vanishing nodes 45
5.2 Analyses of failed training caused by vanishing nodes 50
6 Conclusion 53
Bibliography 55 | |
dc.language.iso | en | |
dc.title | 神經元消失:影響深層神經網路之表現能力,並使其難以訓練的新現象 | zh_TW |
dc.title | Vanishing Nodes: The Phenomenon That Affects The Representation Power and The Training Difficulty of Deep Neural Networks | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 林昌鴻,李宏毅,李琳山,吳沛遠 | |
dc.subject.keyword | 深度學習,梯度消失,機器學習理論,表現能力,神經網路架構,網路訓練問題,正交參數初始化,冗餘神經元,隨機矩陣 | zh_TW |
dc.subject.keyword | Deep learning, Vanishing gradient, Learning theory, Representation power, Network architecture, Training difficulty, Orthogonal initialization, Node redundancy, Random matrices | en |
dc.relation.page | 59 | |
dc.identifier.doi | 10.6342/NTU201901446 | |
dc.rights.note | 同意授權(全球公開) | |
dc.date.accepted | 2019-07-31 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電信工程學研究所 | zh_TW |
dc.date.embargo-lift | 2024-08-05 | - |
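
The abstract above describes vanishing nodes as hidden-node correlation (and hence redundancy) that grows with network depth and shrinks with network width. The following is a minimal NumPy sketch of that qualitative claim only; it is not code from the thesis, and the tanh activation, Xavier-style Gaussian initialization, batch size, and the mean absolute pairwise-correlation measure used here are illustrative assumptions rather than the thesis's Vanishing Node Indicator (VNI).

```python
# Minimal NumPy sketch (not the thesis code) of the vanishing-nodes effect:
# hidden nodes of a deep, randomly initialized network become increasingly
# correlated -- i.e. redundant -- as depth grows, and less so as width grows.
import numpy as np

rng = np.random.default_rng(0)


def mean_abs_node_correlation(h):
    """Average absolute off-diagonal correlation between hidden nodes.

    h has shape (num_inputs, width): each column holds one node's responses
    over a batch of inputs. Values near 1 mean the nodes are largely redundant.
    """
    c = np.corrcoef(h, rowvar=False)               # width x width correlation matrix
    off_diag = c[~np.eye(c.shape[0], dtype=bool)]  # keep only off-diagonal entries
    return float(np.abs(off_diag).mean())


def last_layer_correlation(depth, width, num_inputs=256):
    """Propagate a random input batch through `depth` tanh layers with
    Xavier-style Gaussian weights and report the node correlation at the end."""
    h = rng.standard_normal((num_inputs, width))
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = np.tanh(h @ w)
    return mean_abs_node_correlation(h)


if __name__ == "__main__":
    # Deeper -> more correlated (redundant) nodes; wider at the same depth -> less so.
    for depth, width in [(5, 100), (50, 100), (50, 400)]:
        corr = last_layer_correlation(depth, width)
        print(f"depth={depth:3d}  width={width:3d}  mean |corr| = {corr:.3f}")
```

Running the sketch should show the mean absolute node correlation at the last layer increasing with depth at a fixed width and decreasing again when the width is enlarged, consistent with the depth-proportional, width-inverse scaling stated in the abstract.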
Appears in Collections: | 電信工程學研究所 |
Files in This Item:
File | Size | Format
---|---|---
ntu-108-1.pdf | 17.33 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.