Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7542
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
dc.contributor.author | Kent Loong Tan | en |
dc.contributor.author | 陳勁龍 | zh_TW |
dc.date.accessioned | 2021-05-19T17:46:01Z | - |
dc.date.available | 2021-08-06 | |
dc.date.available | 2021-05-19T17:46:01Z | - |
dc.date.copyright | 2018-08-06 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-07-30 | |
dc.identifier.citation | A. Botev, H. Ritter, and D. Barber. Practical Gauss-Newton optimisation for deep learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 557-565, 2017.
R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in optimization methods for machine learning. SIAM Journal on Optimization, 21(3):977-995, 2011.
F. Chollet et al. Keras. https://keras.io, 2015.
J. J. Dongarra, J. Du Croz, S. Hammarling, and I. S. Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, 1990.
K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
X. He, D. Mudigere, M. Smelyanskiy, and M. Takáč. Large scale distributed Hessian-free optimization for deep neural network, 2016. arXiv preprint arXiv:1606.00511.
R. Kiros. Training neural networks with stochastic Hessian-free optimization, 2013. arXiv preprint arXiv:1301.3641.
A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097-1105. 2012.
Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng. On optimization methods for deep learning. In Proceedings of the 28th International Conference on Machine Learning (ICML), pages 265-272, 2011.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998. MNIST database available at http://yann.lecun.com/exdb/mnist/.
Y. LeCun, F. J. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 97-104, 2004.
K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164-168, 1944.
C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for large-scale logistic regression. In Proceedings of the 24th International Conference on Machine Learning (ICML), 2007. Software available at http://www.csie.ntu.edu.tw/~cjlin/liblinear.
D. W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431-441, 1963.
J. Martens. Deep learning via Hessian-free optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723-1738, 2002.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014. arXiv preprint arXiv:1409.1556.
A. Vedaldi and K. Lenc. MatConvNet: Convolutional neural networks for MATLAB. In Proceedings of the 23rd ACM International Conference on Multimedia, pages 689-692, 2015.
O. Vinyals and D. Povey. Krylov subspace descent for deep learning. In Proceedings of Artificial Intelligence and Statistics (AISTATS), pages 1261-1268, 2012.
C.-C. Wang, C.-H. Huang, and C.-J. Lin. Subsampled Hessian Newton methods for supervised learning. Neural Computation, 27:1766-1795, 2015. URL http://www.csie.ntu.edu.tw/~cjlin/papers/sub_hessian/sample_hessian.pdf.
C.-C. Wang, K. L. Tan, C.-T. Chen, Y.-H. Lin, S. S. Keerthi, D. Mahajan, S. Sundararajan, and C.-J. Lin. Distributed Newton methods for deep learning. Neural Computation, 30:1673-1724, 2018a. URL http://www.csie.ntu.edu.tw/~cjlin/papers/dnn/dsh.pdf.
C.-C. Wang, K. L. Tan, and C.-J. Lin. Newton methods for convolutional neural networks. Technical report, National Taiwan University, 2018b.
M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 818-833, 2014. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7542 | - |
dc.description.abstract | Deep learning involves difficult non-convex optimization problems. Most studies optimize such models with stochastic gradient (SG) methods. SG is usually effective but sometimes not very robust. Recent works have investigated Newton methods as an alternative optimization technique, but the great majority apply them only to fully-connected neural networks. They do not study more widely used deep learning models such as convolutional neural networks (CNN). One reason is that applying Newton methods to CNN involves several complicated operations, so no careful investigation has been conducted. In this thesis, we give detailed building blocks, including the evaluation of the function, gradient, and Jacobian, as well as Gauss-Newton matrix-vector products. These basic components are essential: without them, no improvement of Newton methods for fully-connected networks can even be tried on CNN. Our work may therefore enable further developments of Newton methods for CNN. We finish a simple MATLAB implementation, and experimental results show that the method is competitive. | zh_TW |
dc.description.abstract | Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it is sometimes not very robust. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks, such as Convolutional Neural Networks (CNN), which are more commonly used in deep-learning applications. One reason is that Newton methods for CNN involve complicated operations, and so far no work has conducted a thorough investigation. In this thesis, we give details of building blocks including function, gradient, and Jacobian evaluation, as well as Gauss-Newton matrix-vector products. These basic components are very important because without them, none of the recent improvements of Newton methods for fully-connected networks can even be tried. Our work thus enables possible further developments of Newton methods for CNN. We provide a simple MATLAB implementation and show that it gives competitive test accuracy. | en |
dc.description.provenance | Made available in DSpace on 2021-05-19T17:46:01Z (GMT). No. of bitstreams: 1 ntu-107-R04944005-1.pdf: 1385441 bytes, checksum: 32fcfe67c01c52ce8f8b06c1d3da68fb (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Oral Examination Committee Certification ... i
Chinese Abstract ... ii
ABSTRACT ... iii
LIST OF FIGURES ... vi
LIST OF TABLES ... vii
CHAPTER
I. Introduction ... 1
II. Optimization Problem of Feedforward CNN ... 3
2.1 Convolutional Layer ... 3
2.1.1 Zero Padding ... 9
2.1.2 Pooling Layer ... 9
2.1.3 Summary of a Convolutional Layer ... 11
2.2 Fully-Connected Layer ... 12
2.3 The Overall Optimization Problem ... 13
III. Hessian-free Newton Methods for Training CNN ... 14
3.1 Procedure of the Newton Method ... 14
3.2 Gradient Evaluation ... 17
3.2.1 Padding, Pooling, and the Overall Procedure ... 21
3.3 Jacobian Evaluation ... 22
3.4 Gauss-Newton Matrix-Vector Products ... 24
IV. Implementation Details ... 27
4.1 Generation of φ(Z^{m-1,i}) ... 27
4.2 Construction of P^{m-1,i}_{pool} ... 32
4.3 Details of Padding Operation ... 34
4.4 Evaluation of v^T P^{m-1}_φ ... 36
4.5 Gauss-Newton Matrix-Vector Products ... 39
4.6 Mini-Batch Function and Gradient Evaluation ... 41
V. Analysis of Newton Methods for CNN ... 49
5.1 Memory Requirement ... 49
5.2 Computational Cost ... 51
VI. Experiments ... 54
6.1 Comparison Between Newton and Stochastic Gradient Methods ... 56
VII. Conclusions ... 59
APPENDICES ... 60
BIBLIOGRAPHY ... 66 | |
dc.language.iso | en | |
dc.title | Applications of Newton Methods to Convolutional Neural Networks | zh_TW |
dc.title | Newton Methods For Convolutional Neural Networks | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 林軒田(Hsuan-Tien Lin),李育杰(Yuh-Jye Lee) | |
dc.subject.keyword | convolutional neural networks, multi-class classification, large-scale learning, subsampled Hessian | zh_TW |
dc.subject.keyword | Convolutional neural networks, multi-class classification, large-scale classification, subsampled Hessian | en |
dc.relation.page | 69 | |
dc.identifier.doi | 10.6342/NTU201802108 | |
dc.rights.note | Authorized (worldwide open access) | |
dc.date.accepted | 2018-07-30 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
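
The abstracts above name Gauss-Newton matrix-vector products as a core building block of Hessian-free Newton training: the product Gv = (1/N) Σᵢ Jᵢᵀ Bᵢ Jᵢ v is computed without ever forming G. The following NumPy sketch illustrates only this general idea on a toy linear model z = Wx with squared loss (so each Bᵢ is the identity); it is not the thesis's MATLAB implementation, and the function name `gauss_newton_vec` and all dimensions are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a matrix-free Gauss-Newton product G v, assuming a
# toy linear model z_i = W x_i with squared loss (so each B_i = I).
# Illustrative only; not the thesis's MATLAB code.

rng = np.random.default_rng(0)
n_out, n_in, N = 3, 5, 10
W = rng.standard_normal((n_out, n_in))   # model parameters
X = rng.standard_normal((n_in, N))       # N data points as columns

def gauss_newton_vec(v):
    """Compute G v = (1/N) sum_i J_i^T B_i J_i v without forming G.

    For z_i = W x_i with row-major vec(W), J_i v = V x_i where V is v
    reshaped to W's shape, and J_i^T u = outer(u, x_i) flattened.
    """
    V = v.reshape(W.shape)
    Gv = np.zeros_like(W)
    for i in range(N):
        x = X[:, i]
        Jv = V @ x              # "forward" pass: J_i v
        BJv = Jv                # B_i = identity for squared loss
        Gv += np.outer(BJv, x)  # "backward" pass: J_i^T (B_i J_i v)
    return (Gv / N).ravel()

# Sanity check against the explicitly formed Gauss-Newton matrix,
# using J_i = I (kron) x_i^T for row-major vec(W).
J_list = [np.kron(np.eye(n_out), X[:, i][None, :]) for i in range(N)]
G = sum(J.T @ J for J in J_list) / N
v = rng.standard_normal(W.size)
assert np.allclose(G @ v, gauss_newton_vec(v))
```

For a CNN the same Gv structure applies, but Jᵢv and Jᵢᵀu are realized by forward and backward passes through the convolutional, pooling, and fully-connected layers, which is exactly what the thesis's building blocks (function, gradient, and Jacobian evaluation) provide.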
Appears in Collections: | Graduate Institute of Networking and Multimedia |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf | 1.35 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.