Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69054
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
dc.contributor.author | Chih-Yang Hsia | en |
dc.contributor.author | 夏誌陽 | zh_TW |
dc.date.accessioned | 2021-06-17T02:49:23Z | - |
dc.date.available | 2018-08-24 | |
dc.date.copyright | 2017-08-24 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-15 | |
dc.identifier.citation | R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in optimization methods for machine learning. SIAM Journal on Optimization, 21(3):977–995, 2011.
W.-S. Chin, B.-W. Yuan, M.-Y. Yang, and C.-J. Lin. A note on diagonal preconditioner in large-scale logistic regression. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2016. URL https://www.csie.ntu.edu.tw/~cjlin/papers/logistic/pcgnote.pdf.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, third edition, 1996.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.
C.-Y. Hsia, Y. Zhu, and C.-J. Lin. A study on trust region update rules in Newton methods for large-scale linear classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2017. URL http://www.csie.ntu.edu.tw/~cjlin/papers/newtron/newtron.pdf.
S. S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.
C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for large-scale logistic regression. Journal of Machine Learning Research, 9:627–650, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/logistic.pdf.
C. Ma and M. Takáč. Distributed inexact damped Newton method: data partitioning and load-balancing. arXiv preprint arXiv:1603.05191, 2016.
O. L. Mangasarian. A finite Newton method for classification. Optimization Methods and Software, 17(5):913–929, 2002.
M. Roma. Dynamic scaling based preconditioning for truncated Newton methods in large scale unconstrained optimization. Optimization Methods and Software, 20(6):693–713, 2005.
T. Steihaug. The conjugate gradient method and trust regions in large scale optimization. SIAM Journal on Numerical Analysis, 20:626–637, 1983.
M. A. Woodbury. Inverting modified matrices. Memorandum Report, 42(106):336, 1950.
Y. Zhang and X. Lin. DiSCO: distributed optimization for self-concordant empirical loss. In ICML, pages 362–370, 2015. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69054 | - |
dc.description.abstract | 截斷牛頓法在大規模線性分類問題中是最有效的最優化方法之一。其最主要的計算是在每個牛頓迭代中,利用像共軛梯度的迭代法去近似解出一個二次子問題。已知在子問題是非良置條件時,共軛梯度法的收斂速度很慢。預處理共軛梯度法是個常用來改善共軛梯度法收斂性的技術,但要找到能在所有情況都表現良好的預處理器是困難的,而且Hessian-free 的最優化方法通常用來處理大型的資料,許多現有的預處理器無法直接使用。本篇論文中,我們深入研究一些曾經被使用在線性分類問題上的預處理器,並指出他們在某些情況並沒有辦法增進訓練速度。經過深入研究,藉由避免預處理器造成比原本更壞的條件數,我們提出一種簡單且有效的辦法使得預處理共軛梯度法在截斷牛頓架構中更能適應各種情況,並且給出理論上的驗證。最後透過仔細設計的實驗,顯示我們的方法能有效地減少模型在大型問題上面的訓練時間。 | zh_TW |
dc.description.abstract | The truncated Newton method is one of the most effective optimization methods for large-scale linear classification. The main computational task at each Newton iteration is to approximately solve a quadratic sub-problem by an iterative method such as the conjugate gradient (CG) method. It is known that CG converges slowly if the sub-problem is ill-conditioned. Preconditioned CG (PCG) methods have been used to improve the convergence of the CG method, but it is difficult to find a preconditioner that performs well in most situations. Further, because Hessian-free optimization techniques are incorporated for handling large data, many existing preconditioners are not directly applicable. In this work, we study in detail some preconditioners that have been considered in past works for linear classification. We show that these preconditioners may not help to improve the training speed in some cases. After some investigation, we propose a simple and effective technique to make the PCG method more robust in a truncated Newton framework. The idea is to avoid the situation where a preconditioner leads to a much worse condition number than when it is not applied. We provide theoretical justification. Through carefully designed experiments, we demonstrate that our method can effectively reduce the training time for large-scale problems. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T02:49:23Z (GMT). No. of bitstreams: 1 ntu-106-R04922021-1.pdf: 6805480 bytes, checksum: 244e97769e35bb33ae99e5e6fff8e754 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | 口試委員會審定書 (Thesis Certification)
中文摘要 (Chinese Abstract)
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER
I. Introduction
II. Conjugate Gradient in Newton Methods for Linear Classification
2.1 Truncated Newton Methods for Linear Classification
2.2 Trust-region Newton Method
III. Preconditioned CG Method for Trust-region Sub-problem
IV. Existing Preconditioners and Our Proposed Method
4.1 Diagonal Preconditioner
4.2 Subsampled Hessian as Preconditioner
V. Our Proposed Method
5.1 Run CG and PCG in Parallel
5.2 Weighted Average of the Preconditioner and the Identity Matrix
VI. Experiments
6.1 Experimental Settings and Data Sets
6.2 Diagonal Preconditioner and the Proposed Extension in Sections 5.1 and 5.2
6.3 Using Subsampled Hessian as the Preconditioner
6.4 Comparison of Different Preconditioners
6.5 Improving Numerical Stability by Preconditioning
VII. Conclusion and Future Work
APPENDICES
BIBLIOGRAPHY | |
dc.language.iso | en | |
dc.title | 牛頓法於大規模線性分類問題的預處理器研究 | zh_TW |
dc.title | A Study on Preconditioners in Newton Method for Large-scale Linear Classification | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | 碩士 (Master's) | |
dc.contributor.oralexamcommittee | 林軒田(Hsuan-Tien Lin),李育杰(Yuh-Jye Lee) | |
dc.subject.keyword | 大規模線性分類,預處理共軛梯度,牛頓法, | zh_TW |
dc.subject.keyword | large-scale linear classification,preconditioned conjugate gradient,Newton method, | en |
dc.relation.page | 56 | |
dc.identifier.doi | 10.6342/NTU201701767 | |
dc.rights.note | 有償授權 (authorized with compensation) | |
dc.date.accepted | 2017-08-15 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
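The abstract and Section 5.2 of the table of contents describe keeping PCG robust by taking a weighted average of the preconditioner and the identity matrix, so that a bad preconditioner cannot make the condition number much worse than plain CG. A minimal sketch of that idea for a generic symmetric positive-definite system follows; the diagonal preconditioner, the small example system, and the weight `alpha` are illustrative assumptions, not the thesis's actual formulation or parameter choice:

```python
# Sketch: PCG on A x = b with a blended diagonal preconditioner
# M = alpha * diag(A) + (1 - alpha) * I.  With alpha = 0 this reduces
# to plain CG; with alpha = 1 it is pure diagonal preconditioning.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, alpha=0.5, tol=1e-10, max_iter=100):
    n = len(b)
    # Weighted average of the (diagonal) preconditioner and the identity.
    M = [alpha * A[i][i] + (1.0 - alpha) for i in range(n)]
    x = [0.0] * n
    r = b[:]                                  # residual b - A x, with x = 0
    z = [ri / mi for ri, mi in zip(r, M)]     # apply M^{-1}
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        a = rz / dot(p, Ap)
        x = [xi + a * pi for xi, pi in zip(x, p)]
        r = [ri - a * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / mi for ri, mi in zip(r, M)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Small SPD example (illustrative only): exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)
```

In the thesis's setting the matrix-vector products would come from a Hessian-free operator rather than an explicit matrix, but the blending of the preconditioner with the identity works the same way.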
Appears in Collections: | 資訊工程學系
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf (currently not authorized for public access) | 6.65 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.