Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44089
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
dc.contributor.author | Guo-Xun Yuan | en |
dc.contributor.author | 袁國訓 | zh_TW |
dc.date.accessioned | 2021-06-15T02:39:26Z | - |
dc.date.available | 2009-08-18 | |
dc.date.copyright | 2009-08-18 | |
dc.date.issued | 2009 | |
dc.date.submitted | 2009-08-12 | |
dc.identifier.citation | G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), 2007.
S. Benson and J. J. Moré. A limited memory variable metric method for bound constrained minimization. Technical report, Argonne National Laboratory, 2001.
D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA 02178-9998, second edition, 1999.
R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Statist. Comput., 16:1190–1208, 1995.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. Coordinate descent method for large-scale L2-loss linear SVM. Journal of Machine Learning Research, 9:1369–1398, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cdl2.pdf.
I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57:1413–1457, 2004.
J. Duchi and Y. Singer. Boosting with structural sparsity. In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.
J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the L1-ball for learning in high dimensions. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. 2008.
W. J. Fu. Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7(3):397–416, 1998.
A. Genkin, D. D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3):291–304, 2007.
J. Goodman. Exponential priors for maximum entropy models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2004.
E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation for l1-minimization: Methodology and convergence. SIAM Journal on Optimization, 19(3):1107–1130, 2008.
C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML), 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf.
T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press.
J. Kazama and J. Tsujii. Evaluation and extension of maximum entropy models with inequality constraints. In EMNLP, 2003.
J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132:1–63, 1997.
K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519–1555, 2007a. URL http://www.stanford.edu/~boyd/l1_logistic_reg.html.
K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale l1-regularized logistic regression. JMLR, 2007b. To appear.
P. Komarek and A. W. Moore. Making logistic regression a core data mining tool. Technical report, Carnegie Mellon University, 2005.
J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. Journal of Machine Learning Research, 10:771–801, 2009.
C.-Y. Lee. A comparison of optimization methods for large-scale l1-regularized logistic regression. Master's thesis, Department of Computer Science and Information Engineering, National Taiwan University, 2008.
S.-I. Lee, H. Lee, P. Abbeel, and A. Y. Ng. Efficient l1 regularized logistic regression. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1–9, Boston, MA, USA, July 2006.
C.-J. Lin and J. J. Moré. Newton's method for large-scale bound constrained problems. SIAM Journal on Optimization, 9:1100–1127, 1999.
C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for large-scale logistic regression. Journal of Machine Learning Research, 9:627–650, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/logistic.pdf.
S. Perkins, K. Lacker, and J. Theiler. Grafting: Fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, 3:1333–1356, 2003.
M. Schmidt, G. Fung, and R. Rosales. Fast optimization methods for l1 regularization: A comparative study and two new approaches. In Proceedings of ECML, pages 286–297, 2007.
S. Shalev-Shwartz and A. Tewari. Stochastic methods for l1 regularized loss minimization. In Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.
S. K. Shevade and S. S. Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246–2253, 2003.
J. Shi, W. Yin, S. Osher, and P. Sajda. A fast hybrid algorithm for large-scale l1-regularized logistic regression. 2008. Submitted to Journal of Machine Learning Research.
C. H. Teo, S. Vishwanathan, A. Smola, and Q. V. Le. Bundle methods for regularized risk minimization. 2009. Submitted to Journal of Machine Learning Research.
S. Yun and K.-C. Toh. A coordinate gradient descent method for l1-regularized convex minimization. 2009. To appear.
T. Zhang and F. J. Oles. Text categorization based on regularized linear classification methods. Information Retrieval, 4(1):5–31, 2001. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44089 | - |
dc.description.abstract | Large-scale linear classifiers are widely used in document classification and computational linguistics. L1-regularized linear classifiers can further be applied to feature selection, but the non-differentiability of the L1 term causes many difficulties in training. In recent years, various optimization methods for L1-regularized linear classification have been proposed, yet they have not been rigorously discussed or compared. This thesis carefully examines several representative methods and related implementation issues, and compares them thoroughly through experiments. The results show that coordinate descent methods are in general the most suitable for solving L1-regularized linear classification problems, while Newton-based methods converge fastest in the final stage of training. | zh_TW
dc.description.abstract | Large-scale linear classifiers are useful for document classification and computational linguistics. The L1-regularized form can be used for feature selection, but its non-differentiability makes training more difficult. Various optimization methods have been proposed in recent years, but no serious comparison among them has been made. In this thesis, we carefully address implementation issues of some representative methods and conduct a comprehensive comparison. Results show that coordinate descent type methods may be the most suitable in general situations, though Newton-type methods have the fastest final convergence (an illustrative sketch of the objective and a coordinate descent update follows this metadata table). | en
dc.description.provenance | Made available in DSpace on 2021-06-15T02:39:26Z (GMT). No. of bitstreams: 1 ntu-98-R96922042-1.pdf: 9284767 bytes, checksum: 97579a72693a39b601ee044a37b8f3af (MD5) Previous issue date: 2009 | en |
dc.description.tableofcontents | Oral Examination Committee Approval
Chinese Abstract
ABSTRACT
LIST OF TABLES
CHAPTER I. Introduction
CHAPTER II. A Survey of Existing Methods
  2.1 Methods by Solving Constrained Optimization Problems
    2.1.1 Problems with Smooth Constraints
    2.1.2 Problems with Nonsmooth Constraints
  2.2 Decomposition Methods
    2.2.1 Coordinate Descent Methods
    2.2.2 Gauss-Southwell Method
  2.3 Other Methods
CHAPTER III. Methods by Solving Constrained Optimization Problems
  3.1 A Trust Region Newton Method
    3.1.1 Cauchy Point
    3.1.2 Newton Direction
    3.1.3 Hessian-vector Product
  3.2 An Interior Point Method
CHAPTER IV. Decomposition Methods
  4.1 Coordinate Descent Methods
    4.1.1 BBR
    4.1.2 CD
  4.2 A Coordinate Gradient Descent Method
CHAPTER V. Other Methods for Solving L1-regularized Problem
  5.1 Generalized Linear Model with Elastic Net
  5.2 Bundle Methods for Regularized Risk Minimization
CHAPTER VI. Numerical Experiments
  6.1 Data Sets
  6.2 Comparing Solvers for L1-regularized Logistic Regression
    6.2.1 Experiment Setting
    6.2.2 Experiments on Micro-array Data Sets
    6.2.3 Experiments on Large Data Sets
  6.3 Comparing Solvers for L1-regularized L2-loss Linear SVM
CHAPTER VII. Discussions and Conclusions
APPENDICES
BIBLIOGRAPHY | |
dc.language.iso | en | |
dc.title | A Comparison of Optimization Methods for Large-scale L1-regularized Linear Classification | zh_TW
dc.title | A Comparison of Optimization Methods for Large-scale L1-regularized Linear Classification | en |
dc.type | Thesis | |
dc.date.schoolyear | 97-2 | |
dc.description.degree | Master | |
dc.contributor.oralexamcommittee | 林軒田 (Hsuan-Tien Lin), 李育杰 (Yuh-Jye Lee) | |
dc.subject.keyword | L1-regularization, linear classification, optimization methods | zh_TW |
dc.subject.keyword | L1-regularization, linear classification, optimization methods | en |
dc.relation.page | 76 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2009-08-12 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
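The abstract above refers to L1-regularized linear classification without stating the objective. A standard formulation of the L1-regularized logistic regression problem studied in such comparisons (reconstructed from the title and abstract; the thesis's exact notation may differ) is:

```latex
\min_{w \in \mathbb{R}^n} \; \|w\|_1 + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^T x_i}\right)
```

where $C > 0$ balances the non-smooth regularizer against the training loss and $y_i \in \{-1, +1\}$. The coordinate descent methods that the abstract singles out update one weight at a time. Below is a minimal sketch of such a solver, assuming a one-variable Newton step followed by soft-thresholding; the function names, the random-data demo, and the omission of a line search are simplifications for illustration, not the implementations evaluated in the thesis.

```python
import numpy as np

def soft_threshold(z, r):
    """Closed-form minimizer of 0.5 * (w - z)**2 + r * |w|."""
    return np.sign(z) * max(abs(z) - r, 0.0)

def cd_l1_logreg(X, y, C, n_iters=50):
    """Cyclic coordinate descent for min_w ||w||_1 + C * sum_i log(1 + exp(-y_i * x_i @ w)).

    Didactic sketch: one-variable Newton step per coordinate, projected by
    soft-thresholding, with no line search or stopping criterion.
    X is an (l, n) matrix; y holds labels in {-1, +1}.
    """
    l, n = X.shape
    w = np.zeros(n)
    Xw = np.zeros(l)  # cache of X @ w, updated incrementally per coordinate
    for _ in range(n_iters):
        for j in range(n):
            p = 1.0 / (1.0 + np.exp(-y * Xw))                    # sigma(y_i * x_i @ w)
            g = C * np.dot(-y * (1.0 - p), X[:, j])              # d(loss)/dw_j
            h = C * np.dot(p * (1.0 - p), X[:, j] ** 2) + 1e-12  # d2(loss)/dw_j^2
            # Minimize the quadratic model of the loss plus |w_j| in closed form.
            w_j = soft_threshold(w[j] - g / h, 1.0 / h)
            Xw += (w_j - w[j]) * X[:, j]
            w[j] = w_j
    return w

if __name__ == "__main__":
    # Hypothetical demo on sparse synthetic data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    true_w = np.zeros(50)
    true_w[:5] = 1.0
    y = np.sign(X @ true_w + 0.1 * rng.standard_normal(200))
    w = cd_l1_logreg(X, y, C=1.0)
    print("nonzero weights:", np.count_nonzero(w))
```

On data like this, the update drives most coordinates exactly to zero, which is the feature-selection behavior the abstract attributes to the L1 regularizer; Newton-type solvers trade the cheap per-coordinate step for faster convergence near the optimum, matching the abstract's conclusion.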
Appears in Collections: | Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format |
---|---|---|
ntu-98-1.pdf (currently not authorized for public access) | 9.07 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.