Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48991

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德(Shou-De Lin) | |
| dc.contributor.author | Yu-Chen Lu | en |
| dc.contributor.author | 盧昱辰 | zh_TW |
| dc.date.accessioned | 2021-06-15T11:13:09Z | - |
| dc.date.available | 2019-11-02 | |
| dc.date.copyright | 2016-11-02 | |
| dc.date.issued | 2016 | |
| dc.date.submitted | 2016-08-21 | |
| dc.identifier.citation | [1] G. Andrew and J. Gao. Scalable training of l1-regularized log-linear models. In Proceedings of the 24th international conference on Machine learning, pages 33–40. ACM, 2007.
[2] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Structured sparsity through convex optimization. Statistical Science, pages 450–468, 2012.
[3] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–202, 2009.
[4] Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and testing low-degree polynomial data mappings via linear svm. The Journal of Machine Learning Research, 11:1471–1490, 2010.
[5] X. Chen, W. Pan, J. T. Kwok, and J. G. Carbonell. Accelerated gradient method for multi-task sparse learning problem. In 2009 Ninth IEEE International Conference on Data Mining, pages 746–751. IEEE, 2009.
[6] I. S. Dhillon, P. K. Ravikumar, and A. Tewari. Nearest neighbor based greedy coordinate descent. In Advances in Neural Information Processing Systems, pages 2160–2168, 2011.
[7] C.-J. Hsieh, I. S. Dhillon, P. K. Ravikumar, S. Becker, and P. A. Olsen. Quic & dirty: A quadratic approximation approach for dirty statistical models. In Advances in Neural Information Processing Systems, pages 2006–2014, 2014.
[8] C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, P. K. Ravikumar, and R. Poldrack. Big & quic: Sparse inverse covariance estimation for a million variables. In Advances in Neural Information Processing Systems, pages 3165–3173, 2013.
[9] L. Jacob, G. Obozinski, and J.-P. Vert. Group lasso with overlap and graph lasso. In Proceedings of the 26th annual international conference on machine learning, pages 433–440. ACM, 2009.
[10] J. D. Lee, Y. Sun, and M. A. Saunders. Proximal newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.
[11] C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region newton method for logistic regression. The Journal of Machine Learning Research, 9:627–650, 2008.
[12] Z. Lu, A. May, K. Liu, A. B. Garakani, D. Guo, A. Bellet, L. Fan, M. Collins, B. Kingsbury, M. Picheny, et al. How to scale up kernel methods to be as good as deep neural nets. arXiv preprint arXiv:1411.4000, 2014.
[13] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[14] G. Obozinski, B. Taskar, and M. Jordan. Multi-task feature selection. Statistics Department, UC Berkeley, Tech. Rep, 2, 2006.
[15] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems, pages 1177–1184, 2007.
[16] K. Scheinberg and X. Tang. Practical inexact proximal quasi-newton method with global complexity analysis. arXiv preprint arXiv:1311.6547, 2013.
[17] N. Sokolovska, T. Lavergne, O. Cappé, and F. Yvon. Efficient learning of sparse conditional random fields for supervised sequence labeling. IEEE Journal of Selected Topics in Signal Processing, 4(6):953–964, 2010.
[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[19] P. Tseng and S. Yun. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 117(1-2):387–423, 2009.
[20] D. Tuia, R. Flamary, and N. Courty. Multiclass feature learning for hyperspectral image classification: Sparse and hierarchical solutions. ISPRS Journal of Photogrammetry and Remote Sensing, 105:272–285, 2015.
[21] M. Wytock and J. Z. Kolter. Sparse gaussian conditional random fields: Algorithms, theory, and application to energy forecasting. In ICML (3), pages 1265–1273, 2013.
[22] T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, and Z.-H. Zhou. Nyström method vs random fourier features: A theoretical and empirical comparison. In Advances in neural information processing systems, pages 476–484, 2012.
[23] I. E.-H. Yen, T.-W. Lin, S.-D. Lin, P. K. Ravikumar, and I. S. Dhillon. Sparse random feature algorithm as coordinate descent in hilbert space. In Advances in Neural Information Processing Systems, pages 2456–2464, 2014.
[24] G.-X. Yuan, K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. A comparison of optimization methods and software for large-scale l1-regularized linear classification. The Journal of Machine Learning Research, 11:3183–3234, 2010.
[25] G.-X. Yuan, C.-H. Ho, and C.-J. Lin. An improved glmnet for l1-regularized logistic regression. The Journal of Machine Learning Research, 13(1):1999–2030, 2012.
[26] K. Zhong, I. E.-H. Yen, I. S. Dhillon, and P. K. Ravikumar. Proximal quasi-newton for computationally intensive l1-regularized m-estimators. In Advances in Neural Information Processing Systems, pages 2375–2383, 2014.
[27] J. Zhu, N. Lao, and E. P. Xing. Grafting-light: fast, incremental feature selection and structure learning of markov random fields. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 303–312. ACM, 2010. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48991 | - |
| dc.description.abstract | 一次正則化(L1 regularization)已經是標準的特徵選取(feature selection)方法之一。當我們對於資料特徵有額外的結構資訊,我們可以將特徵選取延伸至挑選一些預先定義的特徵集合,而這可以由分組一次正規化(group-L1 regularization)來達成。面對高維度的機器學習問題,shrinking 以及貪婪方法常被用來維護一個工作集以縮小問題。然而shrinking 採取太過保守的策略,而貪婪方法需要過多的花費。在這篇論文中,我們提出了隨機活動集方法(Stochastic Active-Set method),透過無偏的近似貪婪方法維護一個活動集,並有夠高的機率達到收斂。藉由一個雙重採樣的策略,我們平衡了搜尋特徵空間的花費以及活動集的效果。除此之外,為了應對那些計算密集的損失函數,我們建構了一個擬牛頓法的新變體來配合我們的活動集方法,因此不需要計算所有的損失函數梯度。在我們的實驗中,對於計算密集的損失函數的一次正規化以及分組一次正規化問題,我們所提出的方法相較於目前最先進的方法,展現出了顯著的加速。 | zh_TW |
| dc.description.abstract | L1 regularization has become one of the standard techniques for feature selection. When structural prior knowledge about the features is available, the feature-selection problem can further be extended to selecting predefined groups of variables, which can be achieved by solving an Lp,1-regularized problem. To deal with high-dimensional learning tasks, shrinking heuristics and greedy methods are often used to maintain a relatively small working set. In practice, however, shrinking can be too conservative, while the greedy search for the most promising variable can be too expensive. In this work we propose a Stochastic Active-Set method that maintains an active set through an approximate yet unbiased greedy search; finding active variables with sufficiently high probability suffices to achieve fast convergence. A double-sampling strategy balances exploiting the current active set against exploring the high-dimensional feature space. In addition, to handle loss functions that are expensive to evaluate, we construct a new variant of Quasi-Newton approximation for our active-set method that does not require computing the whole gradient vector. In our experiments, the proposed algorithm yields significant speedups over state-of-the-art solvers on L1-, L2,1-, and L∞,1-regularized problems with computationally expensive loss functions. (Illustrative sketches of the regularized objective and of the double-sampling idea are given after the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T11:13:09Z (GMT). No. of bitstreams: 1 ntu-105-R03944026-1.pdf: 1881662 bytes, checksum: 1d8755d5e2b51204af40f20d9c38fbd8 (MD5) Previous issue date: 2016 | en |
| dc.description.tableofcontents | 口試委員會審定書
摘要
Abstract
1 Introduction
2 L1 and Lp,1 regularized Empirical Risk Minimization
2.1 Definition
2.2 Block Coordinate Descent Method
3 Stochastic Active-Set Method
3.1 Stochastic Sampling
3.2 Search via Residual Sampling
3.3 Convergence Analysis
4 Stochastic Active-Set Proximal Quasi-Newton Method
4.1 L-BFGS for Active-Set Methods
4.2 Convergence Analysis
5 Experiment
A Proofs of Theorems
A.1 Proof of Theorem 1
A.2 Lemma 1
A.3 Proof for Convergence of Exact Active Method
A.4 Proof for Theorem 2
A.5 Proof for Theorem 3
Bibliography | |
| dc.language.iso | en | |
| dc.subject | 大規模學習 | zh_TW |
| dc.subject | 特徵選擇 | zh_TW |
| dc.subject | feature selection | en |
| dc.subject | large-scale learning | en |
| dc.title | 針對一次正則化及分組一次正則化問題的隨機活動集近端擬牛頓法 | zh_TW |
| dc.title | Stochastic Active-Set Proximal Quasi-Newton Method for L1 and Group-L1 Regularized Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 104-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 李育杰(Yuh-Jye Lee),林軒田(Hsuan-Tien Lin),吳尚鴻(Shan-Hung Wu) | |
| dc.subject.keyword | 特徵選擇,大規模學習, | zh_TW |
| dc.subject.keyword | feature selection,large-scale learning, | en |
| dc.relation.page | 35 | |
| dc.identifier.doi | 10.6342/NTU201603353 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2016-08-22 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
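The abstract and table of contents refer to L1- and Lp,1-regularized empirical risk minimization (Chapter 2, "Definition"). As a reference point only, a standard form of such an objective is sketched below; the symbols w, λ, ℓ, and 𝒢 (the set of predefined groups) are notational assumptions made here and need not match the thesis.

```latex
% Standard L1- and L_{p,1}-regularized empirical risk minimization.
% The notation (w, \lambda, \ell, \mathcal{G}) is illustrative and is not
% necessarily the notation used in the thesis itself.
\min_{w \in \mathbb{R}^d} \; F(w)
  \;=\; \sum_{i=1}^{n} \ell\!\left(w;\, x_i, y_i\right) \;+\; \lambda\, R(w),
\qquad
R(w) \;=\;
\begin{cases}
  \lVert w \rVert_1 = \sum_{j=1}^{d} \lvert w_j \rvert
    & \text{(L1 regularization)} \\[4pt]
  \sum_{g \in \mathcal{G}} \lVert w_g \rVert_p, \quad p \in \{2, \infty\}
    & \text{(L}_{p,1}\text{, i.e.\ group-L1, regularization)}
\end{cases}
```

With p = 2 or p = ∞ this covers the L2,1 and L∞,1 cases mentioned in the abstract, and taking each group to be a single coordinate recovers plain L1 regularization.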
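The double-sampling idea described in the abstract (explore a random sample of coordinates, exploit the current active set, and never form the full gradient) can be illustrated on a toy problem. The Python sketch below applies that general pattern to L1-regularized least squares; the function name `stochastic_active_set_lasso`, the sampling size, and the violation measure are assumptions made for exposition and do not reproduce the thesis's algorithm or its Quasi-Newton component.

```python
# Toy sketch of a stochastic active-set strategy for L1-regularized least
# squares: min_w 0.5 * ||X w - y||^2 + lam * ||w||_1.
# This illustrates the explore/exploit (double-sampling) pattern described
# in the abstract, NOT the algorithm of the thesis.
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * |.| (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def stochastic_active_set_lasso(X, y, lam, n_iters=300, n_explore=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    residual = X @ w - y                            # maintained incrementally
    active = set()
    lip = np.maximum((X ** 2).sum(axis=0), 1e-12)   # coordinate-wise curvature

    for _ in range(n_iters):
        # Explore: sample a random subset of coordinates (unbiased search) and
        # evaluate partial gradients only for sampled + active coordinates.
        sample = rng.choice(d, size=min(n_explore, d), replace=False)
        cand = np.array(sorted(active.union(sample.tolist())))
        grad = X[:, cand].T @ residual
        # Optimality violation of a prox-gradient step on each candidate.
        step = soft_threshold(w[cand] - grad / lip[cand], lam / lip[cand])
        viol = np.abs(step - w[cand])
        # Exploit: admit the most violating candidate into the active set.
        active.add(int(cand[np.argmax(viol)]))

        # One coordinate-descent pass over the (small) active set.
        for j in list(active):
            g_j = X[:, j] @ residual
            w_new = soft_threshold(w[j] - g_j / lip[j], lam / lip[j])
            if w_new != w[j]:
                residual += (w_new - w[j]) * X[:, j]
                w[j] = w_new
            if w[j] == 0.0:
                active.discard(j)                   # shrink zeros back out
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 1000))
    w_true = np.zeros(1000)
    w_true[:10] = rng.standard_normal(10)
    y = X @ w_true + 0.01 * rng.standard_normal(200)
    w_hat = stochastic_active_set_lasso(X, y, lam=1.0)
    print("nonzero coefficients:", int((w_hat != 0).sum()))
```

Here the least-squares structure makes each active-set coordinate update cheap; for the computationally expensive losses targeted by the thesis, the analogous update would instead come from a proximal Quasi-Newton (L-BFGS) model restricted to the active set, as the abstract describes.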
Appears in Collections: 資訊網路與多媒體研究所
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-105-1.pdf (Restricted Access) | 1.84 MB | Adobe PDF |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
