Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56419
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林守德(Shou-de Lin) | |
dc.contributor.author | Ting-Wei Lin | en |
dc.contributor.author | 林廷韋 | zh_TW |
dc.date.accessioned | 2021-06-16T05:27:43Z | - |
dc.date.available | 2014-08-21 | |
dc.date.copyright | 2014-08-21 | |
dc.date.issued | 2014 | |
dc.date.submitted | 2014-08-14 | |
dc.identifier.citation | [1] P. Richtárik and M. Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. School of Mathematics, University of Edinburgh, Tech. Rep., 2011.
[2] Hsuan-Tien Lin and Ling Li. Support vector machinery for infinite ensemble learning. JMLR, 2008.
[3] Bernhard Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[4] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. NIPS, 2004.
[5] Ingo Steinwart and Andreas Christmann. Support Vector Machines. Springer, 2008.
[6] A. Rahimi and B. Recht. Random features for large-scale kernel machines. NIPS, 2007.
[7] A. Rahimi and B. Recht. Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning. NIPS, 2008.
[8] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. CVPR, 2010.
[9] P. Kar and H. Karnick. Random feature maps for dot product kernels. In Proceedings of AISTATS'12, pages 583–591, 2012.
[10] Q. Le, T. Sarlós, and A. J. Smola. Fastfood - approximating kernel expansions in loglinear time. ICML, 2013.
[11] T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, and Z.-H. Zhou. Nyström method vs. random Fourier features: A theoretical and empirical comparison. NIPS, 2012.
[12] G. S. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41:495–502, 1970.
[13] Saharon Rosset, Ji Zhu, and Trevor Hastie. Boosting as a regularized path to a maximum margin classifier. JMLR, 2004.
[14] Shai Fine and Katya Scheinberg. Efficient SVM training using low-rank kernel representations. JMLR, 2001.
[15] Petros Drineas and Michael W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. JMLR, 2005.
[16] Yu. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 2012.
[17] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, A 209:415–446, 1909.
[18] Saharon Rosset, Grzegorz Swirszcz, Nathan Srebro, and Ji Zhu. L1-regularization in infinite dimensional feature spaces. In Learning Theory: 20th Annual Conference on Learning Theory, 2007.
[19] Gunnar Rätsch, Sebastian Mika, and Manfred K. Warmuth. On the convergence of leveraging. NIPS, 2001.
[20] Matus Telgarsky. The fast convergence of boosting. NIPS, 2011.
[21] S.-T. Chen, H.-T. Lin, and C.-J. Lu. An online boosting algorithm with theoretical justifications. ICML, 2012.
[22] Ling Li and Hsuan-Tien Lin. Optimizing 0/1 loss for perceptrons by random coordinate descent. IJCNN, 2007.
[23] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.
[24] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011.
[25] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/56419 | - |
dc.description.abstract | In this thesis, we propose the Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space. The algorithm proceeds as follows: we minimize an L1-regularized objective over a Hilbert space by running several rounds of Randomized Coordinate Descent (RCD) training, and obtain a sparse non-linear predictor. By interpreting the algorithm as coordinate descent over infinitely many random features, we prove that the proposed method converges to a solution within ϵ-precision of the exact kernel solution after drawing only O(1/ϵ) random features; that is, O(1/ϵ) random features suffice to drive the expected error below ϵ, whereas the Monte-Carlo analysis used in the current random-feature literature requires O(1/ϵ^2) random features to reach the same accuracy. In our experiments, the Sparse Random Feature algorithm yields a sparse solution that uses less memory during training and less prediction time, while maintaining performance comparable to kernel machines on both regression and classification problems. Moreover, for L1-regularized problems in infinite dimensions, when the boosting approach cannot evaluate its greedy step exactly, our approximate solver based on a randomized approach obtains a better solution than boosting. | zh_TW |
dc.description.abstract | In this paper, we propose the Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space, which learns a sparse non-linear predictor by minimizing an ℓ1-regularized objective function over the Hilbert space induced by the kernel function. By interpreting the algorithm as Randomized Coordinate Descent in the infinite-dimensional space, we show that the proposed approach converges to a solution within ϵ-precision of the exact kernel method by drawing O(1/ϵ) random features, in contrast to the O(1/ϵ^2)-type convergence given by the Monte-Carlo analysis in the current Random Feature literature. In our experiments, the Sparse Random Feature algorithm obtains a sparse solution that requires less memory and prediction time while maintaining comparable performance on regression and classification tasks. Meanwhile, as an approximate solver for the infinite-dimensional ℓ1-regularized problem, the randomized approach converges to a better solution than the Boosting approach when the greedy step of Boosting cannot be performed exactly. (Toy sketches of the random feature map and the coordinate descent step appear after the metadata table below.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-16T05:27:43Z (GMT). No. of bitstreams: 1 ntu-103-R01944011-1.pdf: 694027 bytes, checksum: 4b7ff8d980e937a086fbcabba7235781 (MD5) Previous issue date: 2014 | en |
dc.description.tableofcontents | Acknowledgements i
Chinese Abstract ii
Abstract iii
Contents iv
List of Figures vii
List of Tables viii
1 Introduction 1
2 Related Work 3
2.1 Kernel Approximation 3
2.1.1 Low-Rank Approximation 3
2.1.2 Random Feature 4
2.2 Randomized Coordinate Descent 4
2.2.1 Why existing RCD analysis cannot be applied in our setting 5
3 Problem Setup 7
3.1 Preliminary Knowledge 7
3.1.1 Representer Theorem 7
3.1.2 Kernel Approximation 7
3.1.3 Reproducing Kernel Hilbert Space (RKHS) 8
3.2 Problem Definition 9
3.2.1 Kernel and Feature Map 10
3.2.2 Random Feature as Monte-Carlo Approximation 11
4 Theorem and Algorithm 13
4.1 Sparse Random Feature as Coordinate Descent 13
4.1.1 Convergence Analysis 14
4.1.2 Relation to Kernel Method 19
4.1.3 Relation to Boosting Method 20
5 Implementation 21
5.1 Implementation for Random Feature Generator 22
5.1.1 Random Fourier Feature 23
5.1.2 Random Perceptron Feature 23
5.2 Implementation for different solvers 24
5.2.1 Sparse Random Feature as Coordinate Descent 24
5.2.2 Random Feature with L2-Regularization setting 25
5.2.3 Kernel Machine 25
6 Experiments 26
6.1 Parameter Settings 26
6.2 Results 27
6.3 Discussion 27
6.3.1 Training Time 28
6.3.2 Testing Time 29
6.3.3 Memory Efficiency 29
6.3.4 Testing Performance 29
7 Conclusions 32
7.1 Contribution 32
7.2 Application 32
7.3 Future Work 33
A Proof of Lemma 1 34
B Proof of Corollary 2 35
Bibliography 36 | |
dc.language.iso | en | |
dc.title | Sparse Random Feature Approximation: Coordinate Descent in Hilbert Space | zh_TW |
dc.title | Sparse Random Feature Algorithm as Coordinate Descent in Hilbert Space | en |
dc.type | Thesis | |
dc.date.schoolyear | 102-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林軒田(Hsuan-Tien Lin),李育杰(Yuh-Jye Lee) | |
dc.subject.keyword | random feature, kernel approximation, support vector machine, randomized coordinate descent, regularization | zh_TW |
dc.subject.keyword | random feature, kernel approximation, support vector machine, randomized coordinate descent, regularization | en |
dc.relation.page | 38 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2014-08-14 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Networking and Multimedia | zh_TW |
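
The reference list above cites Rahimi and Recht's random Fourier features [6], which Section 5.1.1 of the thesis implements. As orientation for the abstract, here is a minimal sketch of that standard feature map for the Gaussian kernel; Python and the function name `random_fourier_features` are choices made here for brevity, not taken from the thesis (which cites Eigen [23], suggesting a C++ implementation).

```python
import numpy as np

def random_fourier_features(X, D=256, sigma=1.0, seed=0):
    """Monte-Carlo feature map for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): draw w_j ~ N(0, I / sigma^2)
    and b_j ~ Uniform[0, 2pi], then map z(x) = sqrt(2/D) * cos(W x + b),
    so that z(x) . z(y) is an unbiased estimate of k(x, y).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(D, X.shape[1]))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Sanity check: feature inner products approach the exact kernel as D grows.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, D=20000)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(np.abs(Z @ Z.T - np.exp(-sq / 2.0)).max())  # small; shrinks like 1/sqrt(D)
```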
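And a toy version of the coordinate descent loop the abstract describes: each step draws one random feature as a new coordinate of the ℓ1-regularized squared-loss objective and applies a soft-thresholding update, so coordinates that do not help are set to exactly zero. This is a sketch under simplifying assumptions (Gaussian kernel, squared loss, one feature per step, no revisiting of past coordinates); the names `sparse_random_features`, `soft_threshold`, and `predict` are illustrative, not from the thesis.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_random_features(X, y, sigma=1.0, lam=0.05, iters=500, seed=0):
    """l1-regularized fit with one random Fourier coordinate per RCD step:
        min_alpha (1/2n) ||Phi alpha - y||^2 + lam ||alpha||_1.
    Coordinates whose optimal weight is exactly zero are discarded,
    which is what keeps the learned predictor sparse.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    kept = []                    # surviving (w, b, alpha) triples
    r = -y.astype(float)         # residual Phi @ alpha - y, starting at alpha = 0
    for _ in range(iters):
        w = rng.normal(scale=1.0 / sigma, size=d)
        b = rng.uniform(0.0, 2.0 * np.pi)
        phi = np.cos(X @ w + b)            # new random coordinate on the data
        L = phi @ phi / n                  # curvature of the loss along phi
        g = phi @ r / n                    # loss gradient at alpha_new = 0
        a = soft_threshold(-g, lam) / L    # closed-form coordinate minimizer
        if a != 0.0:
            kept.append((w, b, a))
            r += a * phi
    return kept

def predict(kept, X):
    """f(x) = sum_t alpha_t * cos(w_t . x + b_t)."""
    return sum(a * np.cos(X @ w + b) for w, b, a in kept)
```

The drop-if-zero step is what yields the sparse solution, and hence the reduced memory and prediction time the abstract reports.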
Appears in Collections: | Graduate Institute of Networking and Multimedia
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-103-1.pdf (currently not authorized for public access) | 677.76 kB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.