Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78158
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林守德(Shou-De Lin) | |
dc.contributor.author | Shan-Wei Lin | en |
dc.contributor.author | 林善偉 | zh_TW |
dc.date.accessioned | 2021-07-11T14:44:09Z | - |
dc.date.available | 2021-08-23 | |
dc.date.copyright | 2016-08-23 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-08-05 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78158 | - |
dc.description.abstract | 我們考慮在有限記憶體中的最小經驗風險(Empirical Risk Minimization)問題。我們提出一個區塊最小化框架來解決有限記憶體中的四種情況。第一種情況是資料(data)比模型(model)大而且對偶函式不是平滑的。這個情況在現在的大資料的趨勢當中很容易遇到,當問題的正則項(regularizer)是二範數(norm)的平方時,就會是這種情況,像是二範數支持向量機(support vector machine)、二範數邏輯回歸(logistic regression)以及二範數線性回歸。第二種情況是資料比模型大但是在對偶函式中存在一項是無法被區塊分割且不平滑的,這種情形發生在範數不是二範數的平方時,這種問題包括一範數線性回歸問題、矩陣填補中的核範數(nuclear norm)以及群範數(group norm)家族。第三種情況是模型比資料大,而且問題函式在不可切成區塊的部分是平滑的,這種情況會出現在極端分類問題(extreme classification)裡,因為這種問題有非常多類(class)以及非常高維度的特徵(feature)向量,稀疏編碼(sparse coding)也會出現特徵值比資料多的情況,也就是模型比資料大。第四種情況是模型比資料大,但是問題函式中有不可區塊分割的部分同時也不平滑,支持向量機的合頁損失(hinge loss)函式就會造成這種情況。在這篇論文裡,我們假設資料或是模型其中之一因為太大而不能全部放進記憶體當中,而另外一個可以放進記憶體中。當資料比較大的時候,我們用對偶區塊最小化方法來解,當模型比較大的時候,我們則用原始(primal)區域最小化。在原始區域最小化當中,我們把模型分割成很多個區塊,把這些區塊存在硬碟中,每次讀取一個區塊來解,並且只更新這個區塊的參數。而在對偶最小化問題時,我們把資料切成很多塊,把這些區塊存在硬碟,一次讀取一個區塊的資料來做最小化。當有無法被區域分開而且不平滑的項存在時,我們使用拉格朗日增強法使得那一項變成平滑項,來保證最後會收斂到全局最佳解(global optimum)。當在做原始區塊最小化時,我們在對偶問題加上一個強凸(strongly convex)項,使得原始問題變的平滑;而在做對偶區塊最小化時,我們在原始問題加上一項強凸項,使得對偶問題變平滑。而這多加的一項在演算法接近最佳解的時候會消失,所以我們的方法會收斂到和那些在足夠記憶體下的方法一樣的解。我們的這個框架是靈活的,因為給定一個在足夠記憶體下的解決方法,我們可以只在解決方法上加上一項,就能很容易的把這個解決方法用在有限記憶體之下。 | zh_TW |
dc.description.abstract | We consider regularized Empirical Risk Minimization (ERM) in a limited-memory environment and propose a block minimization framework that covers four situations arising when training with limited memory. First, the data is larger than the model and every term of the dual objective is either smooth or block-separable. This situation arises frequently in the current big-data setting with an L2 regularizer, including the L2 Support Vector Machine (SVM), L2 logistic regression, and L2 linear regression. Second, the data is larger than the model but the non-separable part of the dual objective is non-smooth. This happens when the regularizer is not the squared L2 norm, for example the L1 regularizer of the Lasso, the nuclear norm in matrix completion, and a family of structured group norms. Third, the model is larger than the data and every term of the primal objective is either smooth or block-separable. This situation appears in extreme classification, where the model is large because of the very large numbers of classes and features; sparse coding is another area in which the number of features exceeds the number of samples, so the model is larger than the data. Fourth, the model is larger than the data and the non-separable part of the primal objective is non-smooth; the multi-class SVM is an example, since the hinge loss makes the primal objective non-smooth. In this thesis we assume that either the model or the data is too large to fit into memory, while the other one fits. We use dual block minimization when the data is larger than memory and primal block minimization when the model is larger than memory. Specifically, in the primal block minimization framework we split the model into several blocks stored on disk, sequentially load one block into memory, update only that block, write it back to disk, and read the next block. In the dual block minimization framework we split the data into blocks stored on disk, sequentially load one block into memory, update the parameters based on that partial data, and then load the next block. In addition, we apply an Augmented Lagrangian technique to enforce smoothness of the corresponding objective when it is non-smooth and non-block-separable, which yields global convergence for general convex ERM problems. In particular, in the primal block minimization case we add a strongly convex term to the dual objective so that the primal objective becomes smooth; in the dual block minimization case we add a strongly convex term to the primal objective. This additional term vanishes as the iterates approach the optimum, so the algorithm converges to the same solution obtained by a solver with sufficient memory. The framework is flexible in the sense that, given a solver that works under sufficient memory, one can integrate it into our framework by adding an extra term to its update rule if necessary and obtain a solver that is globally convergent under limited memory. (An illustrative sketch of the dual block minimization loop is given below, after this metadata table.) | en |
dc.description.provenance | Made available in DSpace on 2021-07-11T14:44:09Z (GMT). No. of bitstreams: 1 ntu-105-R03922067-1.pdf: 950759 bytes, checksum: ced89414da75eb479d1c592c1721dcea (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | 摘要 (Abstract in Chinese); Abstract; 1 Introduction; 2 Problem Setup; 3 Primal and Dual Block Minimization; 3.1 Data Size is Larger than The Memory Size; 3.2 Model Size is Larger than The Memory Size; 3.2.1 Solve Block SubProblem in Primal; 4 Augmented Block Minimization; 4.1 Algorithm; 4.1.1 Dual Augmented Block Minimization; 4.1.2 Primal Augmented Block Minimization; 4.2 Analysis; 4.3 Practical Issue; 4.3.1 Solving Sub-Problem Inexactly; 4.3.2 Random Selection w/o Replacement; 4.3.3 Storage of Dual Variables; 5 Experiment; 5.1 When Data is Larger than Memory; 5.2 When Model is Larger than Memory; 6 Conclusion; Bibliography; A ADMM under Limited Memory | |
dc.language.iso | en | |
dc.title | 在有限記憶體下學習的區塊最小化框架 | zh_TW |
dc.title | Block Minimization Framework for Learning with Limited Memory | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-2 | |
dc.description.degree | 碩士 (Master) | |
dc.contributor.oralexamcommittee | 林軒田(Hsuan-Tien Lin),李育杰(Yuh-Jye Lee),吳尚鴻(Shan-Hung Wu) | |
dc.subject.keyword | 機器學習,有限記憶體,增強拉格朗日,最小經驗風險, | zh_TW |
dc.subject.keyword | Machine Learning, Limited Memory, Augmented Lagrangian, Empirical Risk Minimization | en |
dc.relation.page | 50 | |
dc.identifier.doi | 10.6342/NTU201601838 | |
dc.rights.note | 有償授權 (paid authorization) | |
dc.date.accepted | 2016-08-05 | |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
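As a concrete illustration of the dual block minimization scheme described in the abstract above (data split into on-disk blocks, one block loaded into memory at a time, the dual variables of each block updated in turn while the model vector w stays in memory), the following is a minimal Python sketch for an L2-regularized hinge-loss SVM. It is not code from the thesis: the block file layout, the synthetic data, and the inner-loop settings (number of passes, sub-problem sweeps) are assumptions made purely for illustration.

```python
# Minimal sketch (not the thesis code) of dual block minimization when the
# data is larger than memory: the data is split into blocks stored on disk,
# one block (plus the model w) is kept in memory at a time, and the dual
# variables of that block are updated by coordinate descent before moving on.
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
n, d, n_blocks, C = 2000, 50, 4, 1.0   # illustrative sizes, not from the thesis

# Synthetic data, "spilled" to disk in blocks to mimic the limited-memory setting.
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = np.sign(X @ true_w + 0.1 * rng.normal(size=n))

block_dir = tempfile.mkdtemp()
block_ids = np.array_split(np.arange(n), n_blocks)
for b, idx in enumerate(block_ids):
    np.save(os.path.join(block_dir, f"X_{b}.npy"), X[idx])
    np.save(os.path.join(block_dir, f"y_{b}.npy"), y[idx])
    # Dual variables of each block are also stored on disk between passes.
    np.save(os.path.join(block_dir, f"alpha_{b}.npy"), np.zeros(len(idx)))

w = np.zeros(d)  # the model is assumed to fit in memory

for outer in range(10):                  # passes over all blocks
    for b in rng.permutation(n_blocks):  # random block order
        Xb = np.load(os.path.join(block_dir, f"X_{b}.npy"))
        yb = np.load(os.path.join(block_dir, f"y_{b}.npy"))
        alpha = np.load(os.path.join(block_dir, f"alpha_{b}.npy"))
        qii = np.einsum("ij,ij->i", Xb, Xb)  # diagonal of the block Gram matrix
        for _ in range(5):                   # inexact sub-problem solve
            for i in range(len(yb)):
                grad = yb[i] * (w @ Xb[i]) - 1.0
                new_a = min(max(alpha[i] - grad / qii[i], 0.0), C)
                w += (new_a - alpha[i]) * yb[i] * Xb[i]
                alpha[i] = new_a
        np.save(os.path.join(block_dir, f"alpha_{b}.npy"), alpha)
    # Diagnostic only: uses the full in-memory copy to report progress.
    acc = np.mean(np.sign(X @ w) == y)
    print(f"pass {outer}: training accuracy {acc:.3f}")
```

The Augmented Lagrangian variant described in the abstract would additionally add a strongly convex term to each block sub-problem when the objective has a non-smooth, non-block-separable part; that refinement is omitted here for brevity.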
Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering)
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-R03922067-1.pdf (currently not authorized for public access) | 928.48 kB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.