NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78158
Title: Block Minimization Framework for Learning with Limited Memory
Authors: Shan-Wei Lin (林善偉)
Advisor: Shou-De Lin (林守德)
Keywords: Machine Learning, Limited Memory, Augmented Lagrangian, Empirical Risk Minimization
Publication Year: 2016
Degree: Master
Abstract: We consider regularized Empirical Risk Minimization (ERM) in a limited-memory environment and provide a block minimization framework that handles four situations arising when training with limited memory. First, the data is larger than the model and every term in the dual objective is either smooth or block separable. This situation occurs frequently in the current big-data setting whenever the regularizer is the squared L2 norm, as in L2-regularized Support Vector Machines (SVM), L2-regularized logistic regression, and L2-regularized linear regression. Second, the data is larger than the model and the non-separable part of the dual objective is non-smooth. This can happen when the regularizer is not the squared L2 norm, including the L1 regularizer in the Lasso problem, the nuclear norm in matrix completion, and the family of structured group norms. Third, the model is larger than the data and every term in the primal objective is either smooth or block separable. This situation appears in extreme classification, where the model is large because of the very large number of classes and the high-dimensional feature vectors; sparse coding is another area where the number of features exceeds the number of samples, so the model is larger than the data. Fourth, the model is larger than the data and the non-separable part of the primal objective is non-smooth; the hinge loss of the multi-class SVM is an example that causes this non-smoothness. In this thesis we assume that either the model or the data is too large to fit into memory while the other one fits. We use dual block minimization when the data is larger than the available memory and primal block minimization when the model is larger than the available memory. Specifically, in primal block minimization we split the model into blocks stored on disk, sequentially load one block into memory, update only that block, write it back to disk, and read the next block. In dual block minimization we split the data into blocks stored on disk, sequentially load one block into memory, update the parameters based on that partial data, and then load the next block. We also apply the Augmented Lagrangian technique to smooth the objective whenever it contains a term that is non-smooth and non-block-separable, which guarantees global convergence for general convex ERM problems. In particular, in primal block minimization we add a strongly convex term to the dual objective to make the primal objective smooth, and in dual block minimization we add a strongly convex term to the primal objective to make the dual objective smooth. This additional term vanishes as the model approaches the optimum, so the algorithm converges to the same solution obtained with sufficient memory. The framework is flexible: given a solver that works under sufficient memory, one can integrate it into our framework, adding an extra term to the update rule if necessary, and obtain a solver that converges globally under limited memory.
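The dual block minimization procedure described in the abstract (keep the model in memory, stream data blocks from disk, and update only the dual variables of the loaded block) can be illustrated with a short sketch. The following Python code is a minimal illustration, not the thesis's actual implementation: it assumes an L2-regularized SVM with hinge loss solved by dual coordinate descent within each block, and load_block is a hypothetical helper that returns one block of data (features and labels in {-1, +1}) as NumPy arrays.

import numpy as np

def dual_block_minimization(n_features, n_blocks, load_block, C=1.0,
                            outer_iters=10, inner_iters=3):
    """Sketch: the model w stays in memory; data blocks are streamed from disk."""
    w = np.zeros(n_features)
    alpha = {}                                 # dual variables, one array per block
                                               # (could also be stored on disk)
    for _ in range(outer_iters):
        for b in range(n_blocks):
            X, y = load_block(b)               # hypothetical: read block b from disk
            a = alpha.setdefault(b, np.zeros(X.shape[0]))
            Qii = np.einsum('ij,ij->i', X, X)  # diagonal of the Gram matrix

            # Dual coordinate descent restricted to this block's variables.
            # The relation w = sum_i alpha_i * y_i * x_i is maintained after
            # every single-variable update.
            for _ in range(inner_iters):
                for i in range(X.shape[0]):
                    if Qii[i] == 0.0:
                        continue
                    grad = y[i] * X[i].dot(w) - 1.0
                    a_new = min(max(a[i] - grad / Qii[i], 0.0), C)
                    w += (a_new - a[i]) * y[i] * X[i]
                    a[i] = a_new
    return w

Only one data block plus the model ever resides in memory at a time, which is the point of the framework; the Augmented Lagrangian term used in the thesis for non-smooth, non-separable objectives is not shown in this sketch.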
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/78158
DOI: 10.6342/NTU201601838
Full-text Authorization: Paid authorization
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-105-R03922067-1.pdf (currently not authorized for public access)
Size: 928.48 kB
Format: Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
