Title: A Study on Preconditioners in Newton Method for Large-scale Linear Classification (牛頓法於大規模線性分類問題的預處理器研究)
Authors: Chih-Yang Hsia (夏誌陽)
Advisor: Chih-Jen Lin (林智仁)
Keywords: large-scale linear classification, preconditioned conjugate gradient, Newton method
Publication Year: 2017
Degree: Master
Abstract: The truncated Newton method is one of the most effective optimization methods for large-scale linear classification. The main computational task at each Newton iteration is to approximately solve a quadratic sub-problem by an iterative method such as the conjugate gradient (CG) method. It is known that CG converges slowly when the sub-problem is ill-conditioned. Preconditioned CG (PCG) methods are commonly used to improve the convergence of CG, but it is difficult to find a preconditioner that performs well in most situations. Further, because Hessian-free optimization techniques are incorporated for handling large data, many existing preconditioners are not directly applicable. In this work, we study in detail some preconditioners that have been considered in past works on linear classification, and we show that in some cases they fail to improve the training speed. We then propose a simple and effective technique to make the PCG method more robust in a truncated Newton framework. The idea is to avoid situations where a preconditioner leads to a much worse condition number than when no preconditioner is applied. We provide theoretical justification, and through carefully designed experiments we demonstrate that our method effectively reduces the training time for large-scale problems.
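The core computation the abstract describes, solving a symmetric positive definite sub-problem with (preconditioned) CG, can be sketched generically. This is not code from the thesis: the `pcg` function and the Jacobi (diagonal) preconditioner below are illustrative assumptions only, showing the mechanism by which a preconditioner changes convergence; PCG iterates on the preconditioned system, whose condition number can be better or worse than the original one.

```python
import numpy as np

def pcg(A, b, M_inv=None, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    M_inv: callable applying the inverse preconditioner to a vector;
           if None, this reduces to plain CG.
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - A @ x                       # initial residual
    z = M_inv(r) if M_inv else r        # preconditioned residual
    p = z.copy()                        # initial search direction
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)           # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = M_inv(r) if M_inv else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # update conjugate direction
        rz = rz_new
    return x, max_iter

# Hypothetical usage: a Jacobi preconditioner simply divides by diag(A).
rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 50))
A = Q @ Q.T + 50 * np.eye(50)           # SPD test matrix
b = rng.standard_normal(50)
d = np.diag(A)
x_pcg, iters_pcg = pcg(A, b, M_inv=lambda r: r / d)
x_cg, iters_cg = pcg(A, b)
```

A robust strategy in the spirit of the abstract would compare the progress of CG and PCG and fall back to plain CG when the preconditioner does not help.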
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69054
DOI: 10.6342/NTU201701767
Fulltext Rights: Paid authorization (有償授權)
Appears in Collections: Department of Computer Science and Information Engineering
Files in This Item:
File | Size | Format
---|---|---
ntu-106-1.pdf (Restricted Access) | 6.65 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.