Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57520
Title: | 大規模羅吉斯回歸與線性支持向量機在Spark上之應用 Large-scale Logistic Regression and Linear Support Vector Machines Using Spark |
Authors: | Chieh-Yen Lin 林玠言 |
Advisor: | 林智仁(Chih-Jen Lin) |
Keyword: | 大規模學習, 分散式運算, 羅吉斯回歸, 支持向量機, 牛頓法, large-scale learning, distributed computing, logistic regression, support vector machine, Newton method |
Publication Year: | 2014 |
Degree: | Master's (碩士) |
Abstract: | 對於大規模分類問題之學習,羅吉斯回歸與線性支持向量機都是相當有用的方法。然而,此兩種模型的分散式實作,並沒有被徹底及完整地研究。另外,因為典型的映射化簡架構對於機器學習的迭代法之實作遭受到計算效率的瓶頸,所以叢集式記憶體內的運算平台─Spark在最近數年內逐漸嶄露頭角。由於Spark對於資料處理與分析的能力,此平台成為一個被廣泛使用的架構。在這篇論文裡,我們提出牛頓法之分散式演算法,並實作於Spark上。我們點出與分析會強烈影響計算效能與溝通時間的細節,並對這些問題提出解決辦法。最後,在經過謹慎的考量與研究後,我們將此論文中提出的演算法實作為一個有效率並且公開的工具以供使用。 Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues that significantly affect running time and propose our solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57520 |
Fulltext Rights: | Fee-based license (有償授權) |
Appears in Collections: | Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所) |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-103-1.pdf (Restricted Access) | 1.46 MB | Adobe PDF |