  1. NTU Theses and Dissertations Repository
  2. College of Management
  3. Department of Information Management
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65286
Full metadata record
DC Field | Value | Language
dc.contributor.advisor: 孫雅麗
dc.contributor.author: Chun-Fu Chang (en)
dc.contributor.author: 張淳富 (zh_TW)
dc.date.accessioned: 2021-06-16T23:34:37Z
dc.date.available: 2015-07-27
dc.date.copyright: 2012-07-27
dc.date.issued: 2012
dc.date.submitted: 2012-07-27
dc.identifier.citation:
[1] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in ICML, 2008.
[2] S. Shalev-Shwartz, Y. Singer, and N. Srebro, "Pegasos: primal estimated sub-gradient solver for SVM," in ICML, 2007.
[3] A. Gionis, P. Indyk, and R. Motwani, "Similarity search in high dimensions via hashing," in Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999.
[4] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: a library for large linear classification," Journal of Machine Learning Research, 9:1871–1874, 2008.
[5] R. Collobert, F. Sinz, J. Weston, and L. Bottou, "Trading convexity for scalability," in ICML, 2006.
[6] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, 18(9):509–517, 1975.
[7] G. Loosli, S. Canu, and L. Bottou, "Training invariant support vector machines using selective sampling," in L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, editors, Large Scale Kernel Machines, pages 301–320, MIT Press, Cambridge, MA, 2007.
[8] P. Jain, S. Vijayanarasimhan, and K. Grauman, "Hashing hyperplane queries to near points with applications to large-scale active learning," in NIPS, 2010.
[9] H.-F. Yu, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin, "Large linear classification when data cannot fit in memory," in Proceedings of the 16th ACM SIGKDD, 2010.
[10] Support vector machine, http://en.wikipedia.org/wiki/Support_vector_machine
[11] KDD Cup 1999, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[12] Covtype, http://archive.ics.uci.edu/ml/datasets/Covertype
[13] MNIST, http://yann.lecun.com/exdb/mnist/
[14] MNIST8M, http://leon.bottou.org/papers/loosli-canu-bottou-2006
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65286
dc.description.abstract: In recent years, linear classifiers have shown strong performance on large-scale classification problems. In practice, however, two important issues remain unresolved. First, a large portion of real-world data cannot be explained by a linear classifier; if the classifier takes such noise into account, its performance suffers. Second, for ordinary users memory is far smaller than disk, so the data often cannot fit in memory, and the linear classifier must then repeatedly read from and write to disk, which is very time-consuming. This thesis therefore proposes an indexing framework, applied within the linear classifier's optimization process, that addresses both issues at once. Each instance is represented as a vector in a high-dimensional feature space; by indexing the instances with an approximate high-dimensional indexing technique, we can efficiently retrieve the informative instances and ignore those that harm the classifier. Consequently, only the informative instances need to be loaded into memory, resolving both issues simultaneously. We conduct several experiments comparing our framework with other state-of-the-art methods, and the results show that our framework performs better. (translated from zh_TW)
dc.description.abstract: Recently, linear classifiers have been shown to handle large-scale classification problems well. However, two main issues accompany large-scale classification. First, datasets may contain many unexplainable or noisy instances that hurt the linear classifier's performance. Second, when the data are too large to load into memory, the linear classifier spends much time reading and writing between memory and disk. In this thesis, we propose an indexing optimization framework that solves these two issues simultaneously. We apply an approximate indexing technique to the high-dimensional feature space to efficiently retrieve the informative instances rather than outliers, so that only those instances need to be loaded into memory.
We conduct several experiments comparing our framework with state-of-the-art methods, and the results show that it performs better. (en)
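The abstract combines two ingredients: an approximate high-dimensional index for retrieving instances, and ramp-loss-style training that ignores far-misclassified outliers. The minimal Python sketch below illustrates those two ideas only; it is not the thesis's actual algorithm, and all names (`RandomHyperplaneLSH`, `train_selective`) and parameter choices are illustrative assumptions. The index is a random-hyperplane LSH (one family of approximate indexing, in the spirit of references [3] and [8]), and the trainer applies hinge subgradient steps only to instances in the ramp loss's active margin region, skipping instances with margin below -1 as presumed outliers.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class RandomHyperplaneLSH:
    """Toy random-hyperplane LSH: vectors falling on the same side of all
    k random hyperplanes land in the same hash bucket, so a bucket lookup
    returns approximate neighbors in the high-dimensional feature space."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = {}

    def _key(self, x):
        return tuple(1 if dot(p, x) >= 0 else 0 for p in self.planes)

    def insert(self, idx, x):
        self.buckets.setdefault(self._key(x), []).append(idx)

    def query(self, x):
        # Indices of instances hashed to the same bucket as x.
        return self.buckets.get(self._key(x), [])

def train_selective(X, y, rounds=20, lr=0.1, lam=0.01):
    """Hinge subgradient training restricted to the ramp loss's active
    region: instances with margin in [-1, 1) get an update, instances with
    margin below -1 are skipped as presumed outliers (the ramp loss is flat
    there), and well-classified instances (margin >= 1) are left alone."""
    w = [0.0] * len(X[0])
    for _ in range(rounds):
        for x, yi in zip(X, y):
            margin = yi * dot(w, x)
            if margin < -1:      # presumed outlier: bounded loss, skip
                continue
            if margin < 1:       # informative instance: subgradient step
                w = [wj - lr * (lam * wj - yi * xj)
                     for wj, xj in zip(w, x)]
    return w
```

On a linearly separable toy set containing one mislabeled outlier, the outlier's margin quickly drops below -1 and it stops influencing the weights, which is exactly the bounded-penalty behavior that motivates ramp loss over the plain hinge loss.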
dc.description.provenance: Made available in DSpace on 2021-06-16T23:34:37Z (GMT). No. of bitstreams: 1
ntu-101-R99725033-1.pdf: 2917753 bytes, checksum: b04c516c3086afb0d56a20c324728575 (MD5)
Previous issue date: 2012 (en)
dc.description.tableofcontents: Table of Contents
口試審定書 (Oral Defense Committee Certification) III
謝詞 (Acknowledgements) IV
論文摘要 (Chinese Abstract) V
THESIS ABSTRACT VI
Table of Contents VII
Figure List IX
Table List X
Chapter 1. Introduction 1
Chapter 2. Preliminary Review 3
2.1 Linear Classifier with Outlier Detection 3
2.2 Approximate High Dimensional Indexing 5
Chapter 3. Methodology 7
3.1 Solving Ramp Loss 7
3.2 Tree-based Indexing 9
3.3 Framework to Solve Primal and Dual Problem 11
3.3.1 Primal Problem 11
3.3.2 Dual Problem 13
Chapter 4. Related Methods 16
4.1 Online Learning 16
4.2 Active Learning 16
4.3 Block Minimization 17
Chapter 5. Experiments 19
5.1 Datasets and Environment 19
5.2 Performance on Primal Problem 22
5.3 Performance on Dual Problem 25
5.4 Performance on Limited Memory 27
5.5 Indexing Property 28
Chapter 6. Discussion and Limitation 32
Chapter 7. Conclusion 34
Reference 35
dc.language.iso: zh-TW
dc.subject: 支持向量機 (zh_TW)
dc.subject: 機器學習 (zh_TW)
dc.subject: Ramp loss (zh_TW)
dc.subject: 高維索引 (zh_TW)
dc.subject: high dimensional indexing (en)
dc.subject: ramp loss (en)
dc.subject: support vector machine (en)
dc.subject: machine learning (en)
dc.title: 利用高維索引技術解決大規模分類問題 (zh_TW)
dc.title: Solving large-scale classification problem with approximate high dimensional indexing framework (en)
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳孟彰, 陳建錦, 彭文志
dc.subject.keyword: 高維索引, Ramp loss, 支持向量機, 機器學習 (zh_TW)
dc.subject.keyword: high dimensional indexing, ramp loss, support vector machine, machine learning (en)
dc.relation.page: 36
dc.rights.note: 有償授權 (paid authorization)
dc.date.accepted: 2012-07-27
dc.contributor.author-college: 管理學院 (College of Management) (zh_TW)
dc.contributor.author-dept: 資訊管理學研究所 (Graduate Institute of Information Management) (zh_TW)
Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File | Size | Format
ntu-101-1.pdf (access restricted; not publicly available) | 2.85 MB | Adobe PDF


All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
