To cite this item, please use this Handle URI:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65286

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 孫雅麗 | |
| dc.contributor.author | Chun-Fu Chang | en |
| dc.contributor.author | 張淳富 | zh_TW |
| dc.date.accessioned | 2021-06-16T23:34:37Z | - |
| dc.date.available | 2015-07-27 | |
| dc.date.copyright | 2012-07-27 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-07-27 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65286 | - |
| dc.description.abstract | In recent years, linear classifiers have developed rapidly and performed well on large-scale classification problems. In practice, however, two important issues remain unresolved. First, a large portion of real-world data cannot be explained by a linear classifier, and if the classifier takes such noise into account, its performance suffers. Second, for ordinary users memory is far smaller than disk, so data often cannot fit in memory; the linear classifier must then repeatedly read from and write to disk, which is very time-consuming. This thesis therefore proposes an indexing framework, applied within the linear classifier's optimization process, that addresses both issues at once. Each instance is represented as a high-dimensional feature vector; we index these instances with an approximate high-dimensional indexing technique so that informative instances can be retrieved efficiently while instances harmful to the classifier are ignored. As a result, only the informative instances need to be loaded into memory, solving both issues simultaneously. We conduct several experiments comparing our framework with other state-of-the-art methods, and the results show that ours performs better. | zh_TW |
| dc.description.abstract | Recently, linear classifiers have been shown to handle large-scale classification problems well. However, two main issues accompany large-scale classification. First, datasets may contain many unexplainable or noisy instances, which hurt the linear classifier's performance. Second, when the data are too large to fit in memory, the linear classifier spends much time reading and writing between memory and disk. In this thesis, we propose an indexing optimization framework that solves these two issues simultaneously. We apply an approximate indexing technique to the high-dimensional feature space so that informative instances, rather than outliers, can be retrieved efficiently, and only those instances need to be loaded into memory. We conduct several experiments comparing our framework with state-of-the-art methods, and the results show that it performs better. (Two illustrative sketches of these ideas follow the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T23:34:37Z (GMT). No. of bitstreams: 1 ntu-101-R99725033-1.pdf: 2917753 bytes, checksum: b04c516c3086afb0d56a20c324728575 (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | Table of Contents
Oral Defense Committee Certification
Acknowledgements
Chinese Abstract
Thesis Abstract
Table of Contents
Figure List
Table List
Chapter 1. Introduction
Chapter 2. Preliminary Review
2.1 Linear Classifier with Outlier Detection
2.2 Approximate High Dimensional Indexing
Chapter 3. Methodology
3.1 Solving Ramp Loss
3.2 Tree-based Indexing
3.3 Framework to Solve Primal and Dual Problems
3.3.1 Primal Problem
3.3.2 Dual Problem
Chapter 4. Related Methods
4.1 Online Learning
4.2 Active Learning
4.3 Block Minimization
Chapter 5. Experiments
5.1 Datasets and Environment
5.2 Performance on Primal Problem
5.3 Performance on Dual Problem
5.4 Performance on Limited Memory
5.5 Indexing Property
Chapter 6. Discussion and Limitation
Chapter 7. Conclusion
References | |
| dc.language.iso | zh-TW | |
| dc.subject | 支持向量機 | zh_TW |
| dc.subject | 機器學習 | zh_TW |
| dc.subject | Ramp loss | zh_TW |
| dc.subject | 高維索引 | zh_TW |
| dc.subject | high dimensional indexing | en |
| dc.subject | ramp loss | en |
| dc.subject | support vector machine | en |
| dc.subject | machine learning | en |
| dc.title | 利用高維索引技術解決大規模分類問題 | zh_TW |
| dc.title | Solving large-scale classification problem with approximate high dimensional indexing framework | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-2 | |
| dc.description.degree | 碩士 (Master) | |
| dc.contributor.oralexamcommittee | 陳孟彰,陳建錦,彭文志 | |
| dc.subject.keyword | 高維索引, Ramp loss, 支持向量機, 機器學習 | zh_TW |
| dc.subject.keyword | high dimensional indexing, ramp loss, support vector machine, machine learning | en |
| dc.relation.page | 36 | |
| dc.rights.note | 有償授權 (paid authorization) | |
| dc.date.accepted | 2012-07-27 | |
| dc.contributor.author-college | 管理學院 (College of Management) | zh_TW |
| dc.contributor.author-dept | 資訊管理學研究所 (Graduate Institute of Information Management) | zh_TW |
| Appears in Collections: | 資訊管理學系 (Department of Information Management) | |
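The abstract above describes two mechanisms: ignoring instances a linear model cannot explain (the "Solving Ramp Loss" chapter in the table of contents) and training on only the informative instances retrieved through an index. The sketch below is a minimal, hypothetical numpy illustration written for this record, not code from the thesis: it trains a linear classifier by subgradient descent on the ramp loss and finds the informative band by a full scan, whereas the thesis would retrieve that band through its approximate high-dimensional index. The function name `train_ramp_svm` and the parameters `s`, `lam`, `lr` are illustrative assumptions.

```python
# Hypothetical sketch (not the thesis's code): subgradient descent on the
# regularized ramp loss  R_s(z) = min(1 - s, max(0, 1 - z)),  z = y * (w . x).
# The ramp is flat for z <= s, so badly misclassified points (likely noise)
# contribute no gradient and cannot drag the hyperplane around.
import numpy as np

def train_ramp_svm(X, y, s=-1.0, lam=1e-3, epochs=50, lr=0.1):
    """X: (n, d) features; y: (n,) labels in {-1, +1}; s: ramp clipping point."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)                  # z_i = y_i * (w . x_i)
        # Informative band: the only region where the ramp loss has slope.
        #   z >= 1 -> confidently correct, gradient 0
        #   z <= s -> treated as an outlier, gradient 0
        active = (margins < 1.0) & (margins > s)
        # In the framework the abstract describes, `active` would be
        # fetched via an approximate high-dimensional index, so the rest
        # of the dataset never has to be loaded into memory.
        grad = lam * w - X[active].T @ y[active] / n
        w -= lr * grad
    return w

# Toy usage: two Gaussian blobs plus a few flipped labels as outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.array([+1] * 100 + [-1] * 100)
y[:5] = -1                                     # injected label noise
w = train_ramp_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Because the flipped points quickly fall below the clipping margin `s`, they drop out of `active` and stop influencing `w`; with a plain hinge loss they would keep pulling on the hyperplane at every epoch.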
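The memory-side claim, that an approximate high-dimensional index lets the learner touch only a small relevant subset of a disk-resident dataset, can be illustrated the same way. The table of contents mentions tree-based indexing; the sketch below instead uses signed random projections (a classic locality-sensitive-hashing family for cosine similarity) purely as a stand-in, so the class name `RandomProjectionIndex` and its parameters are assumptions, not the thesis's data structure.

```python
# Hypothetical sketch: bucket instances by the sign pattern of a few
# random projections; angularly similar vectors tend to share a bucket,
# so a query touches one bucket instead of scanning all n instances.
import numpy as np
from collections import defaultdict

class RandomProjectionIndex:
    def __init__(self, X, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, X.shape[1]))  # random hyperplanes
        self.buckets = defaultdict(list)
        for i, key in enumerate(self._keys(X)):
            self.buckets[key].append(i)

    def _keys(self, X):
        bits = (X @ self.planes.T) > 0          # (n, n_bits) sign pattern
        return [tuple(row) for row in bits]

    def query(self, q):
        """Indices of instances whose sign pattern matches the query's."""
        return self.buckets.get(self._keys(q[None, :])[0], [])

# Usage: candidates similar to instance 42 come from a single bucket.
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 50))
index = RandomProjectionIndex(X)
candidates = index.query(X[42])
print(len(candidates), 42 in candidates)        # 42 always finds itself
```

With `n_bits` projections the data splits into at most `2**n_bits` buckets, so a lookup inspects roughly `n / 2**n_bits` instances on average; in the framework the abstract describes, only the retrieved candidates would need to be resident in memory while the optimizer runs.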
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-101-1.pdf (restricted; not publicly available) | 2.85 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
