Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46983
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林軒田 | |
dc.contributor.author | Ken-Yi Lin | en |
dc.contributor.author | 林庚毅 | zh_TW |
dc.date.accessioned | 2021-06-15T05:44:37Z | - |
dc.date.available | 2011-08-20 | |
dc.date.copyright | 2010-08-20 | |
dc.date.issued | 2010 | |
dc.date.submitted | 2010-08-19 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46983 | - |
dc.description.abstract | Learning to rank has become a popular research topic in several areas, including information retrieval and machine learning. Pair-wise ranking, which learns the order preference between every two examples, is one typical approach to the ranking problem. Among pair-wise methods, RankSVM is a widely used machine learning algorithm that has been applied successfully to ranking in previous work. However, RankSVM suffers from a critical drawback: training is slow because the number of pairs is huge. In this thesis, we propose a data selection technique, Pruned RankSVM, that selects the most informative pairs before training. By training on a subset of the pairs instead of all of them, we can handle large-scale data sets efficiently. Our experiments show that the performance of Pruned RankSVM is overall comparable with RankSVM while using significantly fewer pairs. To further demonstrate the efficiency of Pruned RankSVM, we also compare it with a point-wise ranking algorithm, support vector regression. Experimental results show that Pruned RankSVM outperforms support vector regression on most data sets. (A brief illustrative sketch of pair-wise pair selection appears after the metadata table below.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T05:44:37Z (GMT). No. of bitstreams: 1 ntu-99-R97922117-1.pdf: 2963993 bytes, checksum: 5f8689b37c0493ed05b00db62b97016e (MD5) Previous issue date: 2010 | en |
dc.description.tableofcontents | Acknowledgements
Abstract (in Chinese)
Abstract
1 Introduction: 1.1 Ranking Problem; 1.2 Preliminaries; 1.3 Related Work; 1.4 Contribution
2 Pair-wise Ranking: 2.1 Existing Pair-wise Ranking Algorithms (2.1.1 RankSVM; 2.1.2 RankLR); 2.2 Problem of Pair-wise Ranking Algorithms; 2.3 Why RankSVM
3 Pruned RankSVM with Closest Pairs: 3.1 Informative Pairs in RankSVM (3.1.1 Support Vector Distribution; 3.1.2 Important Pairs during Optimization); 3.2 Pruned RankSVM (closest) (3.2.1 Experiment on Artificial Data; 3.2.2 Experiment on Real-world Data); 3.3 Extended Approach with Numerical Information (3.3.1 Order-based versus Label-based; 3.3.2 Emphasis on Label-closest Pairs; 3.3.3 Emphasis on Label-farthest Pairs; 3.3.4 Comparison with Pruned RankSVM (closest))
4 Pruned RankSVM with Mixed Pairs: 4.1 Pruned RankSVM (mixed) (4.1.1 Pruned RankSVM (mixed) on Artificial Data; 4.1.2 Pruned RankSVM (mixed) on Real-world Data); 4.2 Comparison between Closest and Mixed Technique; 4.3 Coupling Pruned RankSVM (mixed) with Shrinking
5 Conclusion
Bibliography | |
dc.language.iso | en | |
dc.title | 以資料選擇技術幫助大規模支持向量機排序 | zh_TW |
dc.title | Data selection techniques for large-scale RankSVM | en |
dc.type | Thesis | |
dc.date.schoolyear | 98-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 徐宏民,李育杰,王鈺強 | |
dc.subject.keyword | 排序問題, 排序學習, 支持向量機排序法, 資料選擇技術 | zh_TW |
dc.subject.keyword | learning to rank, pair-wise ranking, RankSVM, data selection | en |
dc.relation.page | 40 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2010-08-19 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
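The abstract above describes pair-wise ranking and a pruning step that keeps only the most informative pairs before training. The Python sketch below is purely illustrative and is not the thesis implementation: it contrasts the roughly quadratic growth of the full pair set with a hypothetical label-gap pruning rule (a stand-in for the "closest pairs" idea named in the table of contents), and reduces the pair-wise problem to a binary linear SVM on difference vectors. The helper names, the `LinearSVC` choice, and the `max_gap` threshold are all assumptions made for this example.

```python
# Minimal sketch of pair-wise ranking with pair pruning (illustrative only;
# the selection rule and solver are assumptions, not the thesis's method).
import numpy as np
from sklearn.svm import LinearSVC


def all_pairs(y):
    """Every ordered pair (i, j) with y[i] > y[j]; grows roughly as O(n^2)."""
    return [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]


def pruned_pairs(y, max_gap=1):
    """Hypothetical pruning rule: keep only pairs whose label gap is small."""
    return [(i, j) for i in range(len(y)) for j in range(len(y))
            if 0 < y[i] - y[j] <= max_gap]


def fit_pairwise_svm(X, y, pairs, C=1.0):
    """RankSVM-style reduction: classify difference vectors x_i - x_j."""
    diffs = np.array([X[i] - X[j] for i, j in pairs], dtype=float)
    signs = np.ones(len(pairs))
    diffs[::2] *= -1           # mirror half of the pairs so both classes exist
    signs[::2] = -1
    clf = LinearSVC(C=C, max_iter=10000).fit(diffs, signs)
    return clf.coef_.ravel()   # scoring direction w; rank items by X @ w


# Tiny demonstration with ordinal labels in {1, ..., 5}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.clip(np.round(X[:, 0] + 0.3 * rng.normal(size=200)) + 3, 1, 5)

full, pruned = all_pairs(y), pruned_pairs(y)
print(f"full pairs: {len(full)}, pruned pairs: {len(pruned)}")
w = fit_pairwise_svm(X, y, pruned)
print("top-ranked index:", int(np.argmax(X @ w)))
```

On this toy data the pruned set is several times smaller than the full set, which is the kind of saving the abstract attributes to training on a subset of pairs; the actual selection criteria studied in the thesis (closest, label-based, and mixed pairs) are the subject of Chapters 3 and 4 in the table of contents.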
Appears in Collections: | 資訊工程學系
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-99-1.pdf (restricted; currently not authorized for public access) | 2.89 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.