A Two-Stage Ranking Method for Merging Multiple Search Results

You-Lin Lin; 林佑霖

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44531

Full metadata record

DC field	Value	Language
dc.contributor.advisor	鄭卜壬(Pu-Jen Cheng)
dc.contributor.author	You-Lin Lin	en
dc.contributor.author	林佑霖	zh_TW
dc.date.accessioned	2021-06-15T03:03:19Z	-
dc.date.available	2010-07-31
dc.date.copyright	2009-07-31
dc.date.issued	2009
dc.date.submitted	2009-07-30
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44531	-
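dc.description.abstract	Metasearch studies how to treat the results of multiple search systems as given information and merge them into a better result. In this thesis, we use machine learning techniques to propose a new two-stage ranking algorithm for merging multiple search results. The two-stage ranking method is based on classification: every document is first classified, and the classification results are then used to rank the documents and produce the final answer. In the first stage, we classify all documents into four levels of relevance. Once each document carries this information, the second stage simply ranks the documents further with a linear combination to obtain the final result. In our experiments, we implemented the method on the NTCIR-4 English-English standard test collection; in every experiment, the two-stage ranking method significantly outperformed several baseline methods, showing that the algorithm is effective.	zh_TW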
dc.description.abstract	Metasearch is the problem of combining the results of multiple independent search algorithms into a single result list so as to improve retrieval effectiveness. We propose a novel 2-stage ranking method that applies machine learning to this problem by treating it as classification. In the first stage, a classifier labels each document in the search results as relevant or irrelevant; here we compare general classification with cost-sensitive classification. Once all documents in the search results are labeled, the second stage uses these labels to produce the final ranking by linear combination. Experiments on the NTCIR-4 English-English IR data show that the method outperforms existing metasearch algorithms with a significant improvement.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T03:03:19Z (GMT). No. of bitstreams: 1 ntu-98-R96922124-1.pdf: 639043 bytes, checksum: ab8a49e4f0333cb1dde4efc0c16b47ef (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	Abstract (Chinese); Abstract; Acknowledgement; Table of Contents; List of Figures; List of Tables; Chapter 1: Introduction; 1.1 Motivations; 1.2 Problem Specification; 1.3 Basic Idea; 1.4 Proposed Approaches; 1.5 Thesis Organization; Chapter 2: Related Work; Chapter 3: Methodology; 3.1 Framework; 3.2 Settings; 3.3 Stage 1: Classification; 3.3.1 Feature Extraction; 3.3.2 Classifier; 3.3.3 Cost-Sensitive Classification; 3.4 Stage 2: Ranking with Classes; Chapter 4: Experiment; 4.1 Data Set; 4.2 Environment; 4.3 Measurement; 4.4 Baseline Models; 4.5 Experiment Results and Discussion; 4.5.1 Exp 1: Methods Evaluation; 4.5.2 Exp 2: Effect of Input Result's Size; 4.5.3 Exp 3: Feature Analysis; Chapter 5: Conclusion; 5.1 Summary of Contributions; 5.2 Future Work; Bibliography
dc.language.iso	en
dc.subject	Metasearch	zh_TW
dc.subject	metasearch	en
dc.subject	search result merging	en
dc.subject	learning to rank	en
dc.title	A Two-Stage Ranking Method for Merging Multiple Search Results	zh_TW
dc.title	A 2-Stage Ranking Method to Merge Multiple Search Results	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳信希(Hsin-Hsi Chen),曾新穆,張嘉惠
dc.subject.keyword	Metasearch	zh_TW
dc.subject.keyword	metasearch, learning to rank, search result merging	en
dc.relation.page	51
dc.rights.note	Paid license
dc.date.accepted	2009-07-30
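dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering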
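The abstracts describe the method only at a high level: stage 1 classifies each retrieved document into a relevance level, and stage 2 ranks the documents by a linear combination. The following is a minimal Python sketch of that two-stage idea under stated assumptions, not the thesis's implementation. The per-document features (min-max normalized scores from each input run, 0 when a run omits the document), the nearest-centroid classifier, the four level labels 0 to 3, and the stage-2 weights are all hypothetical stand-ins; the thesis's actual features, classifier, and cost-sensitive variant are not reproduced here.

```python
import math
from collections import defaultdict

# Hypothetical relevance levels: 0 = irrelevant ... 3 = highly relevant.
LEVELS = (0, 1, 2, 3)


def min_max(run):
    """Normalize one system's {doc_id: score} run to [0, 1] (assumed feature scaling)."""
    lo, hi = min(run.values()), max(run.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 1.0 for d, s in run.items()}


def features(runs):
    """Hypothetical stage-1 input: one normalized score per system, 0 if the system missed the doc."""
    normed = [min_max(r) for r in runs]
    docs = sorted(set().union(*runs))
    return {d: [n.get(d, 0.0) for n in normed] for d in docs}


def fit_centroids(X, y):
    """Stand-in classifier: the mean feature vector of each relevance level."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


def level_probs(centroids, x):
    """Stage 1: softmax over negative squared distances to each level's centroid."""
    logits = {k: -sum((a - b) ** 2 for a, b in zip(x, c)) for k, c in centroids.items()}
    m = max(logits.values())
    exp = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exp.values())
    return {k: e / z for k, e in exp.items()}


def two_stage_rank(runs, centroids, weights=(0.0, 1.0, 2.0, 3.0)):
    """Stage 2: rank by a linear combination of the stage-1 level probabilities."""
    scored = {}
    for d, x in features(runs).items():
        p = level_probs(centroids, x)
        scored[d] = sum(weights[k] * p.get(k, 0.0) for k in LEVELS)
    # Highest combined score first; doc_id breaks exact ties deterministically.
    return sorted(scored, key=lambda d: (-scored[d], d))


if __name__ == "__main__":
    # Toy training data: one labelled feature vector per level (hypothetical).
    centroids = fit_centroids([[0.0, 0.0], [0.3, 0.3], [0.7, 0.7], [1.0, 1.0]], [0, 1, 2, 3])
    run_a = {"d1": 9.0, "d2": 5.0, "d3": 1.0}
    run_b = {"d1": 0.8, "d3": 0.4, "d4": 0.2}
    print(two_stage_rank([run_a, run_b], centroids))  # ['d1', 'd2', 'd3', 'd4']
```

With weights equal to the level grades, the stage-2 score is an expected relevance grade, so documents whose most likely level is the same are still separated by how confident the stage-1 classifier is.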

Files in This Item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	624.07 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.