Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47514
Full metadata record
DC field: Value (Language)
dc.contributor.advisor: 鄭卜壬
dc.contributor.author: Ting-Chu Lin (en)
dc.contributor.author: 林庭竹 (zh_TW)
dc.date.accessioned: 2021-06-15T06:03:44Z
dc.date.available: 2012-08-20
dc.date.copyright: 2010-08-20
dc.date.issued: 2010
dc.date.submitted: 2010-08-16
dc.identifier.citation:
[1] N. Ailon. Aggregation of partial rankings, p-ratings and top-m lists. In SODA ’07: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 415–424, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.
[2] J. A. Aslam and M. Montague. Models for metasearch. In SIGIR ’01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 276–284, New York, NY, USA, 2001. ACM.
[3] J. P. Callan, Z. Lu, and W. B. Croft. Searching distributed collections with inference networks. In SIGIR ’95: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pages 21–28, New York, NY, USA, 1995. ACM.
[4] E. M. Voorhees, N. K. Gupta, and B. Johnson-Laird. The collection fusion problem. In The Third Text REtrieval Conference (TREC-3), pages 95–104, 1994.
[5] E. A. Fox and J. A. Shaw. Combination of multiple searches. In The Second Text REtrieval Conference (TREC-2), pages 243–252. National Institute of Standards and Technology Special Publication 500-215, 1994.
[6] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT ’95: Proceedings of the Second European Conference on Computational Learning Theory, pages 23–37, London, UK, 1995. Springer-Verlag.
[7] X. Geng, T.-Y. Liu, T. Qin, and H. Li. Feature selection for ranking. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 407–414, New York, NY, USA, 2007. ACM.
[8] J. H. Lee. Analyses of multiple evidence combination. SIGIR Forum, 31(SI):267–276, 1997.
[9] Q. Li, S.-H. Myaeng, Y. Jin, and B.-y. Kang. Concept unification of terms in different languages for IR. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 641–648, Morristown, NJ, USA, 2006. Association for Computational Linguistics.
[10] D. Lillis, F. Toolan, R. Collier, and J. Dunnion. ProbFuse: a probabilistic approach to data fusion. In SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 139–146, New York, NY, USA, 2006. ACM.
[11] F. Martínez-Santiago, L. A. Ureña-López, and M. Martín-Valdivia. A merging strategy proposal: The 2-step retrieval status value method. Inf. Retr., 9(1):71–93, 2006.
[12] M. Montague and J. A. Aslam. Condorcet fusion for improved retrieval. In CIKM ’02: Proceedings of the eleventh international conference on Information and knowledge management, pages 538–548, New York, NY, USA, 2002. ACM.
[13] Z. Nie, Y. Ma, S. Shi, J.-R. Wen, and W.-Y. Ma. Web object retrieval. In WWW ’07: Proceedings of the 16th international conference on World Wide Web, pages 81–90, New York, NY, USA, 2007. ACM.
[14] J. Pickens and G. Golovchinsky. Ranked feature fusion models for ad hoc retrieval. In CIKM ’08: Proceeding of the 17th ACM conference on Information and knowledge management, pages 893–900, New York, NY, USA, 2008. ACM.
[15] R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Mach. Learn., 37(3):297–336, 1999.
[16] J. Xu and X. Li. Learning to rank collections. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 765–766, New York, NY, USA, 2007. ACM.
[17] R. Yan and A. G. Hauptmann. Probabilistic latent query analysis for combining multiple retrieval sources. In SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 324–331, New York, NY, USA, 2006. ACM.
[18] Z. Zheng, K. Chen, G. Sun, and H. Zha. A regression framework for learning ranking functions using relative relevance judgments. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 287–294, New York, NY, USA, 2007. ACM.
[19] K. Zhou, G.-R. Xue, H. Zha, and Y. Yu. Learning to rank with ties. In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 275–282, New York, NY, USA, 2008. ACM.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47514
dc.description.abstract: The search-result merging problem is to combine the results of several information retrieval systems or search engines so as to obtain a more accurate relevance ranking. Among the many ways to improve search quality, combining several different search engines is one line of research, and how to exploit their complementary strengths is its central question. Many studies have shown that effectively merging the results of several retrieval systems significantly improves ranking accuracy, raising retrieval effectiveness and precision beyond that of the original systems.
The goal of this thesis is to select a subset of the training data such that the fusion model trained on that subset yields the greatest improvement when it is used to fuse all of the data. We take the probFuse algorithm [10] as our example and propose two methods, a greedy approach and a boosting approach. The greedy approach offers two selection policies: choosing training queries independently, or taking the relationships among training queries into account. The boosting approach is a framework for the data fusion problem: it selects several subsets of the training data, trains a fusion model on each subset so that every model is optimized for part of the training data, and finally combines these models linearly into the final fusion model.
Through training data selection we improve the merged search results. Extensive experiments on TREC-3, 4, 5 and NTCIR-3, 4 show that a fusion algorithm trained on the selected data produces significantly better fused results than the same algorithm without selection, because well-chosen training data yields a more appropriate fusion model. We propose two useful training data selection methods and define them as a formal framework that can be applied to any fusion algorithm. (zh_TW)
dc.description.abstract: Search-result merging combines the results of several search engines to obtain better performance. Several early studies have shown that combining different information retrieval (IR) models can greatly improve retrieval effectiveness and accuracy beyond what any individual model can achieve. In machine learning and statistics applications, researchers likewise apply data fusion algorithms to ensemble the results of different models and combine their complementary abilities.
One of the state-of-the-art data fusion algorithms of recent years is probFuse [10]. It uses the past performance of each model to predict that model's confidence. Although it has shown promising performance in many studies, it considers only the models and does not take the diversity of queries into account. In our analysis of experiments on TREC-3, 4, 5 and NTCIR-3, 4, we found that the performance of a model varies from one query to another. Inspired by this observation, we assume that not all training examples are effective.
We propose two novel approaches, a greedy approach and a boosting approach, that select training data so as to maximize the improvement obtained from a data fusion algorithm. The greedy approach has two selection policies, dependent and independent, both of which greedily select training examples. Dependent selection takes the concurrence of training examples into account; independent selection chooses each training example individually, one after another. The boosting approach is a framework we designed for the data fusion problem; it emphasizes different training examples and generates a linear ensemble based on the weights of the training data.
Extensive experiments were performed on several data sets, including TREC-3, 4, 5 and NTCIR-3, 4, and the outcomes were very promising: with either of our data selection methods, the probFuse algorithm clearly performs better than before.
Our work not only improves the effectiveness of an existing fusion algorithm but also reduces training time. The boosting framework we propose can also be applied to any data fusion algorithm on the fly. (en)
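To make the two ideas in the abstract above more concrete, the sketch below shows (a) an independently greedy flavor of training-data selection, where a training query is kept only if a fusion model trained on that query alone beats a model trained on the full training set, and (b) the linear-ensemble combination that the boosting approach uses as its final step. This is a minimal illustration under assumed interfaces: `train_fusion_model`, `evaluate`, `min_gain`, and `model.score` are hypothetical placeholders rather than functions from the thesis, and the thesis's actual selection criteria and boosting weights are not reproduced here.

```python
# Minimal sketch of the two ideas in the abstract; the helper names are
# hypothetical placeholders, not code from the thesis.

def select_independently(train_queries, validation_queries,
                         train_fusion_model, evaluate, min_gain=0.0):
    """Independently greedy selection: keep a training query only if a fusion
    model trained on that query alone improves the validation score over a
    model trained on the whole training set."""
    baseline = evaluate(train_fusion_model(train_queries), validation_queries)
    selected = [q for q in train_queries
                if evaluate(train_fusion_model([q]), validation_queries)
                > baseline + min_gain]
    # Fall back to the full set if the filter would discard every query.
    return selected or list(train_queries)


def linear_ensemble_score(models, weights, query, doc):
    """Final combination step of the boosting approach: a linear ensemble of
    fusion models, each weighted by how much its training subset was emphasized."""
    return sum(w * model.score(query, doc) for w, model in zip(weights, models))
```

In the dependent policy and in the boosting framework, the simple per-query test above would be replaced by one that considers the queries already selected or the current example weights, respectively.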
dc.description.provenance: Made available in DSpace on 2021-06-15T06:03:44Z (GMT). No. of bitstreams: 1
ntu-99-R97922134-1.pdf: 3288384 bytes, checksum: c9a17c5b75644438fc2fb782124eb5d1 (MD5)
Previous issue date: 2010 (en)
dc.description.tableofcontents:
Contents
致謝
中文摘要
Abstract
1 Introduction
1.1 Background
1.2 Supervised Learning for Search Result Merging
1.3 Diversity of Search Results
1.4 Problem Specification
1.5 Proposed Approaches
2 Related Work
3 Methodology
3.1 Greedy Approach
3.1.1 Independently Greedy Approach
3.1.2 Dependently Greedy Approach
3.2 Boosting Approach
3.2.1 Boosting Idea
3.2.2 A Generalized Version of AdaBoost
3.2.3 Data Fusion Version of Boosting
3.2.4 Base Learner: probFuse Algorithm with Data Selection
3.2.5 Analysis of the Data Fusion Version of Boosting
4 Experiment
4.1 Description of Data Sets
4.2 Experiment Setting
4.3 Experiment Results
4.3.1 Exp 1: Greedy Approach
4.3.2 Exp 2: Boosting Approach
4.3.3 Summary
5 Discussion
6 Conclusion and Future Works
Bibliography
dc.language.iso: en
dc.subject: 資訊檢索 (zh_TW)
dc.subject: information retrieval (en)
dc.title: 監督式學習之搜尋結果合併問題中訓練資料篩選方法 (zh_TW)
dc.title: Training Data Selection for Supervised Learning Based Search-result Merging (en)
dc.type: Thesis
dc.date.schoolyear: 98-2
dc.description.degree: 碩士
dc.contributor.oralexamcommittee: 陳信希, 盧文祥
dc.subject.keyword: 資訊檢索 (zh_TW)
dc.subject.keyword: information retrieval (en)
dc.relation.page: 49
dc.rights.note: 有償授權
dc.date.accepted: 2010-08-16
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in Collections: 資訊工程學系

Files in This Item:
File: ntu-99-1.pdf
Size: 3.21 MB
Format: Adobe PDF
Access: not authorized for public access


