Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426
Full metadata record
DC Field | Value | Language
dc.contributor.advisor: 鄭卜壬 (Pu-Jen Cheng)
dc.contributor.author: Yi-Chun Lin (en)
dc.contributor.author: 林怡君 (zh_TW)
dc.date.accessioned: 2021-06-13T15:27:45Z
dc.date.available: 2009-07-30
dc.date.copyright: 2008-07-30
dc.date.issued: 2008
dc.date.submitted: 2008-07-17
dc.identifier.citation:
[1] Kwok, K. L. (2005). "An Attempt to Identify Weakest and Strongest Queries." Proceedings of the ACM SIGIR 2005 Workshop on Predicting Query Difficulty - Methods and Applications, Salvador, Brazil.
[2] Yom-Tov, E., Fine, S., Carmel, D., Darlow, A. and Amitay, E. (2004). "Juru at TREC 2004: Experiments with Prediction of Query Difficulty." Proceedings of the 13th Text REtrieval Conference (TREC 2004).
[3] Macdonald, C., He, B. and Ounis, I. (2005). "Predicting Query Performance in Intranet Search." Proceedings of the ACM SIGIR 2005 Workshop on Predicting Query Difficulty - Methods and Applications, Salvador, Brazil.
[4] Cronen-Townsend, S., Zhou, Y. and Croft, W. B. (2002). "Predicting Query Performance." Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
[5] He, B. and Ounis, I. (2004). "Inferring Query Performance Using Pre-Retrieval Predictors." Proceedings of the 11th International Conference on String Processing and Information Retrieval (SPIRE 2004), 43–54.
[6] He, B. and Ounis, I. (2006). "Query Performance Prediction." Information Systems, Elsevier Science.
[7] Lioma, C. and Ounis, I. (2006). "Examining the Content Load of Part of Speech Blocks for Information Retrieval." Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions.
[8] Amati, G., Carpineto, C. and Romano, G. (2004). "Query Difficulty, Robustness, and Selective Application of Query Expansion." Proceedings of the 26th European Conference on IR Research (ECIR 2004), Sunderland, UK.
[9] Monz, C. (2007). "Model Tree Learning for Query Term Weighting in Question Answering." Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007).
[10] Carmel, D., Yom-Tov, E. and Soboroff, I. (2005). "SIGIR Workshop Report: Predicting Query Difficulty - Methods and Applications." ACM SIGIR 2005.
[11] Harman, D. and Buckley, C. (2004). "The NRRC Reliable Information Access (RIA) Workshop." Proceedings of the ACM SIGIR 2004.
[12] Cheng, P.-J., Teng, J.-W., Chen, R.-C., Wang, J.-H., Lu, W.-H. and Chien, L.-F. (2004). "Translating Unknown Queries with Web Corpora for Cross-Language Information Retrieval." Proceedings of the ACM SIGIR 2004.
[13] Sarnikar, S., Zhang, Z. and Zhao, J. L. (2007). "Predicting Query Performance in Domain-Specific Corpora." Proceedings of IEEE 2007.
[14] Vinay, V., Cox, I. J., Milic-Frayling, N. and Wood, K. (2006). "On Ranking the Effectiveness of Searches." Proceedings of the ACM SIGIR Conference.
[15] Kumaran, G. and Allan, J. (2007). "A Case for Shorter Queries, and Helping Users Create Them." Proceedings of the Association for Computational Linguistics (ACL 2007).
[16] Zhou, Y. and Croft, W. B. (2006). "Ranking Robustness: A Novel Framework to Predict Query Performance." Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM 2006), 567–574.
[17] Carmel, D., Yom-Tov, E., Darlow, A. and Pelleg, D. (2006). "What Makes a Query Difficult?" Proceedings of the 29th ACM SIGIR Conference, 390–397.
[18] Mandl, T. and Womser-Hacker, C. (2002). "Linguistic and Statistical Analysis of the CLEF Topics." Proceedings of the Cross-Language Evaluation Forum (CLEF) Workshop.
[19] Krause, A., Leskovec, J. and Guestrin, C. (2006). "Data Association for Topic Intensity Tracking." Proceedings of the International Conference on Machine Learning.
[20] Zeng, H.-J., He, Q.-C., Chen, Z., Ma, W.-Y. and Ma, J. (2004). "Learning to Cluster Web Search Results." Proceedings of the 27th ACM SIGIR Conference on Research and Development in Information Retrieval, 210–217.
[21] Grivolla, J., Jourlin, P. and de Mori, R. (2005). "Automatic Classification of Queries by Expected Retrieval Performance." Proceedings of the ACM SIGIR Conference.
[22] Yom-Tov, E., Fine, S., Carmel, D., Darlow, A. and Amitay, E. (2004). "Improving Document Retrieval According to Prediction of Query Difficulty." Working Notes of the Text REtrieval Conference (TREC 2004), Gaithersburg, MD, 393–402.
[23] Gunn, S. R. (1998). "Support Vector Machines for Classification and Regression."
[24] Agichtein, E., Lawrence, S. and Gravano, L. (2004). "Learning to Find Answers to Questions on the Web." ACM Transactions on Internet Technology, 129–162.
[25] Harman, D. and Buckley, C. (2004). "The NRRC Reliable Information Access (RIA) Final Workshop Report."
[26] Baeza-Yates, R. and Ribeiro-Neto, B. (1999). "Modern Information Retrieval." Addison Wesley.
[27] Gauch, S., Wang, J. and Rachakonda, S. M. (1999). "A Corpus Analysis Approach for Automatic Query Expansion and its Extension to Multiple Databases." ACM Transactions on Information Systems, 250–269.
[28] Carpineto, C., De Mori, R., Romano, G. and Bigi, B. (2001). "An Information-Theoretic Approach to Automatic Query Expansion." ACM Transactions on Information Systems, 1–27.
[29] Kwok, K. L. (1996). "A New Method of Weighting Query Terms for Ad-hoc Retrieval." Proceedings of the ACM SIGIR Conference, 187–195.
[30] Carmel, D., Farchi, E., Petruschka, Y. and Soffer, A. (2002). "Automatic Query Refinement Using Lexical Affinities with Maximal Information Gain." Proceedings of the ACM SIGIR Conference, 283–290.
[31] Salton, G. and Buckley, C. (1988). "Term-Weighting Approaches in Automatic Text Retrieval." Information Processing and Management, 513–523.
[32] Chang, Y., Ounis, I. and Kim, M. (2005). "Query Reformulation Using Automatically Generated Query Concepts from a Document Space." Information Processing and Management, Elsevier Science.
[33] Mothe, J. and Tanguy, L. (2005). "Linguistic Features to Predict Query Difficulty." Proceedings of the ACM SIGIR Conference.
[34] Jones, R. and Fain, D. C. (2003). "Query Word Deletion Prediction." Proceedings of the ACM SIGIR Conference.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426
dc.description.abstract: With the spread and development of the Internet, users have become accustomed to searching for unknown knowledge through information retrieval systems. However, users cannot see the document characteristics behind a query term in the corpus, and so cannot assess the term's importance; in practice, the query terms users consider important are often not the terms that matter to the retrieval system. For this reason, this thesis develops a learning mechanism that estimates the retrieval effectiveness of query terms, helping users decide which terms are important for retrieval.
The thesis addresses two main topics. First, we develop a learning mechanism that predicts the effectiveness of query terms. Second, using this mechanism, queries are automatically reformulated according to the characteristics of their terms, producing more precise queries with better retrieval effectiveness. Three classes of query-term features are considered: linguistic features, statistical features (including corpus-content features), and feature models proposed in prior work. We find that combinations of features outperform individual features, and that statistical features outperform linguistic features, although the statistical features carry a higher computational cost. Using these features, we train classification and linear-regression models to predict query-term effectiveness, then apply two algorithms to reformulate queries, and evaluate the reformulated queries on NTCIR-4 and NTCIR-5 to improve on the original effectiveness.
Experiments confirm that the proposed system improves mean effectiveness by about 8%, and that with either the classification or the regression model it achieves better retrieval effectiveness across different retrieval models and corpora. (zh_TW)
dc.description.abstract: With the growth and convenience of the web in recent years, users have become accustomed to looking for unknown knowledge through web search. However, users do not have enough information about the statistical characteristics of corpora to estimate the effectiveness of queries or the importance of query terms; the terms users believe to be important may not be the terms that actually matter to the retrieval system. For this reason, we try to measure the effectiveness of each query term to help users determine which terms are important, and then employ this mechanism for information retrieval.
In light of the above, we develop a learning function that measures the impact of query terms and automatically creates concise, high-quality query reformulations by exploiting these term characteristics. The features we use are linguistic, co-occurrence, and contextual features, together with features cited in other papers. In general, we found that combinations of features performed better than individual features, and that statistical features performed better than linguistic features; however, the statistical features incur a higher computational cost. Using these features and a performance measure, we train classification and regression models to measure the effectiveness of query terms. We then reformulate queries with generation and reduction procedures and evaluate the reformulated queries on the NTCIR-4 and NTCIR-5 benchmarks.
Our experiments on NTCIR-4 and NTCIR-5 show that the system improves mean average precision by about 8% over the baseline and is applicable across different retrieval models, training models, and topics. (en)
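The evaluation metric cited above, mean average precision (MAP), averages the per-query average precision over all test topics. As a point of reference only (this sketch and its toy data are not from the thesis), the computation can be expressed in a few lines of Python:

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a relevant-document set."""
    hits = 0
    precisions = []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical toy data: two queries with small result lists
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2 = 5/6
    (["d4", "d5"], {"d5"}),              # AP = (1/2) / 1 = 1/2
]
print(mean_average_precision(runs))      # (5/6 + 1/2) / 2 = 2/3
```

An "8% improvement over the baseline" in this setting means the MAP of the reformulated queries divided by the MAP of the original queries is roughly 1.08.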
dc.description.provenance: Made available in DSpace on 2021-06-13T15:27:45Z (GMT). No. of bitstreams: 1. ntu-97-R95922021-1.pdf: 1106726 bytes, checksum: 90a8b1d3a551d090ed62ab93dadeb153 (MD5). Previous issue date: 2008 (en)
dc.description.tableofcontents:
口試委員審定書 (Thesis Committee Certification) i
致謝 (Acknowledgements) ii
中文摘要 (Chinese Abstract) iii
Abstract iv
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Main Issues 3
1.3 Thesis Structure 4
Chapter 2 Related Work 5
2.1 Query Term Weighting & Expansion & Reformulation 5
2.2 Query Performance Prediction 10
Chapter 3 Approach Overview 15
3.1 Observation 15
3.2 Problem Definition 17
3.3 Procedure of Query Term Measurement 18
Chapter 4 Measurement of the Impact of Query Terms 21
4.1 Term Impact as a Classification Problem 21
4.2 Term Impact as a Regression Problem 24
4.3 Features 27
4.3.1 Linguistic Features – Class 1 28
4.3.2 Co-occurrence Statistics Features – Class 2 and Class 4 28
4.3.3 Context Features – Class 3 and Class 5 31
4.3.4 Other Features 32
Chapter 5 Query Reformulation for Information Retrieval 35
5.1 Generation 35
5.2 Reduction 37
Chapter 6 Experiments and Discussions 41
6.1 Performance of SVM Classifier 43
6.2 Feature Importance Analysis 45
6.3 Performance Evaluation on Information Retrieval 52
6.3.1 Select Query Terms from NTCIR-4 Description Queries (Cross Validation) 53
6.3.2 Select Query Terms from NTCIR-5 Description Queries 61
6.3.3 Select Query Terms from NTCIR-4 Description Queries with Title Model 67
6.3.4 Select Query Terms from NTCIR-4 Title and Description Queries 70
6.3.5 Extend Query Terms from NTCIR-4 Description Queries 73
Chapter 7 Conclusion and Future Work 75
7.1 Conclusion 75
7.2 Future Work 76
7.2.1 Context-Based Retrieval 77
7.2.2 Query Term Weighting 78
References 81
dc.language.iso: en
dc.subject: 查詢詞效能預估 (zh_TW)
dc.subject: 查詢詞重製 (zh_TW)
dc.subject: 查詢詞 (zh_TW)
dc.subject: query performance prediction (en)
dc.subject: query (en)
dc.subject: query reformulation (en)
dc.title: 查詢詞在資訊檢索中之效能評估 (zh_TW)
dc.title: Learning to Measure the Effectiveness of Query Terms for Information Retrieval (en)
dc.type: Thesis
dc.date.schoolyear: 96-2
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 陳信希 (Hsin-Hsi Chen), 梁婷 (Tyne Liang)
dc.subject.keyword: 查詢詞, 查詢詞效能預估, 查詢詞重製 (zh_TW)
dc.subject.keyword: query, query performance prediction, query reformulation (en)
dc.relation.page: 84
dc.rights.note: 有償授權 (authorized with fee)
dc.date.accepted: 2008-07-17
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science) (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) (zh_TW)
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
ntu-97-1.pdf (Restricted Access), 1.08 MB, Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
