Learning to Measure the Effectiveness of Query Terms for Information Retrieval

Yi-Chun Lin; 林怡君

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426

Title:	Learning to Measure the Effectiveness of Query Terms for Information Retrieval (查詢詞在資訊檢索中之效能評估)
Author:	Yi-Chun Lin (林怡君)
Advisor:	Pu-Jen Cheng (鄭卜壬)
Keywords:	query, query performance prediction, query reformulation
Year:	2008
Degree:	Master's
Abstract:	As the web has spread, users have come to rely on information retrieval systems to find documents on topics they do not yet know. However, users lack information about the statistical characteristics of the underlying corpus, so they cannot estimate how effective a query is or how important each of its terms is. In practice, the terms users consider important are often not the terms that matter most to the retrieval system. For this reason, this thesis develops a learning mechanism that measures the effectiveness of query terms for information retrieval and helps users decide which terms are important. The thesis addresses two problems. First, we develop a learning function that predicts the effectiveness of individual query terms. Second, we use this function to reformulate queries automatically from the terms' features, producing concise, high-quality queries with better retrieval performance. We consider three groups of query-term features: linguistic features, statistical features (including co-occurrence and corpus-context features), and features proposed in prior work. In general, combinations of features performed better than any single feature, and statistical features performed better than linguistic features. Even so, statistical features are not necessarily the best choice, because they carry a higher computational cost. Using these features, we train classification and linear regression models to predict query-term effectiveness. We then reformulate queries with two algorithms, a generation procedure and a reduction procedure, and evaluate the reformulated queries on the NTCIR-4 and NTCIR-5 benchmarks. The experiments show that the proposed system improves mean average precision by about 8% on average over the baseline. With either the classification or the regression model, it also achieves better retrieval performance across different retrieval models, corpora, and topics.
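The abstract reports its results as gains in mean average precision (MAP). It also describes a reduction procedure driven by predicted term effectiveness. The following minimal sketch illustrates both ideas under stated assumptions. `average_precision` and `mean_average_precision` follow the standard non-interpolated definition. `reduce_query`, its `predict_effect` callback, and the `threshold` rule are hypothetical stand-ins, not the thesis's actual reduction algorithm. The toy scores in the usage example are invented purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked result list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Relevant documents that were never retrieved contribute 0.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over topics; each run is (ranked doc ids, relevant doc ids)."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def reduce_query(
    terms: List[str],
    predict_effect: Callable[[str, List[str]], float],
    threshold: float = 0.0,
) -> List[str]:
    """Hypothetical reduction step: drop terms whose predicted effect on
    retrieval is at or below `threshold`, but never return an empty query."""
    if not terms:
        return []
    # predict_effect stands in for a trained classifier or regressor.
    scores = {t: predict_effect(t, terms) for t in terms}
    kept = [t for t in terms if scores[t] > threshold]
    if not kept:
        # Fall back to the single highest-scoring term.
        kept = [max(terms, key=lambda t: scores[t])]
    return kept


if __name__ == "__main__":
    # Invented per-term scores, used only to exercise reduce_query.
    toy_scores: Dict[str, float] = {"the": -0.02, "nuclear": 0.15, "plant": 0.04, "issue": -0.01}
    print(reduce_query(["the", "nuclear", "plant", "issue"], lambda t, q: toy_scores[t]))
    # prints ['nuclear', 'plant']
    print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))  # (1/1 + 2/3) / 2
```

Passing the scorer in as a callback keeps the reduction logic independent of the model behind it. The same loop works whether the scores come from a classifier, a regressor, or a simple heuristic.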
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426
Full-text access:	Paid license
Department:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-97-1.pdf (not authorized for public access)	1.08 MB	Adobe PDF
