Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426
Title: Learning to Measure the Effectiveness of Query Terms for Information Retrieval (查詢詞在資訊檢索中之效能評估)
Authors: Yi-Chun Lin (林怡君)
Advisor: Pu-Jen Cheng (鄭卜壬)
Keyword: query, query performance prediction, query reformulation
Publication Year: 2008
Degree: Master's
Abstract: With the spread and growth of the web, users have become accustomed to searching for unfamiliar knowledge through information retrieval systems. However, users do not know the statistical characteristics of the underlying corpus, so they cannot reliably estimate the effectiveness of a query or the importance of its terms. In practice, the terms a user considers important are not necessarily the ones that matter to the retrieval system. For this reason, this work uses a learning mechanism to measure the effectiveness of individual query terms and to help users decide which terms are important for retrieval.
The thesis addresses two topics. First, we develop a learning function that predicts the effectiveness of query terms. Second, using this function, we automatically reformulate queries based on the characteristics of their terms, producing concise, higher-quality queries with better retrieval effectiveness. We consider three kinds of query term features: linguistic features, statistical features such as co-occurrence and contextual features drawn from the corpus, and features proposed in other papers. We find that combinations of features perform better than any single feature, and that statistical features perform better than linguistic features. Statistical features, however, are more expensive to compute, so they are not necessarily the best choice. Using these features and a performance measure, we train classification and linear regression models to predict the effectiveness of query terms. We then reformulate queries with two algorithms, a generation procedure and a reduction procedure, and evaluate the reformulated queries on the NTCIR-4 and NTCIR-5 benchmarks.
Experiments on NTCIR-4 and NTCIR-5 show that the proposed approach improves mean average precision by about 8% on average over the baseline, and that the improvement holds for both classification and regression training models across different retrieval models, corpora, and topics.
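The reported gain is stated in terms of mean average precision (MAP). As a reading aid, the following is a minimal sketch of how MAP and a relative improvement over a baseline are commonly computed. It is not taken from the thesis: the function names, topic ids, document ids, and toy rankings are all hypothetical, and the thesis may compute the measure with a standard evaluation tool instead.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-topic average precision over all judged topics."""
    if not qrels:
        return 0.0
    total = sum(
        average_precision(runs.get(topic, []), relevant)
        for topic, relevant in qrels.items()
    )
    return total / len(qrels)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change of `improved` over `baseline` (0.08 means 8%)."""
    return (improved - baseline) / baseline


# Hypothetical toy data: two topics with relevance judgments, plus rankings
# for the original queries and for the reformulated queries.
qrels = {"t1": {"d1", "d3"}, "t2": {"d2"}}
baseline_run = {"t1": ["d2", "d1", "d3"], "t2": ["d1", "d2"]}
reformulated_run = {"t1": ["d1", "d2", "d3"], "t2": ["d2", "d1"]}

map_baseline = mean_average_precision(baseline_run, qrels)
map_reformulated = mean_average_precision(reformulated_run, qrels)
gain = relative_improvement(map_baseline, map_reformulated)
```

Under this definition, an 8% improvement means the reformulated queries' MAP is about 1.08 times the baseline MAP. The toy rankings above are chosen only to exercise the functions and do not reflect the thesis results.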
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/37426
Fulltext Rights: Paid license
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-97-1.pdf (Restricted Access)
Size: 1.08 MB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
