Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/75116
Title: | 中文查詢問句擴展之研究 A Study on Query Expansion for Chinese Information Retrieval |
Author: | Ya-chen Chuang 莊雅蓁 |
Year of publication: | 2000 |
Degree: | Master's |
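Abstract: | Abstract This study examines three issues: (1) how much an automatically constructed synonym dictionary helps information retrieval; (2) which thesaurus term relationship gives the best results when used to expand queries; and (3) the effect of combining the synonym dictionary with the thesaurus. Because lexical resources are hard to obtain, the study builds the synonym dictionary automatically from the experimental document database. A document database is first selected as the vocabulary source; terms are extracted by word segmentation; term co-occurrence information in the documents is used to compute term relationships; related terms are then clustered by degree of relatedness into term groups, which are organized into the synonym dictionary. In the query expansion experiments, the original queries are first collected, and several expansion models are then built from different vocabulary sources: expansion with the synonym dictionary alone, with the thesaurus alone, and with the two resources combined. Retrieval effectiveness is evaluated through manual relevance judgments, from which the precision of the retrieval results is calculated. The results show that expanding queries with the level of the synonym dictionary whose term groups contain fewer terms gives better retrieval effectiveness. Expansion with the different thesaurus relationships shows no significant differences, although combining all relationships generally performs better. Expanding with the synonym dictionary and then revising with the thesaurus improves effectiveness slightly, but a second expansion with the thesaurus lowers it. The experiments also find that the content of the automatically constructed synonym dictionary depends on the quality of word segmentation, and that the string matching method is another important factor in the retrieval effectiveness of query expansion. Beyond the design of the expansion mechanism and form, many other factors affect the experimental results, such as word segmentation, indexing, string matching, and the evaluation method, so many complex problems remain for future work on synonym dictionary construction and query expansion.
ABSTRACT This thesis addresses three issues in query expansion: whether an automatically constructed synonym dictionary can enhance retrieval effectiveness, which thesaurus relationship performs best, and how effective the integration of the synonym dictionary and the thesaurus is. To carry out the query expansion experiments, we first develop a methodology for automatically constructing a synonym dictionary. An appropriate text corpus is selected, word segmentation is carried out, and words are clustered based on co-occurrence statistics. The synonym dictionary is then constructed hierarchically. In the query expansion experiments, queries are collected from various users and each query is expanded under different models: expansion by the synonym dictionary, by the thesaurus, or by both. Finally, performance is evaluated in terms of precision, calculated from manual relevance judgments. The results show that query expansion using the second level of the synonym dictionary performs better. Although the effects of the different relationships prescribed in the thesaurus are similar, expanding by the union of all relationships performs better. The model that first expands the query with the synonym dictionary and then modifies it with the thesaurus improves retrieval performance slightly, but performance decreases with further expansion. We also find that the correctness of word segmentation has a great impact on the quality of the synonym dictionary. The mode of string matching is another important factor. Many problems therefore remain to be solved in further study of synonym dictionary construction and query expansion. |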
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/75116 |
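Full-text authorization: | Not authorized |
Appears in collections: | Department of Library and Information Science |
Files in this item:
There are no files associated with this item.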
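
The pipeline outlined in the abstract (co-occurrence statistics, term grouping, query expansion, precision from manual relevance judgments) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not say which association measure or clustering method was used, so the Dice coefficient, the single-link threshold grouping, the toy pre-segmented documents, and every function name below are assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(docs):
    """Count documents containing each term and each unordered term pair."""
    term_df = defaultdict(int)
    pair_df = defaultdict(int)
    for tokens in docs:
        terms = sorted(set(tokens))
        for t in terms:
            term_df[t] += 1
        for a, b in combinations(terms, 2):  # a < b because terms is sorted
            pair_df[(a, b)] += 1
    return term_df, pair_df


def dice(a, b, term_df, pair_df):
    """Dice coefficient (an assumed measure, not taken from the thesis)."""
    key = (a, b) if a < b else (b, a)
    return 2 * pair_df.get(key, 0) / (term_df[a] + term_df[b])


def build_groups(docs, threshold):
    """Single-link grouping: merge terms whose association >= threshold."""
    term_df, pair_df = cooccurrence_counts(docs)
    parent = {t: t for t in term_df}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in pair_df:
        if dice(a, b, term_df, pair_df) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in term_df:
        groups[find(t)].add(t)
    return {t: groups[find(t)] for t in term_df}


def expand_query(query_terms, term_groups):
    """Add every term that shares a group with a query term."""
    expanded = set(query_terms)
    for t in query_terms:
        expanded |= term_groups.get(t, set())
    return expanded


def retrieve(docs, terms):
    """Return indices of documents containing at least one of the terms."""
    return [i for i, tokens in enumerate(docs) if terms & set(tokens)]


def precision(retrieved, relevant):
    """Fraction of retrieved documents that were judged relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Toy data (hypothetical): documents are already word-segmented.
docs = [
    ["computer", "pc", "network"],
    ["computer", "pc"],
    ["network", "library"],
    ["library", "retrieval"],
]
relevant = {0, 1}  # stand-in for manual relevance judgments

tight = expand_query(["pc"], build_groups(docs, 0.8))
loose = expand_query(["pc"], build_groups(docs, 0.5))
print(sorted(tight), precision(retrieve(docs, tight), relevant))
print(sorted(loose), precision(retrieve(docs, loose), relevant))
```

On this toy data the stricter threshold yields a two-term group and the looser one merges all five terms, so the looser expansion retrieves every document and its precision drops. The numbers come from the invented example only and say nothing about the thesis's actual results.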