Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387
| Title: | 改善以序列為基礎之文件檢索系統之有效性與彈性 Improving the Effectiveness and Scalability of a Sequence-Based Text Retrieval System |
| Authors: | Chun-Chih Huang 黃俊誌 |
| Advisor: | 蔡益坤(Yih-Kuen Tsay) |
| Keyword: | Incremental Update, Index Partitioning Schemes, Information Retrieval, Parallel Inverted Index, Parallel Processing, Text Retrieval |
| Publication Year : | 2005 |
| Degree: | Master's |
| Abstract: | The purpose of a text retrieval system is to locate documents in a large textual document collection that meet a user’s needs. The SIR system is one such system, based on the sequence model. Because it was designed and implemented as a sequential rather than a parallel application, it becomes less efficient as the data collection grows. Another drawback of the SIR system is that the index must be rebuilt entirely whenever the data collection is modified. In addition, compared with other models, query evaluation in the sequence model is time-consuming. In this thesis, we seek to make improvements that address these problems. To facilitate parallel query processing, we implement three kinds of index partitioning schemes in the system and evaluate their load-balancing characteristics. To improve the scalability of index building, we design and implement a mechanism that allows the SIR system to support incremental index updates. We also make other improvements, such as support for queries with homophones and for more token types, that make the system more flexible. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387 |
| Fulltext Rights: | Paid license |
| Appears in Collections: | Department of Information Management |
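
As a rough illustration of two ideas the abstract mentions (incremental index updates and index partitioning with load comparison), here is a minimal Python sketch. It is not the SIR system's implementation. The abstract does not name its three partitioning schemes, so the document-based and term-based partitioning shown here are generic textbook variants chosen as an assumption, and `PositionalIndex`, `partition_by_document`, and `partition_by_term` are hypothetical names.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy in-memory inverted index (hypothetical): term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        # Incremental update: only the new document's terms are touched;
        # existing postings stay in place instead of being rebuilt.
        for pos, term in enumerate(text.lower().split()):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def size(self):
        # Number of (term, document) postings, used as a crude load measure.
        return sum(len(docs) for docs in self.postings.values())


def partition_by_document(docs, n):
    """Assumed scheme: each partition indexes a disjoint subset of documents."""
    parts = [PositionalIndex() for _ in range(n)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        parts[i % n].add_document(doc_id, text)
    return parts


def partition_by_term(docs, n):
    """Assumed scheme: each partition holds full postings for a disjoint subset of terms."""
    full = PositionalIndex()
    for doc_id, text in docs.items():
        full.add_document(doc_id, text)
    parts = [PositionalIndex() for _ in range(n)]
    for i, term in enumerate(sorted(full.postings)):
        parts[i % n].postings[term] = full.postings[term]
    return parts


# Made-up sample documents, for illustration only.
docs = {
    1: "text retrieval with a sequence model",
    2: "parallel inverted index for text retrieval",
    3: "incremental index update",
}
by_doc = partition_by_document(docs, 2)
by_term = partition_by_term(docs, 2)
loads_doc = [p.size() for p in by_doc]    # postings held by each document partition
loads_term = [p.size() for p in by_term]  # postings held by each term partition
print(loads_doc, loads_term)
```

Comparing the per-partition posting counts gives a simple first view of load balance. A real evaluation would more likely measure query processing work per partition.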
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-94-1.pdf (Restricted Access) | 521.86 kB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
