請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 蔡益坤(Yih-Kuen Tsay) | |
dc.contributor.author | Chun-Chih Huang | en |
dc.contributor.author | 黃俊誌 | zh_TW |
dc.date.accessioned | 2021-06-13T16:32:07Z | - |
dc.date.available | 2005-07-13 | |
dc.date.copyright | 2005-07-13 | |
dc.date.issued | 2005 | |
dc.date.submitted | 2005-07-11 | |
dc.identifier.citation | Bibliography
[1] Ricardo Baeza-Yates and Bertheir Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [2] Eric W. Brown, James P. Callan, and W. Bruce Croft. Fast incremental indexing for full-text information retrieval. In VLDB ’94: Proceedings of the 20th International Conference on Very Large Data Bases, pages 192–202. Morgan Kaufmann Publishers Inc., 1994. [3] B. Ribeiro-Neto C. Badue, R. Baeza-Yates and N. Ziviani. Distributed query processing using partitioned inverted files. In Proc. Symp. String Processing and Information Retrieval, pages 10–20. IEEE CS Press, 2001. [4] James P. Callan, W. Bruce Croft, and Stephen M. Harding. The INQUERY retrieval system. In Proceedings of DEXA-92, 3rd International Conference on Database and Expert Systems Applications, pages 78–83, 1992. [5] Yu-Fang Chen. A Text Retrieval System Based on Sequence Similarity. Master’s thesis, Department of Information Management, National Taiwan University, 2003. [6] Doug Cutting and Jan Pedersen. Optimizations for dynamic inverted index maintenance. In Proceedings of the 13th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 405–411, 1990. [7] Michael J. Flynn. Very high-speed computing systems. pages 519–527, 2000. [8] O. Frieder, D. Grossman, A. Chowdhury, and G. Frieder. Efficiency considerations for scalable information retrieval servers. Journal of Digital Information, 1(5), January 2000. [9] Byeong-Soo Jeong and Edward Omiecinski. Inverted file partitioning schemes in multiple disk systems. IEEE Trans. Parallel Distrib. Syst., 6(2):142–153, 1995. [10] Lung-Chi Lin. A Preliminary Study of Text Retrieval Techniques Utilizing Character/ Word Positions. Master’s thesis, Department of Information Management, National Taiwan University, 2000. [11] Mauricio Marin. Parallel Text Query Processing using Composite Inverted Lists. Master’s thesis, Computing Department, University of Magallanes, Chile, 2003. [12] J. Eliot B. Moss. Design of the mneme persistent object store. ACM Trans. Inf. Syst., 8(2):103–139, 1990. [13] M. F. Porter. An algorithm for suffix stripping. pages 313–316, 1997. [14] E. Rasmussen. Parallel information processing. In Annual Review of Information Science and Technology, 27:99–130, 1992. [15] Gerard Salton, Edward A. Fox, and Harry Wu. Extended boolean information retrieval. Commun. ACM, 26(11):1022–1036, 1983. [16] David B. Skillicorn, Jonathan M. D. Hill, and W. F. McColl. Questions and Answers about BSP. Scientific Programming, 6(3):249–274, Fall 1997. [17] Ohm Sornil. Hybrid Partition Inverted Files for Large-Scale Digital Libraries. Master’s thesis, Virginia Polytechnic and State University(Virginia Tech), 2001. [18] Ohm Sornil. Parallel Inverted Index for Large-Scale, Dynamic Digital Libraries. Master’s thesis, Blacksburg, Virginia, 2001. [19] C. Stanfill. Partitioned posting files: a parallel inverted file structure for information retrieval. In SIGIR ’90: Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval, pages 413–428. ACM Press, 1990. [20] C. Stanfill, R. Thau, and D. Waltz. A parallel indexed algorithm for information retrieval. In SIGIR ’89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, pages 88–97. ACM Press, 1989. [21] Anthony Tomasic and Hector Garcia-Molina. Performance of inverted indices in shared-nothing distributed text document informatioon retrieval systems. In PDIS ’93: Proceedings of the second international conference on Parallel and distributed information systems, pages 8–17. IEEE Computer Society Press, 1993. [22] Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of inverted lists for text document retrieval. In SIGMOD ’94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data, pages 289–300. ACM Press, 1994. [23] Anthony Slavko Tomasic. Distributed queries and incremental updates in information retrieval systems. PhD thesis, 1994. [24] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, 1990. [25] Ching-Lin Yu. Sequence-Based Text Retrieval: Design and Implementation. Master’s thesis, Department of Information Management, National Taiwan University, 2002. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387 | - |
dc.description.abstract | The purpose of a text retrieval system is to locate documents from a large, textual
document collection that meet a user’s needs. The SIR system is such a system that is based on the sequence model. As it was designed and implemented as a sequential, rather than a parallel application, it becomes less efficient when the size of the data collection gets larger. Another drawback of the SIR system is that the index must be rebuilt entirely when the data collections are modified. Also, compared with other models, the query evaluation process of the sequence model is time consuming. In this thesis, we seek to make improvements that address these problems. To facilitate parallel query processing, we implement three kinds of index partitioning schemes in the system, and evalauete their load balancing characteristics. To improve the scalability of index building, we design and implement a mechanism that allows the SIR system to support incremental index updates. We also make other improvements such as support of queries with homophones and support of more types of token, that make the system more flexible. | en |
dc.description.provenance | Made available in DSpace on 2021-06-13T16:32:07Z (GMT). No. of bitstreams: 1 ntu-94-R92725027-1.pdf: 534385 bytes, checksum: 63f528085702a69811c2628f6696a88e (MD5) Previous issue date: 2005 | en |
dc.description.tableofcontents | Contents
1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Text Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.2 Chinese IR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.3 Effectiveness and Scalability of IR systems . . . . . . . . . . . . . 3 1.2 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Related Work 7 2.1 Inverted Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Parallel Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Parallel Processing in IR . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.4 Incremental Index Updating . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.5 The Indexing and Retrieval Process of SIR . . . . . . . . . . . . . . . . . 14 2.5.1 Indexing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5.2 Retrieval Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3 Scalability and Effectiveness of SIR 19 3.1 Retrieval Process Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 Index Partitioning Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3 Index Updating Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.4 Other Modifications to the SIR system . . . . . . . . . . . . . . . . . . . 26 4 The New Components of SIR 29 4.1 The Broker Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 4.1.1 Indexing Building . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.1.2 Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.2 The Coordinated Collection Management Component . . . . . . . . . . . 35 5 Experimental Results and Analysis 38 5.1 The Effectiveness of the SIR system . . . . . . . . . . . . . . . . . . . . . 38 5.1.1 Topic Containing Continuous, Related Semantic Blocks . . . . . . 39 5.1.2 Topics Containing Distant Semantic Blocks . . . . . . . . . . . . . 40 5.1.3 Adjusting the Weight of the Three Scoring . . . . . . . . . . . . . 41 5.1.4 The Effect of Multi-phase Query . . . . . . . . . . . . . . . . . . 43 5.1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.2 The Effect of Parallel Processing . . . . . . . . . . . . . . . . . . . . . . . 46 5.2.1 The Load Balancing Analysis . . . . . . . . . . . . . . . . . . . . 46 5.2.2 The Scalability Analysis of the SIR System . . . . . . . . . . . . . 49 6 Conclusion 51 6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . 54 | |
dc.language.iso | zh-TW | |
dc.title | 改善以序列為基礎之文件檢索系統之有效性與彈性 | zh_TW |
dc.title | Improving the Effectiveness and Scalability of a Sequence-Based Text Retrieval System | en |
dc.type | Thesis | |
dc.date.schoolyear | 93-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 莊裕澤(Yuh-Jzer Joung),簡立峰(Lee-Feng Chien) | |
dc.subject.keyword | 累加式更新,索引切割設計,資訊檢索,平行化反轉索引,平行化處理,文件檢索, | zh_TW |
dc.subject.keyword | Incremental Update,Index Partitioning Schemes,Information Retrieval,Parallel Inverted Index,Parallel Processing,Text Retrieval, | en |
dc.relation.page | 56 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2005-07-11 | |
dc.contributor.author-college | 管理學院 | zh_TW |
dc.contributor.author-dept | 資訊管理學研究所 | zh_TW |
顯示於系所單位: | 資訊管理學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-94-1.pdf 目前未授權公開取用 | 521.86 kB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。