Supporting Boolean Queries in a Sequence-Based Text Retrieval System (支援布林查詢的以序列為基礎之文件檢索系統)

Chun-Kai Jan; 詹淳凱

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	蔡益坤
dc.contributor.author	Chun-Kai Jan	en
dc.contributor.author	詹淳凱	zh_TW
dc.date.accessioned	2021-06-13T16:30:15Z	-
dc.date.available	2005-07-20
dc.date.copyright	2005-07-20
dc.date.issued	2005
dc.date.submitted	2005-07-12
dc.identifier.citation	[1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[2] A. Bookstein. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci., 31(4):240–247, July 1981.
[3] Aitao Chen, Jianzhang He, Liangjie Xu, Fredric C. Gey, and Jason Meggs. Chinese text retrieval without using a dictionary. In Proceedings of the 20th Annual International ACM SIGIR Conference, pages 42–49, 1997.
[4] K.J. Chen and Wei-Yun Ma. Unknown word extraction for Chinese documents. In COLING 2002, pages 169–175, 2002.
[5] Yu-Fang Chen. A Text Retrieval System Based on Sequence Similarity. Master's thesis, Department of Information Management, National Taiwan University, June 2003.
[6] Charles L. A. Clarke and Gordon V. Cormack. Shortest-substring retrieval and ranking. ACM Transactions on Information Systems, pages 113–120, 2000.
[7] Gerard Salton, Edward A. Fox, and Harry Wu. Extended Boolean Information Retrieval. Communications of the ACM, 26(12):1022–1036, 1983.
[8] Chun-Chih Huang. Improving the effectiveness and scalability of a sequence-based text retrieval system. Master's thesis, Department of Information Management, National Taiwan University, June 2005.
[9] C.R. Huang, K.J. Chen, and Li-Li Chang. Segmentation standard for Chinese natural language processing. International Journal of Computational Linguistics and Chinese Language Processing, pages 47–67, 1997.
[10] K. L. Kwok. Comparing representations in Chinese information retrieval. Research and Development in Information Retrieval, pages 34–41, 1997.
[11] Lung-Chi Lin. A Preliminary Study of Text Retrieval Techniques Utilizing Character/Word Positions (in Chinese). Master's thesis, Department of Information Management, National Taiwan University, June 2000.
[12] M. Ortega, Y. Rui, K. Chakrabarti, K. Porkaew, S. Mehrotra, and T. S. Huang. Supporting ranked Boolean similarity queries in MARS. IEEE Transactions on Knowledge and Data Engineering, 10, 1998.
[13] M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, and T. S. Huang. Supporting similarity queries in MARS. In Proc. ACM Conf. Multimedia, 1997.
[14] Wei-Yun Ma and K.J. Chen. A bottom-up merging algorithm for Chinese unknown word extraction. In Second SIGHAN Workshop on Chinese Language Processing, pages 31–38, 2003.
[15] Jian-Yun Nie, Martin Brisebois, and Xiaobo Ren. On Chinese Text Retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference, pages 225–233, 1996.
[16] Jian-Yun Nie, Jianfeng Gao, Jian Zhang, and Ming Zhou. On the Use of Words and N-grams for Chinese Information Retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, pages 141–148, 2000.
[17] Jian-Yun Nie and Fuji Ren. Chinese Information Retrieval: using characters or words? Information Processing and Management, 35(4):443–462, 1999.
[18] S. E. Robertson and K. Sparck Jones. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, 27(3):129–146, 1976.
[19] S. K. M. Wong, W. Ziarko, V. V. Raghavan, and P. C. N. Wong. On extending the vector space model for Boolean query processing. In Proceedings of the ACM Conference on Research and Development in Information Retrieval, 1986.
[20] G. Salton. Computer Evaluation of Indexing and Text Processing. Journal of the ACM, 15(1):8–36, January 1968.
[21] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.
[22] V. Tahani. A fuzzy model of document retrieval systems. Inf. Process. Manage., 12(3):177–187, 1978.
[23] Ching-Lin Yu. Sequence-Based Text Retrieval: Design and Implementation. Master's thesis, Department of Information Management, National Taiwan University, June 2002.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316	-
dc.description.abstract	In most text retrieval models, relevance is judged by keywords. The sequence model instead judges relevance by the similarity between character sequences. Representing text as sequences preserves positional information, which lets the model avoid the word segmentation problem when it is applied to Chinese text retrieval. Because the query is also represented as a sequence, the model serves long natural-language queries about specific terms well. The model can be extended to accept Boolean queries, which let users, especially experienced ones, express their information needs more precisely. This study proposes a method, based on fuzzy set theory, for supporting Boolean queries in the sequence model. It also introduces two algorithms that first transform a Boolean query into Disjunctive Normal Form (DNF) or Conjunctive Normal Form (CNF); for efficiency, these two algorithms compute approximate results. All three algorithms are incorporated into a new C/C++ implementation of SIR, the system that implements the sequence model. Because efficiency has long been a concern for SIR, this version also speeds up query processing.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T16:30:15Z (GMT). No. of bitstreams: 1 ntu-94-R92725025-1.pdf: 566347 bytes, checksum: 75ec5ee15b643c67d2991ebe520331d3 (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Thesis Outline
2 Related Work
2.1 Classical Retrieval Models
2.2 Ranked Results of Boolean Queries
2.3 The Sequence Model and the SIR System
2.3.1 The Sequence Model
2.3.2 The SIR System
3 Processing Boolean Queries
3.1 Boolean Operators on the SIR System
3.2 Simple Conjunctive Queries
3.2.1 Representative Document Sequence
3.2.2 Similarity Measures
3.3 Approximate Algorithms
3.3.1 Disjunctive Normal Form
3.3.2 Conjunctive Normal Form
4 System Design and Implementation
4.1 Supporting Boolean Queries
4.1.1 The Basic Process of Boolean Queries
4.1.2 Simple Conjunctive Queries
4.1.3 Approximate Algorithms
4.2 The Architecture of the SIR System
4.2.1 The System Processes
4.2.2 Efficiency Improvement
5 Experimental Results and Analysis
5.1 Effectiveness of the Boolean Queries
5.2 Comparison between Three Algorithms of Supporting Boolean Queries
5.2.1 The Effectiveness of Three Algorithms
5.2.2 The Efficiency of Three Algorithms
5.3 System Efficiency Improvement
5.3.1 The Efficiency of Adopted Sorting Algorithms
5.3.2 System Efficiency
6 Conclusion
6.1 Contributions
6.2 Future Work
Bibliography
dc.language.iso	en
dc.subject	序列相似度	zh_TW
dc.subject	布林運算子	zh_TW
dc.subject	布林查詢	zh_TW
dc.subject	資訊檢索	zh_TW
dc.subject	文件檢索	zh_TW
dc.subject	序列模式	zh_TW
dc.subject	Boolean Operators	en
dc.subject	Text Retrieval	en
dc.subject	Information Retrieval	en
dc.subject	Boolean Queries	en
dc.subject	Sequence Similarity	en
dc.subject	Sequence Model	en
dc.title	支援布林查詢的以序列為基礎之文件檢索系統	zh_TW
dc.title	Supporting Boolean Queries in a Sequence-Based Text Retrieval System	en
dc.type	Thesis
dc.date.schoolyear	93-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	莊裕澤,簡立峰
dc.subject.keyword	布林運算子,布林查詢,資訊檢索,文件檢索,序列模式,序列相似度	zh_TW
dc.subject.keyword	Boolean Operators, Boolean Queries, Information Retrieval, Text Retrieval, Sequence Model, Sequence Similarity	en
dc.relation.page	50
dc.rights.note	Fee-based license (有償授權)
dc.date.accepted	2005-07-12
dc.contributor.author-college	College of Management (管理學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management (資訊管理學研究所)	zh_TW
Appears in Collections:	Department of Information Management
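The abstract states that Boolean queries are supported through fuzzy set theory but does not spell out the scoring rules. As a minimal sketch, assuming the classical Zadeh operators (AND as minimum, OR as maximum, NOT as complement), which are one common choice in fuzzy-set retrieval models, a Boolean query tree can be scored against a document as follows. The per-term values stand in for the sequence-similarity scores SIR would compute and are made up for illustration; the node classes, `fuzzy_score`, and `rank` are hypothetical names, not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical query-tree node types (not from the thesis).
@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class Not:
    operand: "Query"


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Term, Not, And, Or]


def fuzzy_score(q: Query, sim: Dict[str, float]) -> float:
    """Score a Boolean query with Zadeh's fuzzy operators (an assumed choice).

    `sim` maps each query term to a similarity in [0, 1]; it stands in for the
    sequence-similarity score between that term and one document.
    """
    if isinstance(q, Term):
        return sim.get(q.text, 0.0)  # unseen term: no evidence, score 0
    if isinstance(q, Not):
        return 1.0 - fuzzy_score(q.operand, sim)  # fuzzy complement
    if isinstance(q, And):
        return min(fuzzy_score(q.left, sim), fuzzy_score(q.right, sim))
    if isinstance(q, Or):
        return max(fuzzy_score(q.left, sim), fuzzy_score(q.right, sim))
    raise TypeError(f"unknown query node: {q!r}")


def rank(q: Query, docs: Dict[str, Dict[str, float]]) -> List[str]:
    """Return document ids ordered by descending fuzzy score."""
    return sorted(docs, key=lambda d: fuzzy_score(q, docs[d]), reverse=True)


if __name__ == "__main__":
    # Query: a AND (b OR NOT c), with made-up per-term scores.
    q = And(Term("a"), Or(Term("b"), Not(Term("c"))))
    docs = {
        "d1": {"a": 0.75, "b": 0.25, "c": 0.5},
        "d2": {"a": 1.0, "b": 1.0, "c": 0.0},
        "d3": {"a": 0.25, "b": 0.0, "c": 1.0},
    }
    print(fuzzy_score(q, docs["d1"]))  # 0.5
    print(rank(q, docs))  # ['d2', 'd1', 'd3']
```

Min and max keep every score in [0, 1] and reduce to ordinary Boolean logic when all term scores are exactly 0 or 1, which makes them a natural starting point; the thesis's own similarity measures (Section 3.2.2) and its approximate DNF/CNF algorithms may combine scores differently.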

Files in This Item:

File	Size	Format
ntu-94-1.pdf (not authorized for public access)	553.07 kB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.