Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蔡益坤 | |
| dc.contributor.author | Chun-Kai Jan | en |
| dc.contributor.author | 詹淳凱 | zh_TW |
| dc.date.accessioned | 2021-06-13T16:30:15Z | - |
| dc.date.available | 2005-07-20 | |
| dc.date.copyright | 2005-07-20 | |
| dc.date.issued | 2005 | |
| dc.date.submitted | 2005-07-12 | |
| dc.identifier.citation | [1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999. [2] A. Bookstein. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci., 31(4):240–247, July 1981. [3] Aitao Chen, Jianzhang He, Liangjie Xu, Fredric C. Gey, and Jason Meggs. Chinese text retrieval without using a dictionary. In Proceedings of the 20th Annual International ACM SIGIR Conference, pages 42–49, 1997. [4] K.J. Chen and Wei-Yun Ma. Unknown word extraction for Chinese documents. COLING 2002, pages 169–175, 2002. [5] Yu-Fang Chen. A Text Retrieval System Based on Sequence Similarity. Master’s thesis, Department of Information Management, National Taiwan University, June 2003. [6] Charles L. A. Clarke and Gordon V. Cormack. Shortest-substring retrieval and ranking. ACM Transactions on Information Systems, pages 113–120, 2000. [7] Gerard Salton, Edward A. Fox, and Harry Wu. Extended Boolean Information Retrieval. Communications of the ACM, 26(12):1022–1036, 1983. [8] Chun-Chih Huang. Improving the effectiveness and scalability of a sequence-based text retrieval system. Master’s thesis, Department of Information Management, National Taiwan University, June 2005. [9] C.R. Huang, K.J. Chen, and Li-Li Chang. Segmentation standard for Chinese natural language processing. International Journal of Computational Linguistics and Chinese Language Processing, pages 47–67, 1997. [10] K. L. Kwok. Comparing representations in Chinese information retrieval. Research and Development in Information Retrieval, pages 34–41, 1997. [11] Lung-Chi Lin. A Preliminary Study of Text Retrieval Techniques Utilizing Character/Word Positions (In Chinese). Master’s thesis, Department of Information Management, National Taiwan University, June 2000. [12] M. Ortega, Y. Rui, K. Chakrabarti, K. Porkaew, S. Mehrotra, and T. S. Huang. Supporting ranked Boolean similarity queries in MARS. IEEE Trans. on Knowledge and Data Engineering, 10, 1998. [13] M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, and T. S. Huang. Supporting similarity queries in MARS. Proc. ACM Conf. Multimedia, 1997. [14] Wei-Yun Ma and K.J. Chen. A bottom-up merging algorithm for Chinese unknown word extraction. Second SIGHAN Workshop on Chinese Language Processing, pages 31–38, 2003. [15] Jian-Yun Nie, Martin Brisebois, and Xiaobo Ren. On Chinese Text Retrieval. In Proceedings of the 19th Annual International ACM-SIGIR Conference, pages 225–233, 1996. [16] Jian-Yun Nie, Jiangfeng Gao, Jian Zhang, and Ming Zhou. On the Use of Words and N-grams for Chinese Information Retrieval. In Proceedings of the 5th International Workshop Information Retrieval with Asian Languages, pages 141–148, 2000. [17] Jian-Yun Nie and Fuji Ren. Chinese Information Retrieval: using characters or words? Information Processing and Management, 35(4):443–462, 1999. [18] S. E. Robertson and K. Sparck Jones. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, 27(3):129–146, 1976. [19] S. K. M. Wong, W. Ziarko, V. V. Raghavan, and P. C. N. Wong. On extending the vector space model for Boolean query processing. Proceedings of the ACM Conference on Research and Development in Information Retrieval, 1986. [20] G. Salton. Computer Evaluation of Indexing and Text Processing. Journal of the ACM, 15(1):8–36, January 1968. [21] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, NJ, 1971. [22] V. Tahani. A fuzzy model of document retrieval systems. Inf. Process. Manage., 12(3):177–187, 1978. [23] Ching-Lin Yu. Sequence-Based Text Retrieval: Design and Implementation. Master’s thesis, Department of Information Management, National Taiwan University, June 2002. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316 | - |
| dc.description.abstract | Most text retrieval models judge relevance using keywords. The sequence model instead judges relevance by the similarity between character sequences. Because sequences carry positional information, the model avoids the Chinese word segmentation problem when applied to Chinese text retrieval. Since the query itself is represented as a sequence, the model can also satisfy users’ information needs expressed as long natural-language queries about specific terms.
This model can be enhanced by allowing Boolean queries, which describe a user’s information needs more precisely, especially when the user is highly trained. This study proposes a method based on fuzzy set theory that supports Boolean queries in the sequence model. In addition, two algorithms are introduced that transform Boolean queries into Disjunctive Normal Form (DNF) or Conjunctive Normal Form (CNF); for the sake of efficiency, these two algorithms are designed to obtain approximate results. The three algorithms are incorporated into a new C/C++ implementation of the SIR system, which implements the sequence model. Because efficiency has been a persistent issue for SIR, this version also improves the efficiency of query processing. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T16:30:15Z (GMT). No. of bitstreams: 1 ntu-94-R92725025-1.pdf: 566347 bytes, checksum: 75ec5ee15b643c67d2991ebe520331d3 (MD5) Previous issue date: 2005 | en |
| dc.description.tableofcontents | 1 Introduction; 1.1 Background; 1.2 Motivation and Objectives; 1.3 Thesis Outline; 2 Related Work; 2.1 Classical Retrieval Models; 2.2 Ranked Results of Boolean Queries; 2.3 The Sequence Model and the SIR System; 2.3.1 The Sequence Model; 2.3.2 The SIR System; 3 Processing Boolean Queries; 3.1 Boolean Operators on the SIR System; 3.2 Simple Conjunctive Queries; 3.2.1 Representative Document Sequence; 3.2.2 Similarity Measures; 3.3 Approximate Algorithms; 3.3.1 Disjunctive Normal Form; 3.3.2 Conjunctive Normal Form; 4 System Design and Implementation; 4.1 Supporting Boolean Queries; 4.1.1 The Basic Process of Boolean Queries; 4.1.2 Simple Conjunctive Queries; 4.1.3 Approximate Algorithms; 4.2 The Architecture of the SIR System; 4.2.1 The System Processes; 4.2.2 Efficiency Improvement; 5 Experimental Results and Analysis; 5.1 Effectiveness of the Boolean queries; 5.2 Comparison between three algorithms of supporting Boolean queries; 5.2.1 The Effectiveness of Three Algorithms; 5.2.2 The Efficiency of Three Algorithms; 5.3 System Efficiency Improvement; 5.3.1 The Efficiency of Adopted Sorting Algorithms; 5.3.2 System Efficiency; 6 Conclusion; 6.1 Contributions; 6.2 Future Work; Bibliography | |
| dc.language.iso | en | |
| dc.subject | 序列相似度 | zh_TW |
| dc.subject | 布林運算子 | zh_TW |
| dc.subject | 布林查詢 | zh_TW |
| dc.subject | 資訊檢索 | zh_TW |
| dc.subject | 文件檢索 | zh_TW |
| dc.subject | 序列模式 | zh_TW |
| dc.subject | Boolean Operators | en |
| dc.subject | Text Retrieval | en |
| dc.subject | Information Retrieval | en |
| dc.subject | Boolean Queries | en |
| dc.subject | Sequence Similarity | en |
| dc.subject | Sequence Model | en |
| dc.title | 支援布林查詢的以序列為基礎之文件檢索系統 | zh_TW |
| dc.title | Supporting Boolean Queries in a Sequence-Based Text Retrieval System | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 93-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 莊裕澤,簡立峰 | |
| dc.subject.keyword | 布林運算子,布林查詢,資訊檢索,文件檢索,序列模式,序列相似度 | zh_TW |
| dc.subject.keyword | Boolean Operators, Boolean Queries, Information Retrieval, Text Retrieval, Sequence Model, Sequence Similarity | en |
| dc.relation.page | 50 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2005-07-12 | |
| dc.contributor.author-college | 管理學院 | zh_TW |
| dc.contributor.author-dept | 資訊管理學研究所 | zh_TW |
| Appears in Collections: | Department of Information Management | |
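Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-94-1.pdf (not authorized for public access) | 553.07 kB | Adobe PDF | |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.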
