Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316
Full metadata record
DC field: value [language]
dc.contributor.advisor: 蔡益坤
dc.contributor.author: Chun-Kai Jan [en]
dc.contributor.author: 詹淳凱 [zh_TW]
dc.date.accessioned: 2021-06-13T16:30:15Z
dc.date.available: 2005-07-20
dc.date.copyright: 2005-07-20
dc.date.issued: 2005
dc.date.submitted: 2005-07-12
dc.identifier.citation:
[1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[2] A. Bookstein. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci., 31(4):240–247, July 1981.
[3] Aitao Chen, Jianzhang He, Liangjie Xu, Fredric C. Gey, and Jason Meggs. Chinese text retrieval without using a dictionary. In Proceedings of the 20th Annual International ACM SIGIR Conference, pages 42–49, 1997.
[4] K.J. Chen and Wei-Yun Ma. Unknown word extraction for Chinese documents. In COLING 2002, pages 169–175, 2002.
[5] Yu-Fang Chen. A Text Retrieval System Based on Sequence Similarity. Master's thesis, Department of Information Management, National Taiwan University, June 2003.
[6] Charles L. A. Clarke and Gordon V. Cormack. Shortest-substring retrieval and ranking. ACM Transactions on Information Systems, pages 113–120, 2000.
[7] Gerard Salton, Edward A. Fox, and Harry Wu. Extended Boolean Information Retrieval. Communications of the ACM, 26(12):1022–1036, 1983.
[8] Chun-Chih Huang. Improving the effectiveness and scalability of a sequence-based text retrieval system. Master's thesis, Department of Information Management, National Taiwan University, June 2005.
[9] C.R. Huang, K.J. Chen, and Li-Li Chang. Segmentation standard for Chinese natural language processing. International Journal of Computational Linguistics and Chinese Language Processing, pages 47–67, 1997.
[10] K. L. Kwok. Comparing representations in Chinese information retrieval. Research and Development in Information Retrieval, pages 34–41, 1997.
[11] Lung-Chi Lin. A Preliminary Study of Text Retrieval Techniques Utilizing Character/Word Positions (In Chinese). Master's thesis, Department of Information Management, National Taiwan University, June 2000.
[12] M. Ortega, Y. Rui, K. Chakrabarti, K. Porkaew, S. Mehrotra, and T. S. Huang. Supporting ranked Boolean similarity queries in MARS. IEEE Trans. on Knowledge and Data Engineering, 10, 1998.
[13] M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, and T. S. Huang. Supporting similarity queries in MARS. Proc. ACM Conf. Multimedia, 1997.
[14] Wei-Yun Ma and K.J. Chen. A bottom-up merging algorithm for Chinese unknown word extraction. Second SIGHAN Workshop on Chinese Language Processing, pages 31–38, 2003.
[15] Jian-Yun Nie, Martin Brisebois, and Xiaobo Ren. On Chinese Text Retrieval. In Proceedings of the 19th Annual International ACM-SIGIR Conference, pages 225–233, 1996.
[16] Jian-Yun Nie, Jianfeng Gao, Jian Zhang, and Ming Zhou. On the Use of Words and N-grams for Chinese Information Retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, pages 141–148, 2000.
[17] Jian-Yun Nie and Fuji Ren. Chinese Information Retrieval: using characters or words? Information Processing and Management, 35(4):443–462, 1999.
[18] S. E. Robertson and K. Sparck Jones. Relevance Weighting of Search Terms. Journal of the American Society for Information Science, 27(3):129–146, 1976.
[19] S. K. M. Wong, W. Ziarko, V. V. Raghavan, and P. C. N. Wong. On extending the vector space model for Boolean query processing. Proceedings of the ACM Conference on Research and Development in Information Retrieval, 1986.
[20] G. Salton. Computer Evaluation of Indexing and Text Processing. Journal of the ACM, 15(1):8–36, January 1968.
[21] G. Salton. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs, NJ, 1971.
[22] V. Tahani. A fuzzy model of document retrieval systems. Inf. Process. Manage., 12(3):177–187, 1978.
[23] Ching-Lin Yu. Sequence-Based Text Retrieval: Design and Implementation. Master's thesis, Department of Information Management, National Taiwan University, June 2002.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38316
dc.description.abstract: In most text retrieval models, relevance is judged by keywords. The sequence model instead judges relevance by the similarity between character sequences. Because sequences preserve positional information, applying the model to Chinese text retrieval avoids the Chinese word segmentation problem. And because the query itself is represented as a sequence, the model serves users well when they pose long natural-language queries about specific terms.
The model can be enhanced by supporting Boolean queries, which let users, especially experienced ones, describe their information needs more precisely. This thesis proposes a method based on fuzzy set theory for supporting Boolean queries in the sequence model. It also introduces two algorithms that transform Boolean queries into disjunctive normal form (DNF) or conjunctive normal form (CNF); for efficiency, these algorithms are designed to produce approximate results.
The proposed method and the two approximate algorithms are incorporated into a new C/C++ implementation of SIR, the system that implements the sequence model. Because efficiency has always been a concern for SIR, this version also improves the efficiency of query processing.
[en]
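The fuzzy-set approach mentioned in the abstract can be illustrated with one standard choice of fuzzy operators (Zadeh's): AND takes the minimum of its operands' scores, OR the maximum, and NOT the complement 1 - x. The Python sketch below is a hypothetical illustration, not the thesis's C/C++ code or its exact method. It assumes each query term already has a per-document score in [0, 1] (for example, a normalized sequence-similarity value), and the names `Term`, `And`, `Or`, `Not`, `fuzzy_score`, `to_nnf`, and `to_dnf` are invented for this example. It also rewrites a query into disjunctive normal form (DNF) by pushing negations inward and distributing AND over OR.

```python
"""Hypothetical sketch: fuzzy-set scoring of a Boolean query (not SIR's code).

Assumption: every query term already has a per-document score in [0, 1],
e.g. a normalized sequence-similarity value; computing it is out of scope.
"""
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Term:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Query"


@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"


Query = Union[Term, Not, And, Or]


def fuzzy_score(q: Query, scores: Dict[str, float]) -> float:
    """Zadeh operators: AND = min, OR = max, NOT = 1 - x."""
    if isinstance(q, Term):
        return scores.get(q.name, 0.0)  # term with no score: treated as 0
    if isinstance(q, Not):
        return 1.0 - fuzzy_score(q.arg, scores)
    if isinstance(q, And):
        return min(fuzzy_score(q.left, scores), fuzzy_score(q.right, scores))
    if isinstance(q, Or):
        return max(fuzzy_score(q.left, scores), fuzzy_score(q.right, scores))
    raise TypeError(f"unknown query node: {q!r}")


def to_nnf(q: Query) -> Query:
    """Push NOT down to the terms with De Morgan's laws."""
    if isinstance(q, Not):
        a = q.arg
        if isinstance(a, Not):
            return to_nnf(a.arg)  # double negation
        if isinstance(a, And):
            return Or(to_nnf(Not(a.left)), to_nnf(Not(a.right)))
        if isinstance(a, Or):
            return And(to_nnf(Not(a.left)), to_nnf(Not(a.right)))
        return q  # NOT applied directly to a term
    if isinstance(q, And):
        return And(to_nnf(q.left), to_nnf(q.right))
    if isinstance(q, Or):
        return Or(to_nnf(q.left), to_nnf(q.right))
    return q


def to_dnf(q: Query) -> Query:
    """Rewrite q as an OR of ANDs by distributing AND over OR."""
    q = to_nnf(q)
    if isinstance(q, Or):
        return Or(to_dnf(q.left), to_dnf(q.right))
    if isinstance(q, And):
        left, right = to_dnf(q.left), to_dnf(q.right)
        if isinstance(left, Or):
            return Or(to_dnf(And(left.left, right)), to_dnf(And(left.right, right)))
        if isinstance(right, Or):
            return Or(to_dnf(And(left, right.left)), to_dnf(And(left, right.right)))
        return And(left, right)
    return q
```

For instance, with scores `{"sequence": 0.8, "boolean": 0.5, "fuzzy": 0.25}`, the query `And(Term("sequence"), Or(Term("boolean"), Not(Term("fuzzy"))))` scores min(0.8, max(0.5, 0.75)) = 0.75, and its DNF form `Or(And(sequence, boolean), And(sequence, Not(fuzzy)))` scores the same. Because min and max distribute over each other and De Morgan's laws hold for 1 - x, the DNF rewrite does not change the fuzzy score under these operators, so this sketch does not capture the approximation that the thesis's DNF and CNF algorithms make for efficiency.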
dc.description.provenance: Made available in DSpace on 2021-06-13T16:30:15Z (GMT). No. of bitstreams: 1
ntu-94-R92725025-1.pdf: 566347 bytes, checksum: 75ec5ee15b643c67d2991ebe520331d3 (MD5)
Previous issue date: 2005 [en]
dc.description.tableofcontents:
1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Thesis Outline
2 Related Work
2.1 Classical Retrieval Models
2.2 Ranked Results of Boolean Queries
2.3 The Sequence Model and the SIR System
2.3.1 The Sequence Model
2.3.2 The SIR System
3 Processing Boolean Queries
3.1 Boolean Operators on the SIR System
3.2 Simple Conjunctive Queries
3.2.1 Representative Document Sequence
3.2.2 Similarity Measures
3.3 Approximate Algorithms
3.3.1 Disjunctive Normal Form
3.3.2 Conjunctive Normal Form
4 System Design and Implementation
4.1 Supporting Boolean Queries
4.1.1 The Basic Process of Boolean Queries
4.1.2 Simple Conjunctive Queries
4.1.3 Approximate Algorithms
4.2 The Architecture of the SIR System
4.2.1 The System Processes
4.2.2 Efficiency Improvement
5 Experimental Results and Analysis
5.1 Effectiveness of the Boolean queries
5.2 Comparison between three algorithms of supporting Boolean queries
5.2.1 The Effectiveness of Three Algorithms
5.2.2 The Efficiency of Three Algorithms
5.3 System Efficiency Improvement
5.3.1 The Efficiency of Adopted Sorting Algorithms
5.3.2 System Efficiency
6 Conclusion
6.1 Contributions
6.2 Future Work
Bibliography
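dc.language.iso: en
dc.subject: 序列相似度 (sequence similarity) [zh_TW]
dc.subject: 布林運算子 (Boolean operators) [zh_TW]
dc.subject: 布林查詢 (Boolean queries) [zh_TW]
dc.subject: 資訊檢索 (information retrieval) [zh_TW]
dc.subject: 文件檢索 (text retrieval) [zh_TW]
dc.subject: 序列模式 (sequence model) [zh_TW]
dc.subject: Boolean Operators [en]
dc.subject: Text Retrieval [en]
dc.subject: Information Retrieval [en]
dc.subject: Boolean Queries [en]
dc.subject: Sequence Similarity [en]
dc.subject: Sequence Model [en]
dc.title: 支援布林查詢的以序列為基礎之文件檢索系統 (Supporting Boolean Queries in a Sequence-Based Text Retrieval System) [zh_TW]
dc.title: Supporting Boolean Queries in a Sequence-Based Text Retrieval System [en]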
dc.type: Thesis
dc.date.schoolyear: 93-2 (ROC academic year 93, second semester)
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 莊裕澤, 簡立峰
dc.subject.keyword: 布林運算子, 布林查詢, 資訊檢索, 文件檢索, 序列模式, 序列相似度 [zh_TW]
dc.subject.keyword: Boolean Operators, Boolean Queries, Information Retrieval, Text Retrieval, Sequence Model, Sequence Similarity [en]
dc.relation.page: 50
dc.rights.note: 有償授權 (paid license)
dc.date.accepted: 2005-07-12
dc.contributor.author-college: 管理學院 (College of Management) [zh_TW]
dc.contributor.author-dept: 資訊管理學研究所 (Graduate Institute of Information Management) [zh_TW]
Appears in collections: Department of Information Management

Files in this item:
File | Size | Format | Access
ntu-94-1.pdf | 553.07 kB | Adobe PDF | Not authorized for public access


Unless otherwise indicated, all items in this system are protected by copyright, with all rights reserved.

Contact information
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved