Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 管理學院
  3. 資訊管理學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor蔡益坤(Yih-Kuen Tsay)
dc.contributor.authorChun-Chih Huangen
dc.contributor.author黃俊誌zh_TW
dc.date.accessioned2021-06-13T16:32:07Z-
dc.date.available2005-07-13
dc.date.copyright2005-07-13
dc.date.issued2005
dc.date.submitted2005-07-11
dc.identifier.citationBibliography
[1] Ricardo Baeza-Yates and Bertheir Ribeiro-Neto. Modern Information Retrieval.
Addison-Wesley, 1999.
[2] Eric W. Brown, James P. Callan, and W. Bruce Croft. Fast incremental indexing for
full-text information retrieval. In VLDB ’94: Proceedings of the 20th International
Conference on Very Large Data Bases, pages 192–202. Morgan Kaufmann Publishers
Inc., 1994.
[3] B. Ribeiro-Neto C. Badue, R. Baeza-Yates and N. Ziviani. Distributed query processing
using partitioned inverted files. In Proc. Symp. String Processing and Information
Retrieval, pages 10–20. IEEE CS Press, 2001.
[4] James P. Callan, W. Bruce Croft, and Stephen M. Harding. The INQUERY retrieval
system. In Proceedings of DEXA-92, 3rd International Conference on Database and
Expert Systems Applications, pages 78–83, 1992.
[5] Yu-Fang Chen. A Text Retrieval System Based on Sequence Similarity. Master’s
thesis, Department of Information Management, National Taiwan University, 2003.
[6] Doug Cutting and Jan Pedersen. Optimizations for dynamic inverted index maintenance.
In Proceedings of the 13th International ACM SIGIR Conference on Research
and Development in Information Retrieval, pages 405–411, 1990.
[7] Michael J. Flynn. Very high-speed computing systems. pages 519–527, 2000.
[8] O. Frieder, D. Grossman, A. Chowdhury, and G. Frieder. Efficiency considerations for
scalable information retrieval servers. Journal of Digital Information, 1(5), January
2000.
[9] Byeong-Soo Jeong and Edward Omiecinski. Inverted file partitioning schemes in
multiple disk systems. IEEE Trans. Parallel Distrib. Syst., 6(2):142–153, 1995.
[10] Lung-Chi Lin. A Preliminary Study of Text Retrieval Techniques Utilizing Character/
Word Positions. Master’s thesis, Department of Information Management,
National Taiwan University, 2000.
[11] Mauricio Marin. Parallel Text Query Processing using Composite Inverted Lists.
Master’s thesis, Computing Department, University of Magallanes, Chile, 2003.
[12] J. Eliot B. Moss. Design of the mneme persistent object store. ACM Trans. Inf.
Syst., 8(2):103–139, 1990.
[13] M. F. Porter. An algorithm for suffix stripping. pages 313–316, 1997.
[14] E. Rasmussen. Parallel information processing. In Annual Review of Information
Science and Technology, 27:99–130, 1992.
[15] Gerard Salton, Edward A. Fox, and Harry Wu. Extended boolean information retrieval.
Commun. ACM, 26(11):1022–1036, 1983.
[16] David B. Skillicorn, Jonathan M. D. Hill, and W. F. McColl. Questions and Answers
about BSP. Scientific Programming, 6(3):249–274, Fall 1997.
[17] Ohm Sornil. Hybrid Partition Inverted Files for Large-Scale Digital Libraries. Master’s
thesis, Virginia Polytechnic and State University(Virginia Tech), 2001.
[18] Ohm Sornil. Parallel Inverted Index for Large-Scale, Dynamic Digital Libraries.
Master’s thesis, Blacksburg, Virginia, 2001.
[19] C. Stanfill. Partitioned posting files: a parallel inverted file structure for information
retrieval. In SIGIR ’90: Proceedings of the 13th annual international ACM SIGIR
conference on Research and development in information retrieval, pages 413–428.
ACM Press, 1990.
[20] C. Stanfill, R. Thau, and D. Waltz. A parallel indexed algorithm for information
retrieval. In SIGIR ’89: Proceedings of the 12th annual international ACM SIGIR
conference on Research and development in information retrieval, pages 88–97. ACM
Press, 1989.
[21] Anthony Tomasic and Hector Garcia-Molina. Performance of inverted indices in
shared-nothing distributed text document informatioon retrieval systems. In PDIS
’93: Proceedings of the second international conference on Parallel and distributed
information systems, pages 8–17. IEEE Computer Society Press, 1993.
[22] Anthony Tomasic, Hector Garcia-Molina, and Kurt Shoens. Incremental updates of
inverted lists for text document retrieval. In SIGMOD ’94: Proceedings of the 1994
ACM SIGMOD international conference on Management of data, pages 289–300.
ACM Press, 1994.
[23] Anthony Slavko Tomasic. Distributed queries and incremental updates in information
retrieval systems. PhD thesis, 1994.
[24] Leslie G. Valiant. A bridging model for parallel computation. Commun. ACM,
33(8):103–111, 1990.
[25] Ching-Lin Yu. Sequence-Based Text Retrieval: Design and Implementation. Master’s
thesis, Department of Information Management, National Taiwan University, 2002.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38387-
dc.description.abstractThe purpose of a text retrieval system is to locate documents from a large, textual
document collection that meet a user’s needs. The SIR system is such a system that is
based on the sequence model. As it was designed and implemented as a sequential, rather
than a parallel application, it becomes less efficient when the size of the data collection
gets larger. Another drawback of the SIR system is that the index must be rebuilt entirely
when the data collections are modified. Also, compared with other models, the query
evaluation process of the sequence model is time consuming. In this thesis, we seek to
make improvements that address these problems.
To facilitate parallel query processing, we implement three kinds of index partitioning
schemes in the system, and evalauete their load balancing characteristics. To improve the
scalability of index building, we design and implement a mechanism that allows the SIR
system to support incremental index updates. We also make other improvements such as
support of queries with homophones and support of more types of token, that make the
system more flexible.
en
dc.description.provenanceMade available in DSpace on 2021-06-13T16:32:07Z (GMT). No. of bitstreams: 1
ntu-94-R92725027-1.pdf: 534385 bytes, checksum: 63f528085702a69811c2628f6696a88e (MD5)
Previous issue date: 2005
en
dc.description.tableofcontentsContents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Text Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Chinese IR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.3 Effectiveness and Scalability of IR systems . . . . . . . . . . . . . 3
1.2 Motivation and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Related Work 7
2.1 Inverted Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Parallel Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Parallel Processing in IR . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Incremental Index Updating . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 The Indexing and Retrieval Process of SIR . . . . . . . . . . . . . . . . . 14
2.5.1 Indexing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5.2 Retrieval Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3 Scalability and Effectiveness of SIR 19
3.1 Retrieval Process Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Index Partitioning Mechanism . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Index Updating Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Other Modifications to the SIR system . . . . . . . . . . . . . . . . . . . 26
4 The New Components of SIR 29
4.1 The Broker Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.1 Indexing Building . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.2 Query Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 The Coordinated Collection Management Component . . . . . . . . . . . 35
5 Experimental Results and Analysis 38
5.1 The Effectiveness of the SIR system . . . . . . . . . . . . . . . . . . . . . 38
5.1.1 Topic Containing Continuous, Related Semantic Blocks . . . . . . 39
5.1.2 Topics Containing Distant Semantic Blocks . . . . . . . . . . . . . 40
5.1.3 Adjusting the Weight of the Three Scoring . . . . . . . . . . . . . 41
5.1.4 The Effect of Multi-phase Query . . . . . . . . . . . . . . . . . . 43
5.1.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 The Effect of Parallel Processing . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 The Load Balancing Analysis . . . . . . . . . . . . . . . . . . . . 46
5.2.2 The Scalability Analysis of the SIR System . . . . . . . . . . . . . 49
6 Conclusion 51
6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . 54
dc.language.isozh-TW
dc.subject累加式更新zh_TW
dc.subject文件檢索zh_TW
dc.subject平行化處理zh_TW
dc.subject平行化反轉索引zh_TW
dc.subject資訊檢索zh_TW
dc.subject索引切割設計zh_TW
dc.subjectIncremental Updateen
dc.subjectText Retrievalen
dc.subjectParallel Inverted Indexen
dc.subjectInformation Retrievalen
dc.subjectIndex Partitioning Schemesen
dc.subjectParallel Processingen
dc.title改善以序列為基礎之文件檢索系統之有效性與彈性zh_TW
dc.titleImproving the Effectiveness and Scalability of a Sequence-Based Text Retrieval Systemen
dc.typeThesis
dc.date.schoolyear93-2
dc.description.degree碩士
dc.contributor.oralexamcommittee莊裕澤(Yuh-Jzer Joung),簡立峰(Lee-Feng Chien)
dc.subject.keyword累加式更新,索引切割設計,資訊檢索,平行化反轉索引,平行化處理,文件檢索,zh_TW
dc.subject.keywordIncremental Update,Index Partitioning Schemes,Information Retrieval,Parallel Inverted Index,Parallel Processing,Text Retrieval,en
dc.relation.page56
dc.rights.note有償授權
dc.date.accepted2005-07-11
dc.contributor.author-college管理學院zh_TW
dc.contributor.author-dept資訊管理學研究所zh_TW
顯示於系所單位:資訊管理學系

文件中的檔案:
檔案 大小格式 
ntu-94-1.pdf
  未授權公開取用
521.86 kBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved