點對點網路下複合檢索系統之設計與實作

Li-Wei Yang; 楊立偉

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25692

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	莊裕澤(Yuh-Jzer Joung)
dc.contributor.author	Li-Wei Yang	en
dc.contributor.author	楊立偉	zh_TW
dc.date.accessioned	2021-06-08T06:25:06Z	-
dc.date.copyright	2006-07-31
dc.date.issued	2006
dc.date.submitted	2006-07-28
dc.identifier.citation	[1] Karl Aberer. P-Grid: A self-organizing access structure for P2P information systems. In Proceedings of the Ninth International Conference on Cooperative Information Systems (CoopIS 2001), volume 2172 of Lecture Notes in Computer Science, pages 179.194. Springer-Verlag, 2001. [2] Jyoti Ahuja, Jun-Hong Cui, Li Lao, and Shigang Chen. Pson: A scalable peer-to-peer .le sharing system supporting complex queries. In SIGCOMM Posters, 2005. [3] Artur Andrzejak and Zhichen Xu. Scalable, efficient range queries for grid information services. In Proceedings of Second International Conference on Peer-to-Peer Computing (P2P 2002), pages 33.40. IEEE Computer Society, September 2002. [4] James Aspnes, Jonathan Kirsch, and Arvind Krishnamurthy. Load balancing and locality in range-queriable data structures. In Proceedings of the twenty-third Annual ACM Symposium on Principles of Distributed Computing (PODC2004), pages 115.124. ACM Press, July 2004. [5] Baruch Awerbuch and Christian Scheideler. Peer-to-peer systems for prefix search. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (PODC 2003), pages 123.132. ACM Press, July 2003. [6] Ashwin R. Bharambe, Mukesh Agrawal, and Srinivasan Seshan. Mercury: Supporting scalable multi-attribute range queries. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2004), pages 353.366. ACM Press, August 2004. [7] Bobby Bhattacharjee, Sudarshan Chawathe, Vijay Gopalakrishnan, Pete Keleher, and Bujor Silaghi. Efficient peer-to-peer searches using result-caching. In Proceedings of the Second International Workshop on Peer-to-Peer Systems (IPTPS 2003), volume 2735 of Lecture Notes in Computer Science, pages 225. 236. Springer-Verlag, 2003. [8] Burton H. Bloom. Space/time trade-o.s in hash coding with allowable errors. Communications of the ACM, 13(7):422.426, 1970. [9] Hailong Cai and JunWang. Foreseer: a novel, locality-aware peer-to-peer system architecture for keyword searches. In Proceedings of the 5th ACM/IFIP/USENIX international conference on Middleware, volume 3231 of Lecture Notes in Computer Science, pages 38.58, New York, NY, USA, 2004. Springer-Verlag. [10] Min Cai, Martin Frank, Jinbo Chen, and Pedro Szekely. MAAN: A multi-attribute addressable network for grid information services. In Proceedings of the Fourth International Workshop on Grid Computing(GRID .03), pages 184. 191, Washington, DC, USA, 2003. IEEE Computer Society. [11] Yatin Chawathe, Sylvia Ratnasamy, Lee Breslau, Nick Lanham, and Scott Shenker. Making Gnutella-like P2P systems scalable. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2003), pages 407.418. ACM Press, August 2003. [12] Kacky Chu, Kevin Labonte, and Brian Neil Levine. Availability and locality measurements of peer-to-peer .le systems. In Proceedings of SPIE (SPIE 2002), volume 4868. The International Society for Optical Engineering, August 2002. [13] Adina Crainiceanu, Prakash Linga, Johannes Gehrke, and Jayavel Shanmugasundaram. Querying peer-to-peer networks using p-trees. In WebDB, pages 25.30, 2004. [14] Arturo Crespo and Hector Garcia-Molina. Routing indices for peer-to-peer systems. In Proceedings of the Twenty-Second International Conference on Distributed Computing Systems (ICDCS 2002), pages 23.32. IEEE Computer Society, July 2002. [15] Antonios Daskos, Shahram Ghandeharizadeh, and Xinghua An. PePeR: A distributed range addressing space for peer-to-peer systems. In Karl Aberer, Vana Kalogeraki, and Manolis Koubarakis, editors, Proceedings of the First International Workshop on Databases, Information Systems, and Peer-to-Peer Computing (DBISP2P 2003), volume 2944 of Lecture Notes in Computer Science, pages 200.218. Springer, 2004. [16] Anwitaman Datta, Manfred Hauswirth, Renault John, Roman Schmidt, and Karl Aberer. Range queries in trie-structured overlays. In Proceedings of The Fifth IEEE International Conference on Peer-to-Peer Computing (P2P 2005). IEEE Computer Society, 2005. [17] Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391.407, 1990. [18] George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais. The vocabulary problem in human-system communication. Commun. ACM, 30(11):964.971, 1987. [19] Prasanna Ganesan, Qixiang Sun, and Hector Garcia-Molina. Adlib: A self-tuning index for dynamic peer-to-peer systems. In Proceedings of the 21st International Conference on Data Engineering (ICDE.05), pages 256.257, Washington, DC, USA, 2005. IEEE Computer Society. [20] Susan Gauch and Jianying Wang. A corpus analysis approach for automatic query expansion. In CIKM, pages 278.284, 1997. [21] Omprakash D. Gnawali. A keyword-set search system for peer-to-peer networks. Master.s thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States, June 2002. [22] Gnutella. http://www.gnutella.com. [23] Gaston H. Gonnet, Ricardo A. Baeza-Yates, and Tim Snider. New indices for text: Pat trees and pat arrays. Information retrieval: data structures and algorithms, pages 66.82, 1992. [24] Abhishek Gupta, Divyakant Agrawal, and Amr El Abbadi. Approximate Range Selection Queries in Peer-to-Peer Systems. In Proceedings of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), 2003. [25] Matthew Harren, Joseph M. Hellerstein, Ryan Huebsch, Boon T. Loo, Scott Shenker, and Ion Stoica. Complex queries in DHT-based peer-to-peer networks. In Proceedings of the First International Workshop on Peer-to-Peer Systems (IPTPS 2002), volume 2429 of Lecture Notes in Computer Science, pages 242. 259. Springer-Verlag, 2002. [26] Sta Jean-David. Automatic acquisition of terminological relations from a corpus for query expansion. In SIGIR .98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 24-28 1998, Melbourne, Australia, pages 371.372. ACM, 1998. [27] S. Lennart Johnsson and Chien-Tien Ho. Optimum broadcasting and personalized communication in hypercubes. IEEE Transactions on Computers, 38(9):1249.1268, September 1989. [28] Vana Kalogeraki, Dimitrios Gunopulos, and D. Zeinalipour-Yazti. A local search mechanism for peer-to-peer networks. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM 2002), pages 300.307. ACM Press, November 2002. [29] David R. Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing (STOC 1997), pages 654.663. ACM Press, May 1997. [30] Stefan Klink, Armin Hust, Markus Junker, and Andreas Dengel. Improving document retrieval by automatic query expansion using collaborative learning of term-based concepts. In Document Analysis Systems, pages 376.387, 2002. [31] Prakash Linga, Adina Crainiceanu, Johannes Gehrke, and Jayavel Shanmugasundaram. Guaranteeing correctness and availability in p2p range indices. In SIGMOD Conference, pages 323.334, 2005. [32] Bin Liu, Wang-Chien Lee, and Dik Lun Lee. Supporting complex multi-dimensional queries in p2p systems. In ICDCS, pages 155.164, 2005. [33] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. In Proceedings of the 2002 International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2002), pages 258.259. ACM Press, June 2002. [34] Mark Magennis and C. J. van Rijsbergen. The potential and actual effectiveness of interactive query expansion. In SIGIR, pages 324.332, 1997. [35] Mandar Mitra, Amit Singhal, and Chris Buckley. Improving automatic query expansion. In SIGIR, pages 206.214, 1998. [36] Kiyohide Nakauchi, Yuichi Ishikawa, Hiroyuki Morikawa, and Tomonori Aoyama. Peer-to-peer keyword search using keyword relationship. In Proceedings of the Third International Workshop on Global and P2P Computing (GP2PC 2003), pages 359.366. IEEE Computer Society, May 2003. [37] Napster. http://www.napster.com. [38] Wolfgang Nejdl, Martin Wolpers, Wolf Siberski, Christoph Schmitz, Mario Schlosser, Ingo Brunkhorst, and Alexander Loser. Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks. In Proceedings of the Twelfth International Conference on World Wide Web (WWW 2003), pages 536.543. ACM Press, May 2003. [39] Sharman Networks. KaZaA. Retrieved July 27, 2004 from http://www.kazaa.com. [40] Sriram Ramabhadran, Sylvia Ratnasamy, Joseph M. Hellerstein, and Scott Shenker. Brief announcement: Prefix hash tree. In Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing (PODC04), pages 368.368, New York, NY, USA, 2004. ACM Press. [41] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2001), pages 161.172. ACM Pres, August 2001. [42] Patrick Reynolds and Amin Vahdat. Efficient peer-to-peer keyword searching. In Proceedings of the 2003 ACM/IFIP/USENIX International Middleware Conference (Middleware 2003), volume 2672 of Lecture Notes in Computer Science, pages 21.40. Springer-Verlag, 2003. [43] Matei Ripeanu. Peer-to-peer architecture case study: Gnutella network. In Proceedings of the First International Conference on Peer-to-peer Computing (P2P 2001), pages 99.100. IEEE Computer Society, August 2001. [44] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proceedings of the 2001 IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), volume 2218 of Lecture Notes in Computer Science, pages 329.350. Springer-Verlag, 2001. [45] Ozgur D. Sahin, Abhishek Gupta, Divyakant Agrawal, and Amr El Abbadi. A peer-to-peer framework for caching range queries. In Proceedings of the 20th International Conference on Data Engineering (ICDE 2004), pages 165.176. IEEE Computer Society, 2004. [46] Gerard Salton. Automatic Text Processing. The Transformation, Analysis and Retrieval of Information by Computer. Reading, MA: Addison-Wesley, 1988. [47] Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A measurement study of peer-to-peer .le sharing systems. In Proceedings of the 2002 Multimedia Computing and Networking (MMCN 2002). The International Society of Optical Engineering, January 2002. [48] Mario Schlosser, Michael Sintek, Stefan Decker, and Wolfgang Nejdl. HyperCuP: Hypercubes, ontologies and efficient search on p2p networks. In Proceedings of the 2002 International Workshop on Agents and Peer-to-Peer Computing (AP2PC 2002), volume 2530 of Lecture Notes in Computer Science, pages 112. 124. Springer-Verlag, 2003. [49] Cristina Schmidt andManish Parashar. Enabling .exible queries with guarantees in P2P systems. IEEE Internet Computing, 8(3):19.26, 2004. [50] SETI@home: Search for Extraterrestrial Intelligence at home. http://setiathome.ssl.berkeley.edu. [51] Shuming Shi, Guangwen Yang, Dingxing Wang, Jin Yu, Shaogang Qu, and Ming Chen. Making peer-to-peer keyword searching feasible using multi-level partitioning. In IPTPS, pages 151.161, 2004. [52] Kunwadee Sripanidkulchai, Bruce, and Hui Zhang. Efficient content location using interest-based locality in peer-to-peer systems. In Proceedings of Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2003), volume 3, pages 2166.2176, March/April 2003. [53] Ion Stoica, Robert Morris, David R. Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2001), pages 149.160. ACM Press, August 2001. [54] Chunqiang Tang and Sandhya Dwarkadas. Hybrid global-local indexing for efficient peer-to-peer information retrieval. In Proceedings of the First Symposium on Networked Systems Design and Implementation (NSDI 2004), pages 211.224. USENIX, 2004. [55] Chunqiang Tang, Zhichen Xu, and Sandhya Dwarkadas. Peer-to-peer information retrieval using self-organizing semantic overlay networks. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2003), pages 175.186. ACM Press, August 2003. [56] Skype Free Internet telephony that just works. http://www.skype.com/. [57] The Gnutella2 Developer Network. http://www.gnutella2.com. [58] Dimitrios Tsoumakos and Nick Roussopoulos. Adaptive probabilistic search for peer-to-peer networks. In Peer-to-Peer Computing, pages 102.109, 2003. [59] Ian H. Witten, Alistair Mo.at, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images, Second Edition. Morgan Kaufmann, 1999. [60] Beverly Yang and Hector Garcia-Molina. Improving search in peer-to-peer systems. In Proceedings of the Twenty-Second International Conference on Distributed Computing Systems (ICDCS 2002), pages 5.14. IEEE Computer Society, July 2002. [61] Beverly Yang and Hector Garcia-Molina. Designing a super-peer network. In Proceedings of the Ninteenth International Conference on Data Engineering (ICDE2003). IEEE Computer Society, March 2003. [62] Ben Y. Zhao, John Kubiatowicz, and Anthony D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, University of California, Berkeley, April 2001. [63] Feng Zhou, Li Zhuang, Ben Y. Zhao, Ling Huang, Anthony D. Joseph, and John Kubiatowics. Approximate object location and spam .ltering on peer-to-peer systems. In Proceedings of the 2003 ACM/IFIP/USENIX International Middleware Conference (Middleware 2003), volume 2672 of Lecture Notes in Computer Science, pages 1.20. Springer-Verlag, 2003. [64] Hai Zhuge, Xiaoping Sun, Jie Liu, Erlin Yao, and Xue Chen. A scalable p2p platform for the knowledge grid. IEEE Trans. Knowl. Data Eng., 17(12):1721. 1736, 2005.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25692	-
dc.description.abstract	本研究提出了一種全新的索引與搜尋方案，稱之為KISS（以關鍵符記為基礎的索引與搜尋方案；keytoken-based index and search scheme）。這是將物件以其關鍵詞之字元與位置資訊來索引在超立方空間的一種方法。透過這個索引與搜尋方案，可以完整實現在點對點網路下各種複合檢索的能力，包含了關鍵詞搜尋、前置搜尋、通用字元搜尋、以及多維度搜尋等。為了避免傳統透過倒置列表法所帶來的問題，這個索引與搜尋方案使用超立方空間做為基礎，並將物件以其所屬的關鍵詞集合來做為索引的依據。因此，這個索引與搜尋方案具有以下特點：　　一、透過單一的查找操作，就能有效率地找到任何物件（就好比在DHT網路下的鍵值搜尋一樣有效率）；若要插入、刪除、或維護物件時，也都可在僅一個查找操作內完成，而不會跟該物件的關鍵詞集大小有關係。　　二、單一關鍵詞的索引資訊是由一群節點所負責：越常出現的關鍵詞，會由越多的節點來負責。因此由索引與搜尋所造成的工作負荷可以平均地分散在各節點上。　　三、每個查詢關鍵詞組均可以得到一個子超立方空間，並進而得到一個二元展開樹；而遍歷該二元展開樹，就可以搜尋到所有符合查詢的物件。其中，查詢結果可以互動、累進的方式來呈現。此外，導引式深度優先搜尋（Guided Depth-Frist Search，GDFS）及快取等技巧，能夠提升搜尋效率；而利用關鍵符記的連續性或對稱性來修剪二元展開樹，則可以縮小搜尋範圍。　　四、基於該二元展開樹的遍歷方式，能夠提供富有彈性的定序機制。例如，查詢結果可以依照物件的明確程度、關鍵詞長度、字典順序、或是相似度來進行排序，而不需經過全域統計的方式來完成。　　五、當把每個屬性視為一個子超立方空間時，則多個屬性可以構成一個較大的超立方空間，也因此多維度搜尋可以在這個較大的超立方空間上被實現，不論該多維度搜尋是針對單一屬性或是多個屬性的查詢。本索引與搜尋方案在進行多維度搜尋時，是一次同時考慮到多個屬性，因此當有越多屬性被指定時，搜尋會變得越有效率；而不像其它傳統方法是一次處理一個屬性，然後再將各屬性處理後的結果做合併。此外，透過將關鍵符記進行合併的技巧，可以調整該超立方空間的維度，端視各種上層應用、或底層網路之需要。　　總括來說，KISS這個索引與搜尋方案提供了一種簡單、有效、統一的方式，來實現點對點網路下各種複合檢索的能力，並且解決了各種既有的問題，如增加的索引成本、頻寬額外負擔、不平衡的負荷、網路熱點與故障點等。而這個方案也支援了富有彈性的定序機制，卻不需要任何的全域資訊。　　本研究以Java程式語言建立了一個KISS的雛形系統，並在該雛形系統上進行一連串的實驗，以評估本方案的負荷平衡程度與搜尋效率。本研究使用了兩組真實資料來進行實驗，第一組為網站分類目錄資料庫，共有131,180筆資料；第二組為網際網路上最大的音樂專輯資料庫，共有2,412,613筆資料。實驗結果證明之前所提到的特點都能夠被具體實現。而本研究也提供了關於本方案複雜性與維度的數學分析。	zh_TW
dc.description.abstract	In this research work, KISS (Keytoken-based index and search scheme) is introduced with a novel technique to extract keytokens - character-position information - from keywords to index objects over a hypercube. With KISS, complex search capabilities are fully provided in P2P networks, including keyword search, prefix search, wildcard search, and multi-dimensional search. To address the problems of other existing approaches mainly based on inverted index, KISS uses hypercube to index objects according to their keyword set. As a result, the following features are provided: 1. An object can be efficiently and deterministically located by one lookup operation such as key search in DHT-based overlay. Object insertion, deletion, and maintenance also take one lookup only, regardless of the keyword set size of the object. 2. The index entries of a single keyword are handled by a set of nodes: the more popular a keyword is, the higher the number of nodes responsible for the keyword. So the index/search load of nodes can be balanced. 3. Searching the matching objects corresponds to traversing the spanning binomial tree of a subhypercube obtained from the queried keyword set. Every matched object can be retrieved interactively and cumulatively by exploring the tree. Techniques such as Guided Depth-Frist Search (GDFS) and caching can boost the search efficiency; while tree pruning by the continuous and/or symmetric property of keytoken can reduce the search space. 4. Flexible ranking is provided based on how the spanning binomial tree is traversed. For example, ranking by object specificness, keyword length, lexicographical order, or similarity, can be facilitated without global statistics. 5. Multi-dimensional search is supported by constructing a larger hypercube, which consists of subhypercubes for every attribute. Thus, queries on any single/multiple attribute(s) can be fulfilled as well. KISS takes multiple attributes into consideration at a time, and becomes more efficient when more attributes are specified, contrary to other existing approaches that can only process one attribute at a time and then merge all the search results for each one. In addition, by keytoken clustering, the dimensionality r of the hypercube is adjustable to fit various upper applications, or the underlying overlay. In summary, KISS provides a simple yet effective unified solution for complex search capabilities in P2P networks, and eliminates the problems such as increased index cost, bandwidth overhead, unbalanced load, hot spots, and points of failures, etc. It also supports a flexible ranking mechanism without any global knowledge required. A prototype of KISS is implemented in Java, and a series of experiments have been conducted to evaluate KISS in load balancing and search efficiency. Two real datasets are used: one consisting of 131,180 records from a website directory and the other consisting of 2,412,613 records from Internet CD database. Experimental results are presented to support the above-mentioned features, while a mathematical analysis of the complexity and dimensionality of KISS is also given.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T06:25:06Z (GMT). No. of bitstreams: 1 ntu-95-F85725003-1.pdf: 4986311 bytes, checksum: c7cda16cdecfca014cf5ba541e056765 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Results of this work . . . . . . . . . . . . . . . . . . . 5 1.2.1 System Model . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Road-map . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Literature Review 8 2.1 Unstructured P2P networks . . . . . . . . . . . . . . . . . 8 2.1.1 Decentralized overlay . . . . . . . . . . . . . . . . . 9 2.1.2 Centralized overlay . . . . . . . . . . . . . . . . . . 10 2.1.3 Hybrid approach . . . . . . . . . . . . . . . . . . . . 10 2.2 Structured P2P networks . . . . . . . . . . . . . . . . . . 11 2.2.1 DHT-based overlay . . . . . . . . . . . . . . . . . . . 12 2.3 Query Capabilities . . . . . . . . . . . . . . . . . . . . 13 2.3.1 Keyword search . . . . . . . . . . . . . . . . . . . . . 13 2.3.2 Prefix search and Range search . . . . . . . . . . . . . 16 2.3.3 Complex search . . . . . . . . . . . . . . . . . . . . . 18 3 System Overview 20 3.1 Underlying P2P Networks . . . . . . . . . . . . . . . . . . 20 3.1.1 Using DHT-based overlay . . . . . . . . . . . . . . . . 21 3.2 r-Dimensional Hypercube Vector Space . . . . . . . . . . . 22 3.3 Map to the Underlying Overlay . . . . . . . . . . . . . . . 27 4 Keyword Search 29 4.1 Keyword Search Problem . . . . . . . . . . . . . . . . . . 29 4.2 Index and Search Scheme . . . . . . . . . . . . . . . . . . 30 4.2.1 Search in the Hypercube . . . . . . . . . . . . . . . . 31 4.2.2 Detailed Operations . . . . . . . . . . . . . . . . . . 34 4.3 Comparison to Existing Approaches . . . . . . . . . . . . . 37 4.3.1 Remarks on performance . . . . . . . . . . . . . . . . . 39 4.4 Mathematical Analysis . . . . . . . . . . . . . . . . . . . 40 4.4.1 Choosing the dimensionality r . . . . . . . . . . . . . 40 4.5 Experimental Results . . . . . . . . . . . . . . . . . . . 43 4.5.1 Experiments for Load Distribution . . . . . . . . . . . 45 4.5.2 Experiments for Search Performance . . . . . . . . . . . 49 4.5.3 Guided Depth-First Search . . . . . . . . . . . . . . . 51 4.5.4 Search Performance with cache . . . . . . . . . . . . . 52 4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 55 5 Prefix Search 57 5.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 57 5.1.1 Prefix Search Problem . . . . . . . . . . . . . . . . . 57 5.1.2 Existing Approaches for Prefix Search . . . . . . . . . 58 5.2 Keytoken-based Index and Search Scheme . . . . . . . . . . 60 5.2.1 Tokenization . . . . . . . . . . . . . . . . . . . . . . 61 5.2.2 Index scheme . . . . . . . . . . . . . . . . . . . . . . 61 5.2.3 Search scheme . . . . . . . . . . . . . . . . . . . . . 63 5.2.4 Search Strategies . . . . . . . . . . . . . . . . . . . 65 5.2.5 Tree Pruning for Performance . . . . . . . . . . . . . . 67 5.3 Experimental Results . . . . . . . . . . . . . . . . . . . 68 5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6 Wildcard Search 75 6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . 75 6.1.1 Existing Approaches for Wildcard Search . . . . . . . . 77 6.2 Keytoken-based Index and Search Scheme . . . . . . . . . . 81 6.2.1 Symmetric Tokenization . . . . . . . . . . . . . . . . . 81 6.2.2 Index scheme . . . . . . . . . . . . . . . . . . . . . . 82 6.2.3 Search scheme . . . . . . . . . . . . . . . . . . . . . 85 6.2.4 Advanced Search Strategies . . . . . . . . . . . . . . . 87 6.2.5 Ranking by Similarity . . . . . . . . . . . . . . . . . 89 6.3 Keytoken Clustering . . . . . . . . . . . . . . . . . . . . 90 6.3.1 Keytoken Correlation . . . . . . . . . . . . . . . . . . 91 6.3.2 Load Distribution . . . . . . . . . . . . . . . . . . . 94 6.4 Experiment Result . . . . . . . . . . . . . . . . . . . . 97 6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . 101 7 Extensions of KISS 104 7.1 Multi-dimensional Search . . . . . . . . . . . . . . . . . 104 7.1.1 Shifted Tokenization . . . . . . . . . . . . . . . . . . 107 7.1.2 Index Scheme . . . . . . . . . . . . . . . . . . . . . . 107 7.1.3 Experiment Result . . . . . . . . . . . . . . . . . . . 110 7.1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . 112 7.2 Query Expansion . . . . . . . . . . . . . . . . . . . . . . 114 7.2.1 Search scheme for Query expansion . . . . . . . . . . . 116 7.2.2 Experimental Results . . . . . . . . . . . . . . . . . . 119 8 Encoding Strategies 121 8.1 Generalized KISS . . . . . . . . . . . . . . . . . . . . . 121 8.1.1 Search by Regular Expression . . . . . . . . . . . . . . 122 8.1.2 KISS with Hybrid Encoding . . . . . . . . . . . . . . . 122 8.2 Comparison of Encoding Schemes . . . . . . . . . . . . . . 128 8.2.1 Query expressiveness and Query granularity . . . . . . . 128 8.2.2 Encoding efficiency and Object distinguishability . . . 129 8.2.3 Load balance . . . . . . . . . . . . . . . . . . . . . . 135 8.2.4 Search performance . . . . . . . . . . . . . . . . . . . 145 8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 156 9 Conclusion 159 10 Future Work 164 10.1 Range Search . . . . . . . . . . . . . . . . . . . . . . . 164 Bibliography 167 Curriculum Vitae 177
dc.language.iso	en
dc.subject	點對點網路	zh_TW
dc.subject	超立方空間	zh_TW
dc.subject	多維度搜尋	zh_TW
dc.subject	通用字元搜尋	zh_TW
dc.subject	前置搜尋	zh_TW
dc.subject	關鍵詞搜尋	zh_TW
dc.subject	資訊檢索	zh_TW
dc.title	點對點網路下複合檢索系統之設計與實作	zh_TW
dc.title	Complex Search in Peer-to-Peer Networks	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	博士
dc.contributor.oralexamcommittee	李瑞庭(Anthony J. T. Lee),呂及人(Chi-Jen Lu),邱舉明(Ge-Ming Chiu),林宗男(Tsungnan Lin),蔡益坤(Yih-Kuen Tsay)
dc.subject.keyword	點對點網路,資訊檢索,關鍵詞搜尋,前置搜尋,通用字元搜尋,多維度搜尋,超立方空間,	zh_TW
dc.subject.keyword	Peer-to-peer network,Information retrieval,Keyword search,Prefix search,Wildcard search,Complex search,Multi-dimensional search,Hypercube,	en
dc.relation.page	179
dc.rights.note	未授權
dc.date.accepted	2006-07-29
dc.contributor.author-college	管理學院	zh_TW
dc.contributor.author-dept	資訊管理學研究所	zh_TW
顯示於系所單位：	資訊管理學系

文件中的檔案：

檔案	大小	格式
ntu-95-1.pdf 未授權公開取用	4.87 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。