可延伸標記語言之關鍵字搜尋與編碼機制研究

Rung-Ren Lin; 林容任

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29985

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	趙坤茂
dc.contributor.author	Rung-Ren Lin	en
dc.contributor.author	林容任	zh_TW
dc.date.accessioned	2021-06-13T01:28:56Z	-
dc.date.available	2013-08-05
dc.date.copyright	2011-08-05
dc.date.issued	2011
dc.date.submitted	2011-08-02
dc.identifier.citation	[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. On Finding Lowest Common Ancestors in Trees. In SIAM J. Computing, pages 115-132, 1976. [2] Z. Bao, T. W. Ling, B. Chen, and J. Lu. Effective XML Keyword Search with Relevance Oriented Ranking. In ICDE, pages 517-528, 2009. [3] B. Cate and C. Lutz. The Complexity of Query Containment in Expressive Fragments of XPath 2.0. In Journal of ACM, pages 73-82, 2009. [4] Y. Chen, S. B. Davidson, and Y. Zheng. BLAS: an Efficient XPath Processing System. In SIGMOD, pages 47-58, 2004. [5] S. Cohen, J.Mamou, Y.Kanza, and Y.Sagiv. XSEarch: A Semantic Search Engine for XML. In VLDB, pages 45-56, 2003. [6] B. B. Dalvi, M. Kshirsagar, and S. Sudarshan. Keyword Search on External Memory Data Graphs. In PVLDB, pages 1189-1204, 2008. [7] B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding Top-k Min-Cost Connected Trees in Databases. In ICDE, pages 836-845, 2007. [8] D. K. Fisher, F. Lam, W. M. Shui, and R. K.Wong. Dynamic Labeling Schemes for Ordered XML Based on Type Information. In ADC, pages 59-68, 2006. [9] S. Flesca, F. Furfaro, and S. Greco. A Query Language for XML Based on Graph Grammars. In Journal of WWW, pages 125-157, 2002. [10] K. Golenberg, B. Kimelfeld, and Y. Sagiv. Keyword Proximity Search in Complex Data Graphs. In SIGMOD, pages 927-940, 2008. [11] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked Keyword Search over XML Documents. In SIGMOD, pages 16-27, 2003. [12] V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava, Keyword Proximity Search in XML Trees. In IEEE TKDE, pages 525-539, 2006. [13] V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional Expansion for Keyword Search on Graph Databases. In VLDB, pages 505-516, 2005. [14] B. Kimelfeld and Y. Sagiv. Finding and Approximating Top-k Answers in Keyword Proximity Search. In PODS, pages 173-182 , 2006. [15] G. Koloniari and E. Pitoura. LCA-Based Selection for XML Document Collections. In WWW, pages 511-520, 2010. [16] L. Kong, R. Gilleron, and A. Lema. Retrieving Meaningful Relaxed Tightest Fragments for XML Keyword Search. In EDBT, pages 815-826, 2009. [17] C. Li, T. W. Ling. QED: a Novel Quaternary Encoding to Completely Avoid Re-labeling in XML Updates. In CIKM, pages 501-508, 2005. [18] G. Li, J. Feng, J. Wang, and L. Zhou. Effective Keyword Search for Valuable LCAs over XML Documents. In CIKM, pages 31-40, 2007. [19] G. Li, J. Feng, J. Wang, and L. Zhou. An Effective and Versatile Keyword Search Engine on Heterogenous Data Sources. In PVLDB, pages 1452-1455, 2008. [20] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an Effective 3-in- 1 Keyword Search Method for Unstructured, Semi-Structured and Structured Data. In SIGMOD, pages 903-914, 2008. [21] Q. Li, B. Moon. Indexing and Querying XML Data for Regular Path Expressions. In VLDB, pages 361-370, 2001. [22] Y. Li, C. Yu, and H. V. Jagadish. Schema-Free XQuery. In VLDB, pages 72-83, 2004. [23] R.-R. Lin, Y.-H. Chang, and K.-M. Chao. Faster Algorithms for Searching Relevant Matches in XML Databases. In DEXA, pages 290-297, 2010. [24] R.-R. Lin, Y.-H. Chang, and K.-M. Chao. Identifying Relevant Matches with NOT Semantics over XML Documents. In DASFAA, pages 466-480, 2011. [25] C. Liu, J. Li, J. X. Yu, and R. Zhou. Adaptive Relaxation for Querying Heterogeneous XML Data Sources. In Information Systems, pages 688-707, 2010. [26] Z. Liu and Y. Chen. Reasoning and Identifying Relevant Matches for XML Keyword Search. In VLDB, pages 921-932, 2008. [27] J. Lu, T. W. Lin, C.-Y. Chan, and T. Chen. From Region Encoding to Extended Dewey: on Efficient Processing of XML Twig Pattern Matching. In VLDB, pages 193-204, 2005. [28] Y. Luo, X. Lin, W. Wang, and X. Zhou. SPARK: Top-k Keyword Query in Relational Databases. In SIGMOD, pages 115 - 126, 2007. [29] J.-K. Min, J. Lee, and C.-W. Chung. An Efficient XML Encoding and Labeling Method for Query Processing and Updating on Dynamic XML Data. In Journal of Systems and Software, pages 503-515, 2009. [30] B. Nguyen and S. Abiteboul. A Hash-Tree Based Algorithm for Subset Detection: Analysis and Experiments. Manuscript, 2002. [31] B. Nguyen and S. Abiteboul, G. Cobena, and M. Preda. Monitoring XML Data on the Web. In SIGMOD, pages 437-448, 2001. [32] P. O’Neil, E. O’Neil, S. Pal, I. Cseri, G. Schaller, and N.Westbury. ORDPATHs: Insert-Friendly XML Node Labels. In SIGMOD, pages 903-908, 2004. [33] P. Rao and B. Moon. PRIX: Indexing And Querying XML Using Prufer Sequences. In ICDE, pages 288-300, 2004. [34] B. Schieber and U.Vishkin. On Finding Lowest Common Ancestors: Simplification and Parallelization. In SIAM J. Computing, pages 111-123, 1988. [35] F. Shao, L. Guo, and C. Botev. Efficient Keyword Search over Virtual XML Views. In VLDB Journal, pages 543 - 570, 2009. [36] C. Sun, C.-Y. Chan, and A. K. Goenka. Multiway SLCA-Based Keyword Search in XML Data. In WWW, pages 1043-1052, 2007. [37] I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C. Zhang. Storing and Querying Ordered XML Using a Relational Database System. In SIGMOD, pages 204-215, 2002. [38] S. Tatikonda, S. Parthasarathy, and M. Goyder. LCSTRIM: Dynamic Programming Meets XML Indexing and Querying. In VLDB, pages 63-74, 2007. [39] Q. H. Vu, B. C. Ooi, D. Papadias, and A. K. H. Tung. A Graph Method for Keyword-Based Selection of the Top-k Databases. In SIGMOD, pages 915-926, 2008. [40] H. Wang, S. Park, W. Fan, and P. Yu. ViST: a Dynamic Index Method for Querying XML Data by Tree Structures. In SIGMOD, pages 110-121, 2003. [41] W. Wang, X. Wang, and A. Zhou. Hash-Search: An Efficient SLCA-based Keyword Search Algorithm on XML Documents. In DASFAA, pages 496-510, 2009. [42] Z. Wen. New Algorithms for the LCA Problem and the Binary Tree Reconstruction Problem. In Information Processing Letters, pages 11-16, 1994. [43] L. Wu, T. W. Ling, H. Wu, and Z. Bao. DDE: From Dewey to a Fully Dynamic XML Labeling Scheme. In SIGMOD, pages 719-730, 2009. [44] X.Wu, M.-L. Lee, and W. Hsu. A Prime Number Labeling Scheme for Dynamic Ordered XML Trees. In ICDE, pages 66-78, 2004. [45] X. Wu, D. Theodoratos, S. Souldatos, T. Dalamagas, and T. Sellis Evaluation Techniques for Generalized Path Pattern Queries on XML Data. In Journal of WWW, pages 441-474, 2010. [46] Y. Xu and Y. Papakonstantinou. Efficient Keyword Search for Smallest LCAs in XML Databases. In SIGMOD, pages 527-538, 2005. [47] Y. Xu and Y. Papakonstantinou. Efficient LCA Based Keyword Search in XML Data. In EDBT, pages 535-546, 2008. [48] L. Xyleme: A Dynamic Warehouse for XML Data of the Web. In IEEE Data Engineering Bulletin, pages 40-47, 2001. [49] C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, G. M. Lohman. On Supporting Containment Queries in Relational Database Management Systems. In SIGMOD, pages 425-436, 2001. [50] M. Zhong, M. Liu. Efficient Keyword Proximity Search Using a Frontier-Reduce Strategy Based on d-Distance Graph Index. In IDEAS, pages 206-216, 2009. [51] R. Zhou, C. Liu, and J. Li. Fast ELCA Computation for Keyword Queries on XML Data. In EDBT, pages 549-560, 2010. [52] Dewey Decimal Classification. http://www.oclc.org/dewey/. [53] Oracle Berkeley DB. http://www.oracle.com/database/berkeley-db/. [54] http://www.cs.washington.edu/research/xmldatasets/ [55] http://www.cafeconleche.org/books/biblegold/examples/ [56] http://www.xml-benchmark.org/ [57] http://www.w3.org/TR/xpath-full-text-10/ [58] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29985	-
dc.description.abstract	可延伸標記語言(XML)現今已被廣泛地使用於資料儲存與轉換。為了從XML文件中擷取有意義的資訊，許多關鍵字搜尋機制因此被提出來討論。一般來說，XML 文件可被視為一棵樹狀結構，而最小最低共同祖先(SLCA)關鍵字搜尋是一種常用的方法，它可以根據給定的關鍵字來找出合適的子樹回傳給使用者。然而在SLCA 機制下，所有被回傳的子樹裡的節點都被視為等同重要，因此又有新概念建構子被提出，進而只回傳有意義節點而非整棵子樹。在本論文中，我們提出兩個方法MinMap 以及SingleProbe 來改進找尋有意義節點的時間效能。此外，當XML 樹的寬度很大且關鍵字個數也多的情況下，子集合偵測是一個自然衍生出來的問題。我們同時也提出一個有效率的方法來處理此情況，同時實作到MinMap方法裡，並稱為MinMap+方法。此外，傳統的SLCA 關鍵字搜尋只允許AND 運算子。在本論文裡，我們同時針對使用NOT 運算子之關鍵字搜尋來擷取有意義的資訊，並提出一個演算法找出有效且有意義的節點。根據一個對XML 文件的觀察，我們將節點分成有效的與無效的，而只有有效的節點才有資格成為被回傳的節點。因為NOT 運算子的使用，關鍵字搜尋變得更為複雜，因此我們也討論單調性以及一致性在我們定義的架構中的變化。我們同時亦實作所有的方法來做實驗測試，以證明我們提出的方法的效能。根據實驗結果，MinMap 以及SingleProbe都比之前的方法來得有效率，而MinMap+也被證實當XML 樹寬度很大且關鍵字個數也多時，能夠大幅改善時間效能。此外，關於使用NOT 運算子之關鍵字搜尋的部份，我們也證明我們所提出的方法能夠比之前的方法找出更合理的資訊。另一方面，編碼機制的運用允許我們快速的決定兩個節點之間的結構關係，而不用整個XML 樹都重新訪視一遍。也就是XML 樹上的每一個節點都會被配置一個獨一無二的編碼，並利用此編碼決定任二個節點之間的結構關係。這些結構關係包含文件順序、祖先子孫、父子以及兄弟。目前最常見的編碼機制可分為兩大類：包含式以及字首式，其中杜威碼是字首式的代表。這兩種編碼機制都因它們簡單以及有效率而被廣泛地使用，但它們仍然有被改善的空間。在本論文裡，我們也提出一個基於完整樹的編碼機制，並稱為CT(complete-tree)編碼機制。我們提出的編碼機制相當簡潔且緊密。實驗結果顯示CT 編碼機制可以有效地支援所有結構關係的偵測，同時在大部份的情況下它也有最佳的空間效能表現。	zh_TW
dc.description.provenance	Made available in DSpace on 2021-06-13T01:28:56Z (GMT). No. of bitstreams: 1 ntu-100-D95922017-1.pdf: 761075 bytes, checksum: fd941de0e00fed3da150ce18b395e22f (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	1 Introduction 1 1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Contributions and Organization . . . . . . . . . . . . . . . . . . . . . 5 2 Preliminaries and Related Work 8 2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 The MaxMatch Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3 Faster Algorithms for Searching Relevant Matches 15 3.1 The MinMap Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.1.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3.1.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 The SingleProbe Algorithm . . . . . . . . . . . . . . . . . . . . . . . 21 3.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.2.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.3 The MinMap+ Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 25 3.3.1 The Subset Detection Problem . . . . . . . . . . . . . . . . . 25 3.3.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.4 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 30 3.4.2 Processing Time and Scalability . . . . . . . . . . . . . . . . . 31 4 Searching Relevant Matches with NOT Semantics 36 4.1 The ValidRM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.1.1 Basic Observation and Definitions . . . . . . . . . . . . . . . . 37 4.1.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.1.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.2 Properties of Monotonicity and Consistency . . . . . . . . . . . . . . 45 4.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.3.1 Precision and Recall . . . . . . . . . . . . . . . . . . . . . . . 50 4.3.2 Processing Time . . . . . . . . . . . . . . . . . . . . . . . . . 53 5 A Complete-Tree-Based Labeling Scheme 55 5.1 Existing Labeling Schemes . . . . . . . . . . . . . . . . . . . . . . . . 56 5.2 The Complete-Tree Labeling Scheme . . . . . . . . . . . . . . . . . . 59 5.2.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.2.3 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.3.1 Space Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.3.2 Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6 Concluding Remarks 71
dc.language.iso	en
dc.subject	結構關係	zh_TW
dc.subject	可延伸標記語言	zh_TW
dc.subject	關鍵字搜尋	zh_TW
dc.subject	NOT 語意	zh_TW
dc.subject	編碼機制	zh_TW
dc.subject	labeling scheme	en
dc.subject	structure relationship	en
dc.subject	XML	en
dc.subject	keyword search	en
dc.subject	NOT semantics	en
dc.title	可延伸標記語言之關鍵字搜尋與編碼機制研究	zh_TW
dc.title	A Study of Efficient Keyword Search and Labeling Schemes for XML Documents	en
dc.type	Thesis
dc.date.schoolyear	99-2
dc.description.degree	博士
dc.contributor.coadvisor	張雅惠
dc.contributor.oralexamcommittee	陳信希,傅楸善,劉長遠,鄭卜壬
dc.subject.keyword	可延伸標記語言,關鍵字搜尋,NOT 語意,編碼機制,結構關係,	zh_TW
dc.subject.keyword	XML,keyword search,NOT semantics,labeling scheme,structure relationship,	en
dc.relation.page	78
dc.rights.note	有償授權
dc.date.accepted	2011-08-03
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-100-1.pdf 未授權公開取用	743.24 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。