Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 資訊工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29985
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor趙坤茂
dc.contributor.authorRung-Ren Linen
dc.contributor.author林容任zh_TW
dc.date.accessioned2021-06-13T01:28:56Z-
dc.date.available2013-08-05
dc.date.copyright2011-08-05
dc.date.issued2011
dc.date.submitted2011-08-02
dc.identifier.citation[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. On Finding Lowest Common
Ancestors in Trees. In SIAM J. Computing, pages 115-132, 1976.
[2] Z. Bao, T. W. Ling, B. Chen, and J. Lu. Effective XML Keyword Search with
Relevance Oriented Ranking. In ICDE, pages 517-528, 2009.
[3] B. Cate and C. Lutz. The Complexity of Query Containment in Expressive
Fragments of XPath 2.0. In Journal of ACM, pages 73-82, 2009.
[4] Y. Chen, S. B. Davidson, and Y. Zheng. BLAS: an Efficient XPath Processing
System. In SIGMOD, pages 47-58, 2004.
[5] S. Cohen, J.Mamou, Y.Kanza, and Y.Sagiv. XSEarch: A Semantic Search Engine
for XML. In VLDB, pages 45-56, 2003.
[6] B. B. Dalvi, M. Kshirsagar, and S. Sudarshan. Keyword Search on External
Memory Data Graphs. In PVLDB, pages 1189-1204, 2008.
[7] B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding Top-k
Min-Cost Connected Trees in Databases. In ICDE, pages 836-845, 2007.
[8] D. K. Fisher, F. Lam, W. M. Shui, and R. K.Wong. Dynamic Labeling Schemes
for Ordered XML Based on Type Information. In ADC, pages 59-68, 2006.
[9] S. Flesca, F. Furfaro, and S. Greco. A Query Language for XML Based on
Graph Grammars. In Journal of WWW, pages 125-157, 2002.
[10] K. Golenberg, B. Kimelfeld, and Y. Sagiv. Keyword Proximity Search in Complex
Data Graphs. In SIGMOD, pages 927-940, 2008.
[11] L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked Keyword
Search over XML Documents. In SIGMOD, pages 16-27, 2003.
[12] V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava, Keyword
Proximity Search in XML Trees. In IEEE TKDE, pages 525-539, 2006.
[13] V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar.
Bidirectional Expansion for Keyword Search on Graph Databases. In
VLDB, pages 505-516, 2005.
[14] B. Kimelfeld and Y. Sagiv. Finding and Approximating Top-k Answers in Keyword
Proximity Search. In PODS, pages 173-182 , 2006.
[15] G. Koloniari and E. Pitoura. LCA-Based Selection for XML Document Collections.
In WWW, pages 511-520, 2010.
[16] L. Kong, R. Gilleron, and A. Lema. Retrieving Meaningful Relaxed Tightest
Fragments for XML Keyword Search. In EDBT, pages 815-826, 2009.
[17] C. Li, T. W. Ling. QED: a Novel Quaternary Encoding to Completely Avoid
Re-labeling in XML Updates. In CIKM, pages 501-508, 2005.
[18] G. Li, J. Feng, J. Wang, and L. Zhou. Effective Keyword Search for Valuable
LCAs over XML Documents. In CIKM, pages 31-40, 2007.
[19] G. Li, J. Feng, J. Wang, and L. Zhou. An Effective and Versatile Keyword
Search Engine on Heterogenous Data Sources. In PVLDB, pages 1452-1455,
2008.
[20] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an Effective 3-in-
1 Keyword Search Method for Unstructured, Semi-Structured and Structured
Data. In SIGMOD, pages 903-914, 2008.
[21] Q. Li, B. Moon. Indexing and Querying XML Data for Regular Path Expressions.
In VLDB, pages 361-370, 2001.
[22] Y. Li, C. Yu, and H. V. Jagadish. Schema-Free XQuery. In VLDB, pages 72-83,
2004.
[23] R.-R. Lin, Y.-H. Chang, and K.-M. Chao. Faster Algorithms for Searching
Relevant Matches in XML Databases. In DEXA, pages 290-297, 2010.
[24] R.-R. Lin, Y.-H. Chang, and K.-M. Chao. Identifying Relevant Matches with
NOT Semantics over XML Documents. In DASFAA, pages 466-480, 2011.
[25] C. Liu, J. Li, J. X. Yu, and R. Zhou. Adaptive Relaxation for Querying Heterogeneous
XML Data Sources. In Information Systems, pages 688-707, 2010.
[26] Z. Liu and Y. Chen. Reasoning and Identifying Relevant Matches for XML
Keyword Search. In VLDB, pages 921-932, 2008.
[27] J. Lu, T. W. Lin, C.-Y. Chan, and T. Chen. From Region Encoding to Extended
Dewey: on Efficient Processing of XML Twig Pattern Matching. In VLDB,
pages 193-204, 2005.
[28] Y. Luo, X. Lin, W. Wang, and X. Zhou. SPARK: Top-k Keyword Query in
Relational Databases. In SIGMOD, pages 115 - 126, 2007.
[29] J.-K. Min, J. Lee, and C.-W. Chung. An Efficient XML Encoding and Labeling
Method for Query Processing and Updating on Dynamic XML Data. In Journal
of Systems and Software, pages 503-515, 2009.
[30] B. Nguyen and S. Abiteboul. A Hash-Tree Based Algorithm for Subset Detection:
Analysis and Experiments. Manuscript, 2002.
[31] B. Nguyen and S. Abiteboul, G. Cobena, and M. Preda. Monitoring XML Data
on the Web. In SIGMOD, pages 437-448, 2001.
[32] P. O’Neil, E. O’Neil, S. Pal, I. Cseri, G. Schaller, and N.Westbury. ORDPATHs:
Insert-Friendly XML Node Labels. In SIGMOD, pages 903-908, 2004.
[33] P. Rao and B. Moon. PRIX: Indexing And Querying XML Using Prufer Sequences.
In ICDE, pages 288-300, 2004.
[34] B. Schieber and U.Vishkin. On Finding Lowest Common Ancestors: Simplification
and Parallelization. In SIAM J. Computing, pages 111-123, 1988.
[35] F. Shao, L. Guo, and C. Botev. Efficient Keyword Search over Virtual XML
Views. In VLDB Journal, pages 543 - 570, 2009.
[36] C. Sun, C.-Y. Chan, and A. K. Goenka. Multiway SLCA-Based Keyword Search
in XML Data. In WWW, pages 1043-1052, 2007.
[37] I. Tatarinov, S. Viglas, K. Beyer, J. Shanmugasundaram, E. Shekita, and C.
Zhang. Storing and Querying Ordered XML Using a Relational Database System.
In SIGMOD, pages 204-215, 2002.
[38] S. Tatikonda, S. Parthasarathy, and M. Goyder. LCSTRIM: Dynamic Programming
Meets XML Indexing and Querying. In VLDB, pages 63-74, 2007.
[39] Q. H. Vu, B. C. Ooi, D. Papadias, and A. K. H. Tung. A Graph Method for
Keyword-Based Selection of the Top-k Databases. In SIGMOD, pages 915-926,
2008.
[40] H. Wang, S. Park, W. Fan, and P. Yu. ViST: a Dynamic Index Method for
Querying XML Data by Tree Structures. In SIGMOD, pages 110-121, 2003.
[41] W. Wang, X. Wang, and A. Zhou. Hash-Search: An Efficient SLCA-based
Keyword Search Algorithm on XML Documents. In DASFAA, pages 496-510,
2009.
[42] Z. Wen. New Algorithms for the LCA Problem and the Binary Tree Reconstruction
Problem. In Information Processing Letters, pages 11-16, 1994.
[43] L. Wu, T. W. Ling, H. Wu, and Z. Bao. DDE: From Dewey to a Fully Dynamic
XML Labeling Scheme. In SIGMOD, pages 719-730, 2009.
[44] X.Wu, M.-L. Lee, and W. Hsu. A Prime Number Labeling Scheme for Dynamic
Ordered XML Trees. In ICDE, pages 66-78, 2004.
[45] X. Wu, D. Theodoratos, S. Souldatos, T. Dalamagas, and T. Sellis Evaluation
Techniques for Generalized Path Pattern Queries on XML Data. In Journal of
WWW, pages 441-474, 2010.
[46] Y. Xu and Y. Papakonstantinou. Efficient Keyword Search for Smallest LCAs
in XML Databases. In SIGMOD, pages 527-538, 2005.
[47] Y. Xu and Y. Papakonstantinou. Efficient LCA Based Keyword Search in XML
Data. In EDBT, pages 535-546, 2008.
[48] L. Xyleme: A Dynamic Warehouse for XML Data of the Web. In IEEE Data
Engineering Bulletin, pages 40-47, 2001.
[49] C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, G. M. Lohman. On Supporting
Containment Queries in Relational Database Management Systems. In
SIGMOD, pages 425-436, 2001.
[50] M. Zhong, M. Liu. Efficient Keyword Proximity Search Using a Frontier-Reduce
Strategy Based on d-Distance Graph Index. In IDEAS, pages 206-216, 2009.
[51] R. Zhou, C. Liu, and J. Li. Fast ELCA Computation for Keyword Queries on
XML Data. In EDBT, pages 549-560, 2010.
[52] Dewey Decimal Classification. http://www.oclc.org/dewey/.
[53] Oracle Berkeley DB. http://www.oracle.com/database/berkeley-db/.
[54] http://www.cs.washington.edu/research/xmldatasets/
[55] http://www.cafeconleche.org/books/biblegold/examples/
[56] http://www.xml-benchmark.org/
[57] http://www.w3.org/TR/xpath-full-text-10/
[58] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/29985-
dc.description.abstract可延伸標記語言(XML)現今已被廣泛地使用於資料儲存與轉換。為了從XML文件中擷取有意義的資訊,許多關鍵字搜尋機制因此被提出來討論。一般來說,XML 文件可被視為一棵樹狀結構,而最小最低共同祖先(SLCA)關鍵字搜尋是一種常用的方法,它可以根據給定的關鍵字來找出合適的子樹回傳給使用者。然而在SLCA 機制下,所有被回傳的子樹裡的節點都被視為等同重要,因此又有新概念建構子被提出,進而只回傳有意義節點而非整棵子樹。在本論文中,我們提出兩個方法MinMap 以及SingleProbe 來改進找尋有意義節點的時間效能。此外,當XML 樹的寬度很大且關鍵字個數也多的情況下,子集合偵測是一個自然衍生出來的問題。我們同時也提出一個有效率的方法來處理此情況,同時實作到MinMap方法裡,並稱為MinMap+方法。此外,傳統的SLCA 關鍵字搜尋只允許AND 運算子。在本論文裡,我們同時針對使用NOT 運算子之關鍵字搜尋來擷取有意義的資訊,並提出一個演算法找出有效且有意義的節點。根據一個對XML 文件的觀察,我們將節點分成有效的與無效的,而只有有效的節點才有資格成為被回傳的節點。因為NOT 運算子的使用,關鍵字搜尋變得更為複雜,因此我們也討論單調性以及一致性在我們定義的架構中的變化。我們同時亦實作所有的方法來做實驗測試,以證明我們提出的方法的效能。根據實驗結果,MinMap 以及SingleProbe都比之前的方法來得有效率,而MinMap+也被證實當XML 樹寬度很大且關鍵字個數也多時,能夠大幅改善時間效能。此外,關於使用NOT 運算子之關鍵字搜尋的部份,我們也證明我們所提出的方法能夠比之前的方法找出更合理的資訊。
另一方面,編碼機制的運用允許我們快速的決定兩個節點之間的結構關係,而不用整個XML 樹都重新訪視一遍。也就是XML 樹上的每一個節點都會被配置一個獨一無二的編碼,並利用此編碼決定任二個節點之間的結構關係。這些結構關係包含文件順序、祖先子孫、父子以及兄弟。目前最常見的編碼機制可分為兩大類:包含式以及字首式,其中杜威碼是字首式的代表。這兩種編碼機制都因它們簡單以及有效率而被廣泛地使用,但它們仍然有被改善的空間。在本論文裡,我們也提出一個基於完整樹的編碼機制,並稱為CT(complete-tree)編碼機制。我們提出的編碼機制相當簡潔且緊密。實驗結果顯示CT 編碼機制可以有效地支援所有結構關係的偵測,同時在大部份的情況下它也有最佳的空間效能表現。
zh_TW
dc.description.provenanceMade available in DSpace on 2021-06-13T01:28:56Z (GMT). No. of bitstreams: 1
ntu-100-D95922017-1.pdf: 761075 bytes, checksum: fd941de0e00fed3da150ce18b395e22f (MD5)
Previous issue date: 2011
en
dc.description.tableofcontents1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contributions and Organization . . . . . . . . . . . . . . . . . . . . . 5
2 Preliminaries and Related Work 8
2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 The MaxMatch Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Faster Algorithms for Searching Relevant Matches 15
3.1 The MinMap Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 The SingleProbe Algorithm . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 The MinMap+ Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.1 The Subset Detection Problem . . . . . . . . . . . . . . . . . 25
3.3.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.2 Processing Time and Scalability . . . . . . . . . . . . . . . . . 31
4 Searching Relevant Matches with NOT Semantics 36
4.1 The ValidRM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.1 Basic Observation and Definitions . . . . . . . . . . . . . . . . 37
4.1.2 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.3 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Properties of Monotonicity and Consistency . . . . . . . . . . . . . . 45
4.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3.1 Precision and Recall . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.2 Processing Time . . . . . . . . . . . . . . . . . . . . . . . . . 53
5 A Complete-Tree-Based Labeling Scheme 55
5.1 Existing Labeling Schemes . . . . . . . . . . . . . . . . . . . . . . . . 56
5.2 The Complete-Tree Labeling Scheme . . . . . . . . . . . . . . . . . . 59
5.2.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2.3 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Experimental Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.1 Space Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3.2 Time Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6 Concluding Remarks 71
dc.language.isoen
dc.subject結構關係zh_TW
dc.subject可延伸標記語言zh_TW
dc.subject關鍵字搜尋zh_TW
dc.subjectNOT 語意zh_TW
dc.subject編碼機制zh_TW
dc.subjectlabeling schemeen
dc.subjectstructure relationshipen
dc.subjectXMLen
dc.subjectkeyword searchen
dc.subjectNOT semanticsen
dc.title可延伸標記語言之關鍵字搜尋與編碼機制研究zh_TW
dc.titleA Study of Efficient Keyword Search and Labeling
Schemes for XML Documents
en
dc.typeThesis
dc.date.schoolyear99-2
dc.description.degree博士
dc.contributor.coadvisor張雅惠
dc.contributor.oralexamcommittee陳信希,傅楸善,劉長遠,鄭卜壬
dc.subject.keyword可延伸標記語言,關鍵字搜尋,NOT 語意,編碼機制,結構關係,zh_TW
dc.subject.keywordXML,keyword search,NOT semantics,labeling scheme,structure relationship,en
dc.relation.page78
dc.rights.note有償授權
dc.date.accepted2011-08-03
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept資訊工程學研究所zh_TW
顯示於系所單位:資訊工程學系

文件中的檔案:
檔案 大小格式 
ntu-100-1.pdf
  未授權公開取用
743.24 kBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved