Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41617
Full metadata record
DC Field: Value (Language)
dc.contributor.advisor: 王勝德 (Sheng-De Wang)
dc.contributor.author: Chia-Hung Chen (en)
dc.contributor.author: 陳家宏 (zh_TW)
dc.date.accessioned: 2021-06-15T00:24:57Z
dc.date.available: 2010-02-03
dc.date.copyright: 2009-02-03
dc.date.issued: 2009
dc.date.submitted: 2009-01-22
dc.identifier.citation: Bibliography
[1] D. Veillard. Libxml2 project web page, http://xmlsoft.org/.
[2] Datapower, http://www.datapower.com/.
[3] Document Object Model, http://www.w3.org/dom/.
[4] Simple API for XML, http://www.saxproject.org/.
[5] UTF-8 standard: RFC3629, http://tools.ietf.org/html/rfc3629.
[6] XML Path Language Version 1.0, http://www.w3.org/tr/xpath.
[7] M. J. L. N. Abu-Ghazaleh. Differential Deserialization for Optimized SOAP
Performance. In SC05: High performance computing, networking, and storage
conference, Nov. 2005.
[8] J. B. B. C. C. L. Jan van Lunteren, Ton Engbersen. XML Accelerator Engine.
In The First International Workshop on High Performance XML Processing,
2004.
[9] W. L. A. S. K. Chiu, T. Devadithya. A Binary XML for Scientific Applications.
In International Conference on e-Science and Grid Computing, 2005.
[10] W. L. Kenneth Chiu. A compiler-based approach to schema-specific XML
parsing. In The First International Workshop on High Performance XML
Processing, 2004.
[11] W. L. Markus L. Noga, Steffen Schott. Lazy XML processing. In Proceedings
of the 2002 ACM symposium on Document engineering, 2002.
[12] J. J. Matthias Nicola. XML Parsing: A Threat to Database Performance. In
Conf. Information and Knowledge Management (CIKM 03), 2003.
[13] H. Sutter. The free lunch is over: A fundamental turn toward concurrency in
software. Dr. Dobb's Journal, page 30, 2005.
[14] R. A. van Engelen. Constructing Finite State Automata for High-Performance
XML Web Services. In Proceedings of the International Symposium on Web
Services (ISWS), 2004.
[15] Y. P. Wei Lu, Kenneth Chiu. A Parallel Approach to XML Parsing. In The
7th IEEE/ACM International Conference on Grid Computing, 2006.
[16] Y. Z. K. C. Y. Pan, W. Lu. A Static Load-Balancing Scheme for Parallel
XML Parsing on Multicore CPUs. In 7th IEEE International Symposium on
Cluster Computing and the Grid, 2007.
[17] J. Zhang. Process XML on a chip. In CommsDesign Magazine, 2005.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/41617
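dc.description.abstract: As of 2008, Extensible Markup Language (XML), owing to its extensibility, was widely used around the world as the most common document exchange standard between different computer software. The computer industry predicts that XML will be adopted even more widely in the coming decades. However, precisely because XML is so readable and clear to the human eye, parsing it by computer consumes a large amount of time and memory.
Virtual Token Descriptor (VTD) is a new approach to processing XML. Its most distinctive feature is that it is non-extractive: it does not pull data out of the document file to build its own data structures. VTD records only part of the meta-information in the document, such as the positions of certain tags in the file. It nearly matches the processing capability of the Document Object Model (DOM) while using fewer computing resources.
This thesis presents a hardware implementation of an XML parser whose target output is VTD, and analyzes and discusses the techniques used to design this high-speed parser. The parser is specially designed under hardware constraints: it cannot detect syntax errors in the document itself, but it can parse at very high speed. Synthesis data and experimental results show that, in the average case, this hardware parser can parse XML documents at 3 gigabits per second.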
zh_TW
dc.description.abstract: By 2008, XML (Extensible Markup Language) had become the most common standard exchange format between different software worldwide, owing to its extensibility. The computer industry predicts that XML will be used even more widely in the coming decades. However, because the XML format is designed to be clear and readable to humans, parsing it costs a lot of time or memory.
Virtual Token Descriptor (VTD) is a new method of processing XML. Its most distinctive characteristic is that it is non-extractive: it does not extract data from the original XML file to build its own data structure. VTD records only certain important meta-information, such as the offsets of some tags. It can achieve almost all the functionality of DOM, but with fewer resources.

This thesis presents a hardware implementation of a VTD XML parser and discusses the mechanisms used to make it fast. The parser is specially designed around hardware limitations: it cannot detect errors inside the XML document, but it is capable of high-speed parsing. Synthesis data and experimental results show that the parser can process XML documents at 3 Gbit/s in the average case.
en
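The non-extractive idea in the abstract can be illustrated with a minimal software sketch. Everything below is an assumption made for illustration: the `Token` tuple (kind, offset, length, depth), the function names, and the restriction to attribute-free, non-self-closing tags are hypothetical and are not the record layout or the algorithm of the thesis. Like the parser the abstract describes, the sketch does not check the document for errors.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str    # "start_tag", "end_tag", or "text" (illustrative token kinds)
    offset: int  # byte offset of the token in the original buffer
    length: int  # byte length of the token
    depth: int   # nesting depth at which the token occurs


def index_tokens(doc: bytes) -> List[Token]:
    """Record where tag names and text sit in `doc` without copying them.

    Assumes simple tags with no attributes, comments, or self-closing
    elements, and performs no well-formedness checking.
    """
    tokens: List[Token] = []
    depth = 0
    i = 0
    n = len(doc)
    while i < n:
        if doc[i:i + 1] == b"<":
            close = doc.index(b">", i)  # raises ValueError if '>' is missing
            if doc[i + 1:i + 2] == b"/":
                depth -= 1
                tokens.append(Token("end_tag", i + 2, close - (i + 2), depth))
            else:
                tokens.append(Token("start_tag", i + 1, close - (i + 1), depth))
                depth += 1
            i = close + 1
        else:
            nxt = doc.find(b"<", i)
            if nxt == -1:
                nxt = n
            if doc[i:nxt].strip():  # skip whitespace-only runs
                tokens.append(Token("text", i, nxt - i, depth))
            i = nxt
    return tokens


def token_bytes(doc: bytes, tok: Token) -> bytes:
    """Resolve a token back to its bytes by slicing the original buffer."""
    return doc[tok.offset:tok.offset + tok.length]
```

Calling `index_tokens(b"<a><b>hi</b></a>")` yields five small records, and `token_bytes` recovers a name or text run only when it is actually needed. The index stores integers rather than copied strings, which is the property the abstract credits for the lower resource use compared with DOM.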
dc.description.provenance: Made available in DSpace on 2021-06-15T00:24:57Z (GMT). No. of bitstreams: 1
ntu-98-R95921122-1.pdf: 1179622 bytes, checksum: e4ee4a115d696e4d550f0d6fb2ce96b6 (MD5)
Previous issue date: 2009
en
dc.description.tableofcontents: Contents
1 Introduction
1.1 Motivation
1.2 The Implementation
1.3 Thesis Organization
2 Background and Problem Analysis
2.1 Preliminary
2.1.1 UTF-8
2.1.2 The Synthesis of Decoder
2.2 Problem statement
2.2.1 The Difficulty of Processing UTF-8
2.2.2 The Difficulty of Large Finite State Machine
2.2.3 The Difficulty of Cutting Pipeline Stage
3 Related Works
3.1 Virtual Token Descriptor
3.1.1 Basic Concept
3.1.2 Imagining VTD Records as A Relational Database
3.2 Parallel XML Parsing
4 System Architecture
4.1 The Merging Stage
4.1.1 Merging with UTF-8
4.1.2 Special Cyclic Buffer
4.1.3 Halting Circuit
4.2 The Rule-matching Stage
4.2.1 Parallel Architecture
4.2.2 TEXT Processing
4.2.3 Deep into The Rule-matching
4.3 The Recording Stage
4.4 Evaluation Board
5 Discussion
5.1 The Decoupling with The Error Detection
5.1.1 Character Validation
5.1.2 Matching of The Start Tag and The End Tag
5.1.3 Uniqueness of Attribute
5.2 Modified Implementation of VTD Records
5.3 Vector Processing
5.4 Partial Parsing
5.5 Parallelism
6 Experimental Results
6.1 Synthesis Result
6.2 Comparison
7 Conclusion
7.1 Future Work
8 Bibliography
dc.language.iso: en
dc.subject: hardware acceleration (zh_TW)
dc.subject: Extensible Markup Language (zh_TW)
dc.subject: parser (zh_TW)
dc.subject: Unicode (zh_TW)
dc.subject: UTF-8 (en)
dc.subject: Parser (en)
dc.subject: XML (en)
dc.subject: VTD (en)
dc.subject: hardware (en)
dc.subject: indexer (en)
dc.title: 以虛擬標記描述子為目標輸出之可擴展標記語言的語法分析器之硬體加速 (zh_TW)
dc.title: Hardware Accelerated XML Parser for Virtual Token Descriptor (en)
dc.type: Thesis
dc.date.schoolyear: 97-1
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 洪士灝, 鍾國亮, 林彥君
dc.subject.keyword: Extensible Markup Language, hardware acceleration, parser, Unicode (zh_TW)
dc.subject.keyword: Parser, XML, VTD, hardware, indexer, UTF-8 (en)
dc.relation.page: 42
dc.rights.note: Paid license
dc.date.accepted: 2009-01-23
dc.contributor.author-college: College of Electrical Engineering and Computer Science (zh_TW)
dc.contributor.author-dept: Graduate Institute of Electrical Engineering (zh_TW)
Appears in collections: Department of Electrical Engineering

Files in this item:
File: ntu-98-1.pdf (not authorized for public access)
Size: 1.15 MB
Format: Adobe PDF
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
