Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803
Full metadata record
DC field: value [language]
dc.contributor.advisor: 曹承礎 (Seng-Cho Chou)
dc.contributor.author: Wei-Lin Yang [en]
dc.contributor.author: 楊瑋琳 [zh_TW]
dc.date.accessioned: 2021-06-13T05:46:27Z
dc.date.available: 2006-07-14
dc.date.copyright: 2006-07-14
dc.date.issued: 2006
dc.date.submitted: 2006-07-11
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803
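dc.description.abstract: Keyword-based search engines are one of the main ways to retrieve relevant documents from large volumes of data. However, the returned search results (snippets) are not organized, and keywords are the only dimension of selection, so multi-faceted browsing is not possible. In the field of information retrieval, classification and clustering are two methods for automatically assigning semantic categories to a collection of documents. The former must first be trained on part of the document collection to build a classification model before it can classify automatically. The latter uses statistical methods to compute the similarity between documents and thereby groups them automatically. Because search results are dynamic and predefined categories are inflexible, this study adopts clustering as one of its tools.
This study proposes a virtual document warehouse system with multi-dimensional browsing as a way to browse search results from multiple facets. Combined with existing search engines, it uses the HAC+P hierarchical clustering algorithm to form a semantic hierarchical structure, that is, a semantics-based concept hierarchy. Through repeated searching and clustering, a personal conceptual knowledge map can be formed, which improves the user's browsing experience and helps the user find relevant topics and document contents more effectively.
[zh_TW, translated]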
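The clustering idea described in this abstract (computing similarity between documents and merging the most similar ones into a hierarchy) can be illustrated with a minimal sketch. This is generic average-link agglomerative clustering over bag-of-words vectors, not the HAC+P algorithm used in the thesis, whose details are not given in this record. The function names, the whitespace tokenizer, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector for one snippet (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hac(snippets, threshold=0.2):
    """Average-link agglomerative clustering (illustrative, not HAC+P).

    Repeatedly merges the two clusters with the highest average pairwise
    similarity, stopping once the best pair falls below `threshold`.
    Returns the final clusters (lists of snippet indices) and the merge log.
    """
    vectors = [tf_vector(s) for s in snippets]
    clusters = [[i] for i in range(len(snippets))]
    merges = []
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[x] for j in clusters[y]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_sim, best_pair = avg, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        merges.append((clusters[x], clusters[y], best_sim))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges
```

The merge log records which clusters were joined at each step, which is the information one would need to display the result as a browsable hierarchy rather than a flat grouping.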
dc.description.abstract: Keyword-based retrieval with search engines has limited ability to surface the most important and relevant knowledge. The retrieved search results are unorganized and offer no dimensions for browsing. In the information retrieval (IR) field, text categorization, which comprises classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories.
In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multi-dimensional analysis in support of knowledge management and decision-making. The system extracts relevant documents through search engines and uses clustering algorithms to dynamically and automatically organize information retrieved from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multi-dimensional, and present an application to personalized conceptual knowledge maps.
[en]
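As a rough illustration of what multi-dimensional browsing over a concept hierarchy could look like, the sketch below counts documents per cell of a two-dimensional view (concept, source) and rolls the concept dimension up to a chosen depth. It is not taken from the thesis: the document layout, the `path` and `source` keys, the sample concepts, and the source names are hypothetical.

```python
from collections import Counter


def roll_up(docs, depth):
    """Count documents per (concept path truncated to `depth`, source) cell."""
    cells = Counter()
    for doc in docs:
        cells[(doc["path"][:depth], doc["source"])] += 1
    return cells


def slice_by_source(docs, source):
    """Keep only the documents retrieved from one source."""
    return [doc for doc in docs if doc["source"] == source]


# Hypothetical documents; concept paths and source names are illustrative only.
docs = [
    {"path": ("beverages", "coffee", "espresso"), "source": "web"},
    {"path": ("beverages", "coffee"), "source": "web"},
    {"path": ("computing", "programming", "compiler"), "source": "desktop"},
    {"path": ("beverages", "tea"), "source": "desktop"},
]

if __name__ == "__main__":
    # Coarse view: top-level concepts per source.
    for cell, count in sorted(roll_up(docs, 1).items()):
        print(cell, count)
```

Calling `roll_up(docs, 2)` gives the finer view one level down, and `slice_by_source(docs, "web")` restricts the view to a single source before aggregating.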
dc.description.provenance: Made available in DSpace on 2021-06-13T05:46:27Z (GMT). No. of bitstreams: 1
ntu-95-R93725037-1.pdf: 5393869 bytes, checksum: bda1c47d1dbad8ce78c068a5363c582b (MD5)
Previous issue date: 2006
[en]
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1. Motivation 1
1.2. Objective 3
1.3. Organization 4
Chapter 2 Literature Review 5
2.1. Text Classification Techniques 5
2.2. Document Clustering Techniques 6
2.2.1. Partitioning Clustering 6
2.2.2. Hierarchical Clustering 8
2.2.3. Agglomerative Hierarchical Clustering 9
2.3. Document Warehouses 11
2.3.1. Data Warehouses and Document Warehouses 11
2.3.2. Virtual Data Warehouses 14
2.3.3. Concept Hierarchy 15
2.3.4. Dimensions 16
2.3.5. Data Cube 18
2.3.6. On-Line Analytical Processing Operations 18
2.4. Knowledge Maps 19
Chapter 3 System Design 22
3.1. System Architecture 22
3.2. System Components 22
3.2.1. Heterogeneous document sources 23
3.2.2. Clustering-based Search Engine 23
3.2.3. Warehouse Administrator 26
3.2.4. Multi-dimensional Browser Engine 28
3.3. System Flow 29
3.3.1. Extracting, Transforming, and Loading (ETL) Function 29
3.3.2. Virtual Document Warehousing and Cube Function 30
3.3.3. Multi-dimensional Analysis Function 31
Chapter 4 System Implementation and Experiment Analysis 33
4.1. Scenario 33
4.2. Development Tools 34
4.3. Hierarchical Clustering Experiment and Analysis 35
4.3.1. Hierarchical clustering experiment 35
4.3.2. Discussion and analysis 44
4.4. Virtual Document Warehouse Implementation 47
4.4.1. Data Source Format 47
4.4.2. Virtual Document Warehouse Design 48
4.4.3. Concept Hierarchy Design 56
4.4.4. Documents Loading 57
4.4.5. Dimensions and Cubes 58
4.5. Clustering-based Search engine 61
4.6. Multi-dimensional Browser Engine 62
4.7. Application of Knowledge Maps 69
4.8. Analysis and Discussion 70
Chapter 5 Conclusion and Future Work 72
5.1. Conclusion 72
5.2. Future Work 73
Bibliography 75
dc.language.iso: en
dc.subject: 搜尋引擎 (Search Engine) [zh_TW]
dc.subject: 階層分群 (Hierarchical Clustering) [zh_TW]
dc.subject: 資訊擷取 (Information Retrieval) [zh_TW]
dc.subject: 文件倉儲 (Document Warehouse) [zh_TW]
dc.subject: 概念階層 (Concept Hierarchy) [zh_TW]
dc.subject: Document Warehouse [en]
dc.subject: Search Engine [en]
dc.subject: Concept Hierarchy [en]
dc.subject: Information Retrieval [en]
dc.subject: Hierarchical Clustering [en]
dc.title: 以動態階層分群技術為基礎建立虛擬文件倉儲系統 [zh_TW]
dc.title: Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques [en]
dc.type: Thesis
dc.date.schoolyear: 94-2
dc.description.degree: Master's (碩士)
dc.contributor.coadvisor: 吳玲玲 (Ling-Ling Wu)
dc.contributor.oralexamcommittee: 蔡益坤 (Yih-Kuen Tsay)
dc.subject.keyword: 資訊擷取, 階層分群, 文件倉儲, 概念階層, 搜尋引擎 [zh_TW]
dc.subject.keyword: Information Retrieval, Hierarchical Clustering, Document Warehouse, Concept Hierarchy, Search Engine [en]
dc.relation.page: 78
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2006-07-13
dc.contributor.author-college: College of Management (管理學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Information Management (資訊管理學研究所) [zh_TW]
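Appears in collections: Department of Information Management (資訊管理學系)

Files in this item:
File: ntu-95-1.pdf (not authorized for public access)
Size: 5.27 MB
Format: Adobe PDF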


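Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.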
