Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques

Wei-Lin Yang; 楊瑋琳

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	曹承礎(Seng-Cho Chou)
dc.contributor.author	Wei-Lin Yang	en
dc.contributor.author	楊瑋琳	zh_TW
dc.date.accessioned	2021-06-13T05:46:27Z	-
dc.date.available	2006-07-14
dc.date.copyright	2006-07-14
dc.date.issued	2006
dc.date.submitted	2006-07-11
dc.identifier.citation	[1] A. Hotho, A. Maedche, and S. Staab, Ontology-based text clustering, In Proceedings of the IJCAI-2001 Workshop, Text Learning: Beyond Supervision, August, Seattle, USA, 2001. [2] ACM Digital Library, http://portal.acm.org/dl.cfm . [3] B. Fung, K. Wang, and M. Ester. Large hierarchical document clustering using frequent itemsets. In SDM03. [4] Chekuri C., Goldwasser M., Raghavan P., and Upfal E. Web search using automated classification. In Proceedings of the 6th International World Wide Web Conference (Santa Clara, Calif., Apr. 7–11), 1997. [5] Chen, H. and Dumais, S. Bringing order to the web: Automatically categorizing search results. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI'00), 2000, pp. 145-152. [6] Chuang, S.-L. and Chien, L.-F., A Practical Web-based Approach to Generating Topic Hierarchy for Text Segments. CIKM-04, 2004. [7] DMOZ Open Directory Project, http://dmoz.org/ . [8] Dublin Core Metadata Initiative, http://dublincore.org/ . [9] Dumais, S. T., Platt, J., Heckerman, D. and Sahami, M. Inductive learning algorithms and representations for text categorization. In Proceedings of ACM-CIKM98, Nov. 1998. [10] Extensible Markup Language (XML) in W3C, http://www.w3.org/XML/ . [11] F. Giannotti, M. Nanni, and D. Pedreschi. Webcat: Automatic categorization of web search results. In SEBD03. [12] Ferragina, Paolo and Antonio Gulli. A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering. Proceedings of WWW 2005, 10-14 May 2005, Chiba, Japan, pp. 801-810. [13] F.S.C. Tseng and A.Y.H. Chou, The concept of document warehousing and its applications on managing enterprise business intelligence. In: C.P. Wei (ed.), Proceedings of the 8th Pacific Asia Conference on Information Systems: PACIS 2004, July 8-11, 2004, Shanghai, China (AIS, Shanghai, 2004) pp. 563-574. [14] Google Desktop Search Engine Kit's API, http://www.google.com.tw/apis/index.html . [15] Google Desktop Search, http://desktop.google.com/index.html . [16] Google Software Development Kit, http://desktop.google.com/developer.html . [17] H. Zeng, Q. He, Z. Chen, and W. Ma. Learning to cluster web search results. In SIGIR04. [18] LookSmart, http://search.looksmart.com/ . [19] M. R. Genesereth, A. M. Keller and O. Duschka, Infomaster: An Information Integration System, In Proceedings of 1997 ACM SIGMOD Conference, May 1997. [20] M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In KDD Workshop on Text Mining, 2000. [21] Martin Porter. An Algorithm for Suffix Stripping. In Program, vol. 14, no. 3, 1980. [22] O. Zamir and O. Etzioni. Grouper: a dynamic clustering interface to Web search results. In WWW8, 1999. [23] Paolo Ferragina, Antonio Gullì. The Anatomy of SnakeT: A Hierarchical Clustering Engine for Web-Page Snippets, Lecture Notes in Computer Science, Volume 3202, Jan 2004, pp. 506–508. [24] Platt, J. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning. B. Schölkopf, C. Burges, and A. Smola, eds., MIT Press, (1999). [25] R.M. Bruckner, T. Wang Ling, O. Mangisengi and A.M. Tjoa, A framework for a multidimensional OLAP model using topic maps. In C. Claramunt et al. (eds), Second International Conference on Web Information Systems Engineering (WISE'01), Vol. 2, Dec. 2001 (IEEE Computer Society, Kyoto, 2001), pp. 109-118. [26] SnakeT, http://snaket.di.unipi.it/ . [27] Thian-Huat Ong et al., Newsmap: A Knowledge Map for Online News, Decision Support Systems 39 (2005) pp. 583-587. [28] Thomas H. Davenport & Laurence Prusak, Working Knowledge - How Organizations Manage What They Know, Harvard Business School Press, 1998, pp. 187. [29] V. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995. [30] Vivisimo, http://www.vivisimo.com/ . [31] Wenxian Wang, Weiyi Meng, Clement Yu. Concept Hierarchy Based Text Database Categorization in a Metasearch Engine Environment, First International Conference on Web Information Systems Engineering (WISE'00)-Volume 1, 2000, pp. 0283. [32] What Are Topic Maps, http://www.xml.com/pub/a/2002/09/11/topicmaps.html/ . [33] Z. Jiang, A. Joshi, R. Krishnapuram, and L. Yi. Retriever: Improving web search engine results using clustering. In Managing Business with Electronic Commerce 02.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803	-
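dc.description.abstract	Keyword-based search engines are one of the main ways to retrieve relevant documents from large volumes of data, but the returned search results (snippets) are unorganized, and keywords are the only, single-dimensional selection criterion, so multi-faceted browsing is not possible. In the field of information retrieval, classification and clustering are two methods for automatically assigning semantic categories to a document collection. The former must first be trained on part of the document collection to build a classification model before it can classify automatically. The latter uses statistical methods to compute the similarity between documents in order to cluster them automatically. Because search results are dynamic and predefined categories are inflexible, this study adopts clustering as one of its tools. It proposes a virtual document warehouse system with multi-dimensional browsing as a way to browse search results from multiple perspectives. Combined with existing search engines, the system uses the HAC+P hierarchical clustering algorithm to form a semantic hierarchical structure, that is, a semantics-based concept hierarchy. Through repeated searching and clustering, a personal conceptual knowledge map can be formed, which improves the user's browsing experience and makes it easier to find relevant topics and document content.	zh_TW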
dc.description.abstract	Keyword-based retrieval with search engines has limited ability to mine the most important and relevant knowledge. The retrieved search results are disorganized and lack dimensions. In the information retrieval (IR) field, text categorization, which includes classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories. In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multi-dimensional analysis in support of knowledge management and decision-making. The system extracts relevant documents by using search engines, and we use clustering algorithms to dynamically and automatically organize information retrieved from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multi-dimensional, and present the application of personalized conceptual knowledge maps.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T05:46:27Z (GMT). No. of bitstreams: 1 ntu-95-R93725037-1.pdf: 5393869 bytes, checksum: bda1c47d1dbad8ce78c068a5363c582b (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Chapter 1 Introduction; 1.1. Motivation; 1.2. Objective; 1.3. Organization; Chapter 2 Literature Review; 2.1. Text Classification Techniques; 2.2. Document Clustering Techniques; 2.2.1. Partitioning Clustering; 2.2.2. Hierarchical Clustering; 2.2.3. Agglomerative Hierarchical Clustering; 2.3. Document Warehouses; 2.3.1. Data Warehouses and Document Warehouses; 2.3.2. Virtual Data Warehouses; 2.3.3. Concept Hierarchy; 2.3.4. Dimensions; 2.3.5. Data Cube; 2.3.6. On-Line Analytical Processing Operations; 2.4. Knowledge Maps; Chapter 3 System Design; 3.1. System Architecture; 3.2. System Components; 3.2.1. Heterogeneous document sources; 3.2.2. Clustering-based Search Engine; 3.2.3. Warehouse Administrator; 3.2.4. Multi-dimensional Browser Engine; 3.3. System Flow; 3.3.1. Extracting, Transforming, and Loading (ETL) Function; 3.3.2. Virtual Document Warehousing and Cube Function; 3.3.3. Multi-dimensional Analysis Function; Chapter 4 System Implementation and Experiment Analysis; 4.1. Scenario; 4.2. Development Tools; 4.3. Hierarchical Clustering Experiment and Analysis; 4.3.1. Hierarchical clustering experiment; 4.3.2. Discussion and analysis; 4.4. Virtual Document Warehouse Implementation; 4.4.1. Data Source Format; 4.4.2. Virtual Document Warehouse Design; 4.4.3. Concept Hierarchy Design; 4.4.4. Documents Loading; 4.4.5. Dimensions and Cubes; 4.5. Clustering-based Search engine; 4.6. Multi-dimensional Browser Engine; 4.7. Application of Knowledge Maps; 4.8. Analysis and Discussion; Chapter 5 Conclusion and Future Work; 5.1. Conclusion; 5.2. Future Work; Bibliography
dc.language.iso	en
dc.subject	搜尋引擎	zh_TW
dc.subject	階層分群	zh_TW
dc.subject	資訊擷取	zh_TW
dc.subject	文件倉儲	zh_TW
dc.subject	概念階層	zh_TW
dc.subject	Document Warehouse	en
dc.subject	Search Engine	en
dc.subject	Concept Hierarchy	en
dc.subject	Information Retrieval	en
dc.subject	Hierarchical Clustering	en
dc.title	以動態階層分群技術為基礎建立虛擬文件倉儲系統	zh_TW
dc.title	Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.coadvisor	吳玲玲(Ling-Ling Wu)
dc.contributor.oralexamcommittee	蔡益坤(Yih-Kuen Tsay)
dc.subject.keyword	資訊擷取,階層分群,文件倉儲,概念階層,搜尋引擎	zh_TW
dc.subject.keyword	Information Retrieval,Hierarchical Clustering,Document Warehouse,Concept Hierarchy,Search Engine	en
dc.relation.page	78
dc.rights.note	Paid license
dc.date.accepted	2006-07-13
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
Appears in Collections:	Department of Information Management

Files in this item:

File	Size	Format
ntu-95-1.pdf (not authorized for public access)	5.27 MB	Adobe PDF

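Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.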