Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 曹承礎(Seng-Cho Chou) | |
| dc.contributor.author | Wei-Lin Yang | en |
| dc.contributor.author | 楊瑋琳 | zh_TW |
| dc.date.accessioned | 2021-06-13T05:46:27Z | - |
| dc.date.available | 2006-07-14 | |
| dc.date.copyright | 2006-07-14 | |
| dc.date.issued | 2006 | |
| dc.date.submitted | 2006-07-11 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/33803 | - |
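| dc.description.abstract | Keyword-based search engines are one of the main ways to retrieve relevant documents from large volumes of data. However, the returned search results (snippets) are not organized, and keywords are the only dimension available for selection, so multi-faceted browsing is not possible. In the field of information retrieval, classification and clustering are two methods for automatically assigning semantic categories to a collection of documents. Classification must first train on part of the document collection to build a classification model before documents can be categorized automatically. Clustering instead uses statistical methods to compute the similarity between documents and groups them automatically. Because search results are dynamic and predefined categories are inflexible, this study adopts clustering as one of its tools.
This study proposes a virtual document warehouse system with multi-dimensional browsing as a way to browse search results from multiple perspectives. Built on existing search engines, the system uses the HAC+P hierarchical clustering algorithm to form a semantic hierarchy, that is, a semantics-based concept hierarchy. Through repeated searching and clustering, a personal conceptual knowledge map takes shape, which improves the user's browsing experience and helps the user find relevant topics and document content more effectively. | zh_TW |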
| dc.description.abstract | Keyword-based retrieval with search engines has limited ability to surface the most important and relevant knowledge. The retrieved search results are unorganized and lack dimensions for browsing. In the information retrieval (IR) field, text categorization, which comprises classification and clustering, has been investigated for many years as a way to organize search results automatically into corresponding categories.
In this thesis, we propose and describe the Virtual Document Warehouse System, which provides an integrated interface for multi-dimensional analysis in support of knowledge management and decision-making. The system retrieves relevant documents through search engines and uses clustering algorithms to dynamically and automatically organize information from heterogeneous sources into hierarchical structures and to combine different concept hierarchies. Finally, we propose an approach that makes searching more convenient and multi-dimensional, and present an application to personalized conceptual knowledge maps. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T05:46:27Z (GMT). No. of bitstreams: 1 ntu-95-R93725037-1.pdf: 5393869 bytes, checksum: bda1c47d1dbad8ce78c068a5363c582b (MD5) Previous issue date: 2006 | en |
| dc.description.tableofcontents | Chapter 1 Introduction
1.1. Motivation
1.2. Objective
1.3. Organization
Chapter 2 Literature Review
2.1. Text Classification Techniques
2.2. Document Clustering Techniques
2.2.1. Partitioning Clustering
2.2.2. Hierarchical Clustering
2.2.3. Agglomerative Hierarchical Clustering
2.3. Document Warehouses
2.3.1. Data Warehouses and Document Warehouses
2.3.2. Virtual Data Warehouses
2.3.3. Concept Hierarchy
2.3.4. Dimensions
2.3.5. Data Cube
2.3.6. On-Line Analytical Processing Operations
2.4. Knowledge Maps
Chapter 3 System Design
3.1. System Architecture
3.2. System Components
3.2.1. Heterogeneous document sources
3.2.2. Clustering-based Search Engine
3.2.3. Warehouse Administrator
3.2.4. Multi-dimensional Browser Engine
3.3. System Flow
3.3.1. Extracting, Transforming, and Loading (ETL) Function
3.3.2. Virtual Document Warehousing and Cube Function
3.3.3. Multi-dimensional Analysis Function
Chapter 4 System Implementation and Experiment Analysis
4.1. Scenario
4.2. Development Tools
4.3. Hierarchical Clustering Experiment and Analysis
4.3.1. Hierarchical clustering experiment
4.3.2. Discussion and analysis
4.4. Virtual Document Warehouse Implementation
4.4.1. Data Source Format
4.4.2. Virtual Document Warehouse Design
4.4.3. Concept Hierarchy Design
4.4.4. Documents Loading
4.4.5. Dimensions and Cubes
4.5. Clustering-based Search engine
4.6. Multi-dimensional Browser Engine
4.7. Application of Knowledge Maps
4.8. Analysis and Discussion
Chapter 5 Conclusion and Future Work
5.1. Conclusion
5.2. Future Work
Bibliography | |
| dc.language.iso | en | |
| dc.subject | 搜尋引擎 | zh_TW |
| dc.subject | 階層分群 | zh_TW |
| dc.subject | 資訊擷取 | zh_TW |
| dc.subject | 文件倉儲 | zh_TW |
| dc.subject | 概念階層 | zh_TW |
| dc.subject | Document Warehouse | en |
| dc.subject | Search Engine | en |
| dc.subject | Concept Hierarchy | en |
| dc.subject | Information Retrieval | en |
| dc.subject | Hierarchical Clustering | en |
| dc.title | 以動態階層分群技術為基礎建立虛擬文件倉儲系統 | zh_TW |
| dc.title | Developing a Virtual Document Warehouse with Dynamic Hierarchical Clustering Techniques | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 94-2 | |
| dc.description.degree | Master's | |
| dc.contributor.coadvisor | 吳玲玲(Ling-Ling Wu) | |
| dc.contributor.oralexamcommittee | 蔡益坤(Yih-Kuen Tsay) | |
| dc.subject.keyword | 資訊擷取, 階層分群, 文件倉儲, 概念階層, 搜尋引擎 | zh_TW |
| dc.subject.keyword | Information Retrieval, Hierarchical Clustering, Document Warehouse, Concept Hierarchy, Search Engine | en |
| dc.relation.page | 78 | |
| dc.rights.note | Paid authorization | |
| dc.date.accepted | 2006-07-13 | |
| dc.contributor.author-college | 管理學院 (College of Management) | zh_TW |
| dc.contributor.author-dept | 資訊管理學研究所 (Graduate Institute of Information Management) | zh_TW |
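| Appears in Collections: | 資訊管理學系 (Department of Information Management) | |

Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-95-1.pdf (not authorized for public access) | 5.27 MB | Adobe PDF |

All items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.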
