Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36165
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 翁昭旼 (Jau-Min Wong)
dc.contributor.author: Hsiang-Chun Tsai [en]
dc.contributor.author: 蔡香君 [zh_TW]
dc.date.accessioned: 2021-06-13T07:52:48Z
dc.date.available: 2005-07-27
dc.date.copyright: 2005-07-27
dc.date.issued: 2005
dc.date.submitted: 2005-07-25
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36165
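dc.description.abstract: With the arrival of the information age, the volume of digitized literature is growing at a rapid pace. How to quickly find highly relevant material in a large body of digital data and extract the related knowledge is undoubtedly an important issue in urgent need of a solution. In this thesis we propose a clustering method (Literature Clustering Search, LCS). With this method, a large amount of data can be organized into hierarchical clusters, which helps users gain a preliminary understanding and an initial concept of the data in a short time. Our method has four steps. First, Metadata Retrieval formats the data. Second, a Feature Selection procedure retains only the words or phrases that are representative of an article as features. Third, an Association Rule Mining procedure computes the relations among all features. Finally, we form a hierarchical cluster according to these relations. Because Association Rules represent groups of co-occurring terms, the meaning of a cluster can be easily understood from its co-occurring terms. In addition, we built an online literature clustering search service to demonstrate our method and results. [zh_TW]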
dc.description.abstract: The past two decades have seen a dramatic increase in the amount of information stored in electronic format, and retrieving relevant information from large data sets has become an important issue. We propose a clustering method, Literature Clustering Search (LCS), which generates hierarchical clusters and helps users grasp an overall picture of the concepts in a massive body of information in a short time. The method has four steps. First, metadata retrieval normalizes the data format. Second, feature selection extracts the words and phrases that can represent each document. Third, association rule mining generates relations between features. Finally, documents that share the same association rules are grouped together. Since an association rule represents a set of terms that co-occur frequently, the concept of a cluster can be read easily from the association rules of that cluster. In addition, we build an online clustering web service to demonstrate the methodology of literature clustering search. [en]
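The last two steps of the abstract (association rule mining, then grouping documents that share a rule) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the thresholds, the restriction to two-term rules, and the toy documents are all assumptions made for illustration, and the sketch produces flat groups instead of the hierarchical clusters the thesis describes. Support is taken as the fraction of documents containing both terms, and confidence as that count divided by the number of documents containing the antecedent term.

```python
from collections import defaultdict
from itertools import combinations


def mine_pair_rules(docs, min_support=0.4, min_confidence=0.6):
    """Mine two-term association rules from docs (doc_id -> set of terms).

    Hypothetical helper: returns (antecedent, consequent, support, confidence).
    """
    n_docs = len(docs)
    # Invert the collection: term -> ids of the documents that contain it.
    term_docs = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            term_docs[term].add(doc_id)

    rules = []
    for a, b in combinations(sorted(term_docs), 2):
        both = term_docs[a] & term_docs[b]
        support = len(both) / n_docs
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = len(both) / len(term_docs[x])
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def cluster_by_rules(docs, rules):
    """Group documents that contain every term of a mined rule.

    The cluster label is the set of co-occurring terms in the rule.
    """
    clusters = defaultdict(set)
    for x, y, _support, _confidence in rules:
        label = frozenset((x, y))
        for doc_id, terms in docs.items():
            if label <= terms:
                clusters[label].add(doc_id)
    return dict(clusters)


# Toy feature sets, invented for illustration only.
example_docs = {
    1: {"gene", "expression", "cancer"},
    2: {"gene", "expression"},
    3: {"cancer", "therapy"},
    4: {"gene", "expression", "therapy"},
    5: {"protein"},
}

if __name__ == "__main__":
    found = mine_pair_rules(example_docs)
    for label, members in cluster_by_rules(example_docs, found).items():
        print(sorted(label), sorted(members))
```

Labeling each group with the terms of its rule reflects the abstract's point that the co-occurring terms themselves convey the concept of the cluster.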
dc.description.provenance: Made available in DSpace on 2021-06-13T07:52:48Z (GMT). No. of bitstreams: 1
ntu-94-R92548053-1.pdf: 2997641 bytes, checksum: f629b95ce1ecaf4eef06856a5c7c55ce (MD5)
Previous issue date: 2005 [en]
dc.description.tableofcontents:
Chinese Abstract
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
List of Figures
List of Tables
Chapter 1 INTRODUCTION
1.1 Motivation
1.2 Purpose
1.3 Our Approach
1.4 Outline
Chapter 2 RELATED WORKS
2.1 Feature Selection
2.2 Association Rule Mining
2.3 Clustering
2.3.1 Components of a Clustering Task
2.3.2 Well-known Clustering Algorithms
2.3.3 Previous approaches to Document Clustering
2.4 A Brief Introduction of Clustering Search Engines
Chapter 3 MATERIALS
3.1 PubMed
3.2 Google™ Search Engine
3.3 Reuters-21578, Distribution 1.0
Chapter 4 METHODS
4.1 Metadata Retrieval
4.2 Feature Extraction
4.2.1 The Framework of Feature Extraction
4.2.2 Part-Of-Speech Tagging
4.2.3 Definition of Phrase Patterns
4.2.4 Feature Selection
4.3 Association Rules
4.3.1 Support
4.3.2 Confidence
4.4 Clustering by Association Rules
Chapter 5 CLUSTERING WEBSITE – DESIGN AND EVALUATION
5.1 Introduction
5.2 General architecture
5.3 The Client
5.3.1 The client environment
5.3.2 A look at the user interface
5.4 The Server
5.4.1 The Server Environment
5.4.2 Design Objectives
5.4.3 The Clustering Web Server Framework
5.5 Evaluation of Clustering Website
Chapter 6 EXPERIMENTS AND DISCUSSION
6.1 PubMed
6.1.1 Experimental Design
6.1.2 Experimental Results
6.1.3 Improve Accuracy of Results
6.2 Google Search Engine
6.2.1 Experimental design
6.2.2 Experimental Results
6.2.3 Limitations
6.3 Reuters-21578
6.3.1 Data Corpora
6.3.2 Evaluation Metrics
6.3.3 Effect of feature selection
6.3.4 Experimental Design
6.3.5 Experimental Results
6.4 Discussion
Chapter 7 CONCLUSIONS
7.1 Contributions
7.2 Limitations
7.3 Future Works
BIBLIOGRAPHY
dc.language.iso: en
dc.subject: Clustering (叢集化) [zh_TW]
dc.subject: Association Rule (關聯法則) [zh_TW]
dc.subject: Data Mining (資料探勘) [zh_TW]
dc.subject: Text Mining [en]
dc.subject: Document Clustering [en]
dc.subject: Association Rule [en]
dc.title: 網頁文獻叢集化搜尋 [zh_TW]
dc.title: Web-base Literature Clustering Search [en]
dc.type: Thesis
dc.date.schoolyear: 93-2
dc.description.degree: Master's
dc.contributor.coadvisor: 蔣以仁 (I-Jen Chiang)
dc.contributor.oralexamcommittee: 陳中明
dc.subject.keyword: Clustering (叢集化), Association Rule (關聯法則), Data Mining (資料探勘) [zh_TW]
dc.subject.keyword: Document Clustering, Association Rule, Text Mining [en]
dc.relation.page: 62
dc.rights.note: Paid license
dc.date.accepted: 2005-07-25
dc.contributor.author-college: College of Engineering [zh_TW]
dc.contributor.author-dept: Institute of Biomedical Engineering [zh_TW]
Appears in collections: Institute of Biomedical Engineering

Files in this item:
File: ntu-94-1.pdf (not authorized for public access)
Size: 2.93 MB
Format: Adobe PDF