Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25706
Title: | Automatic Biomedical Literature Clustering System and Evaluations (生醫文獻自動化分群系統與評估) |
Author: | You-Sheng Li (李祐陞) |
Advisors: | Jau-Min Wong (翁昭旼), I-Jen Chiang (蔣以仁) |
Keywords: | Cluster Analysis, Community, Text Mining |
Publication Year: | 2006 |
Degree: | Master's |
Abstract: | The co-occurrence of heterogeneous data items in documents gives rise to complex structures, and explaining the associations among them is a problem many researchers have long sought to solve. With the arrival of the Internet era, most people have been drawn to the convenience and speed of the network, and the Internet has gradually become the main channel for finding and sharing information. As a result, the volume of electronic text has grown sharply: the number of research articles, web pages, news reports, and business documents is growing exponentially. How to manage this large volume of documents effectively has therefore become an important issue.
This thesis aims to develop an automatic biomedical literature clustering system that automatically groups scattered literature by similar domain and topic. The system is intended to help users grasp the structure and content of the knowledge they seek quickly and effectively when facing a large body of medical literature. In this thesis, we use association rules to implement the Clique Percolation Method and the simplex concept. We then compare the result with Literature Clustering Search on two text categorization benchmarks, Reuters-21578 and OHSUMED, evaluating the differences in precision, recall, normalized mutual information, and pairwise testing. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/25706 |
Full-Text Authorization: | Not authorized |
Appears in Collections: | Institute of Biomedical Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
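ntu-95-1.pdf (currently not authorized for public access) | 1.7 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.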