Knowledge Visualization in Biomedical Literatures (生物醫學文獻知識視覺化呈現)

Yung-ta Chang; 張永達

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24465

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	翁昭旼(Jan-Min Wong)
dc.contributor.author	Yung-ta Chang	en
dc.contributor.author	張永達	zh_TW
dc.date.accessioned	2021-06-08T05:27:03Z	-
dc.date.copyright	2005-07-20
dc.date.issued	2005
dc.date.submitted	2005-07-16
dc.identifier.citation	REF 1-1: Human Genome Project. http://www.genome.gov/
REF 1-2: Entrez PubMed. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
REF 1-3: GenBank. http://www.ncbi.nlm.nih.gov/Genbank/
REF 1-4: The Gene Ontology. http://www.geneontology.org/
REF 2-1: 王昕, 'Overview: Concepts, Methods, and Applications of Ontology' (《綜述：本體的概念、方法和應用》), 2002. http://www.prdm.net/papers/knowledge/Ontology_20overview.htm
REF 2-2: 阮明淑, 溫達茂, 'A Preliminary Study of Applying Ontology to Knowledge Organization' (「ontology應用於知識組織之初探」), Buddhist Library Newsletter (佛教圖書館館訊), no. 32, December 2002. http://www.gaya.org.tw/journal/m32/32-main1.htm
REF 2-3: Gruber, T. 'Ontolingua: A translation approach to portable ontology specifications'. Knowledge Acquisition 5(2), 1993, pp. 199-200.
REF 2-4: Saccharomyces Genome Database. http://www.yeastgenome.org/
REF 2-5: Mouse Genome Informatics. http://www.informatics.jax.org/
REF 2-6: FlyBase. http://flybase.bio.indiana.edu/
REF 2-7: M.E. Maron. 'Automatic Indexing: An Experimental Inquiry', Journal of the ACM, vol. 10, no. 1, 1961, pp. 404-417.
REF 2-8: 曾元顯, 1997, 'A Study of Automatic Keyword Extraction Techniques' (「關鍵詞自動擷取技術之探討」), Library Association of China Newsletter (中國圖書館學會會訊), no. 106, pp. 26-29. http://blue.lins.fju.edu.tw/~tseng/papers/keyword.htm
REF 2-9: 曾元顯, 1997, 'Automatic Keyword Extraction Techniques and Related-Term Feedback' (「關鍵詞自動擷取技術與相關詞回饋」), Bulletin of the Library Association of China (中國圖書館學會會報), vol. 59, pp. 59-64.
REF 2-10: Ricardo, B.-Y. and Berthier, R.-N., 1999, 'Modern information retrieval', New York, Addison-Wesley.
REF 2-11: Zhou, G., Zhang, J., Su, J., Shen, D., Tan, C., 2004. 'Recognizing names in biomedical texts: a machine learning approach', Bioinformatics 20, 1178–1190.
REF 2-12: Wikipedia, the free encyclopedia. http://www.wikipedia.org/ http://en.wikipedia.org/wiki/Natural_Language_Processing
REF 2-13: Karp, P.D., Riley, M., Paley, S.M., Pellegrini-Toole, A. & Krummenacker, M. 'EcoCyc: Encyclopedia of Escherichia coli genes and metabolism'. Nucleic Acids Res. 25, 43–51 (1997).
REF 2-14: Blaschke, C., Andrade, M.A., Ouzounis, C. and Valencia, A. 'Automatic extraction of biological information from scientific text: Protein-protein interactions'. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 60–67, Heidelberg, Germany, 1999. AAAI Press.
REF 2-15: Ono, T., Hishigaki, H., Tanigami, A. and Takagi, T. 'Automated extraction of information on protein-protein interactions from the biological literature'. Bioinformatics, 17(2):155–161, 2001.
REF 2-16: S. K. Ng and M. Wong. 'Toward routine automatic pathway discovery from on-line scientific text abstracts'. In The Tenth Workshop on Genome Informatics, volume 10, pages 104–112, 1999.
REF 2-17: Proux, D., Rechenmann, F. and Julliard, L. 'Detecting gene symbols and names in biological texts: A first step toward pertinent information extraction'. In The Ninth Workshop on Genome Informatics, pages 72–80, 1998.
REF 2-18: Park, J. C., Kim, H. S. and Kim, J. J. 'Bidirectional incremental parsing for automatic pathway identification with combinatory categorial grammar'. In Proceedings of the Pacific Symposium on Biocomputing, volume 6, pages 396–407, 2001.
REF 2-19: Friedman, C., P. Kra, M. Krauthammer, H. Yu, and A. Rzhetsky (2001). 'GENIES: A natural-language processing system for the extraction of molecular pathways from journal articles'. In Proc. 9th ISMB, Copenhagen, Denmark.
REF 2-20: Thomas, J., Milward, D., Ouzounis, C., Pulman, S. and Carroll, M. 'Automated extraction of protein interactions from scientific abstracts'. In Proceedings of the Pacific Symposium on Biocomputing, volume 5, pages 538–549, January 2000.
REF 2-21: National Center for Biotechnology Information (NCBI). http://www.ncbi.nlm.nih.gov/
REF 2-22: National Institutes of Health (NIH). http://www.nih.gov/
REF 2-23: National Library of Medicine (NLM). http://www.nlm.nih.gov/
REF 2-24: AmiGO! Your friend in the Gene Ontology. http://www.godatabase.org/
REF 2-25: Mouse Genome Informatics (MGI). http://www.informatics.jax.org/searches/GO_form.shtml
REF 2-26: QuickGO: GO Browser. http://www.ebi.ac.uk/ego/
REF 2-27: European Bioinformatics Institute (EBI). http://www.ebi.ac.uk/
REF 2-28: InterPro. http://www.ebi.ac.uk/interpro/
REF 2-29: Database for Annotation, Visualization and Integrated Discovery (DAVID). http://david.niaid.nih.gov/
REF 2-30: EASE: the Expression Analysis Systematic Explorer. http://david.niaid.nih.gov/david/ease.htm
REF 2-31: Gene Ontology Annotation. http://www.ebi.ac.uk/GOA/
REF 2-32: Camon, E., Barrell, D., Brooksbank, C., Magrane, M. and Apweiler, R. (2003) 'The Gene Ontology Annotation (GOA) project-application of GO in SWISS-PROT, TrEMBL and InterPro'. Comp. Funct. Genomics, 4, 71-74.
REF 2-33: Camon, E., Magrane, M., Barrell, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cos, A. et al. (2003) 'The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL and InterPro'. Genome Res., 13, 622-672.
REF 2-34: Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R. and Apweiler, R. (2004) 'The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology'. Nucleic Acids Res., 32, D262–D265.
REF 3-1: Java Technology. http://java.sun.com/
REF 3-2: JavaServer Pages Technology. http://java.sun.com/products/jsp/
REF 3-3: Apache Tomcat. http://jakarta.apache.org/tomcat/
REF 3-4: The Apache Struts Web Application Framework. http://struts.apache.org/
REF 3-5: PostgreSQL: The world's most advanced open source database. http://www.postgresql.org/
REF 3-6: Stanford NLP Group. http://www-nlp.stanford.edu/javanlp/
REF 3-7: Inxight Software, Inc. http://www.inxight.com/
REF 3-8: Hyperbolic Tree Java Library. http://sourceforge.net/projects/hypertree/
REF 3-9: Extensible Markup Language (XML). http://www.w3.org/XML/
REF 3-10: Z. J. Hwang, Discrete Mathematics, 3rd ed., 2001.
REF 4-1: Ontology. http://www.ontology.org/
REF 4-2: H.C. Tsai, Web-base Literature Clustering Search. http://ginni.bme.ntu.edu.tw/
REF 5-1: The Stanford Natural Language Processing Group. http://www-.stanford.edu/javanlp/
REF 5-2: Klein, D., Manning, C.D. 'Fast exact inference with a factored model for natural language parsing'. In: Advances in Neural Information Processing Systems, volume 15, MIT Press (2003).
REF 5-3: D. Klein and C. D. Manning. 'Accurate unlexicalized parsing'. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003.
REF 5-4: Tufiş, D., Mason, O. (1998): 'Tagging Romanian Texts: a Case Study for QTAG, a Language Independent Probabilistic Tagger'. In Proceedings of First International Conference on Language Resources and Evaluation, Granada, Spain, 589–596.
REF 5-5: Oliver Mason's Webpages. http://www.english.bham.ac.uk/staff/omason/index.html
REF 5-6: Dekang Lin. 1994. 'Principar--an efficient, broad-coverage, principle-based parser'. In Proceedings of COLING-94, pages 482-488. Kyoto, Japan.
REF 5-7: Alias-i LingPipe. http://www.alias-i.com/lingpipe/
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24465	-
dc.description.abstract	近年來由於科技資訊發達的影響下，生物醫學界每天都有大量資訊的電子文件刊登於各種不同的期刊上，然而面對大量且複雜的生物醫學文獻資料，要從資料中找出特定且對於研究人員有意義的資訊可說是非常重要但卻不容易，若能減少研究人員對於這些文獻資料處理所花費的時間，也就是間接的節省人力、物力以及時間，故如何從文件中擷取出相關知識勢必將成為問題的關鍵。　　那又該從啥麼樣的資料著手？又該怎樣的處理資料才有效率？處理完畢後又該如何加以呈現？對於生物醫學文獻處理與知識擷取，傳統的文件探勘技術已經不敷使用，本研究將導入「基因本體論 (Gene Ontology) 」的概念，對於在生物醫學文獻中關於基因功能性描述 (Functional Profile) 的知識加以擷取、處理與呈現。本研究主要分為幾個階段：首先是針對原有的 Gene Ontology 資訊加以處理，再來則是從生物醫學文獻中擷取出相關資訊進而加以呈現，並且設計一個圖形化使用者介面系統 - OntoMarker ，讓研究人員透過簡單的操作方式，即可萃取出存在於單篇文獻或某一主題中的重要資訊。最後並以在 PubMed / MEDLINE 中有關 CARD15/NOD2 基因的醫學文獻來驗証本系統，驗証結果確實可行，對於研究人員有所幫助。	zh_TW
dc.description.abstract	With the rapid development of information technology in recent years, large numbers of electronic documents are published every day in biomedical journals. Faced with this large and complex body of biomedical literature, finding specific information that is meaningful to researchers is very important but difficult. Reducing the time researchers spend processing this literature would indirectly save manpower, material resources, and time, so extracting relevant knowledge from documents becomes the key problem. What data should one start from? How can the data be processed efficiently? How should the results be presented once processing is complete? Traditional document mining techniques are no longer sufficient for processing biomedical literature and extracting knowledge from it. This research therefore introduces the concept of Gene Ontology to extract, process, and present knowledge about the functional description of genes (functional profiles) found in biomedical literature. The work is divided into several stages: first, the original Gene Ontology information is processed; next, relevant information is extracted from biomedical literature and presented; finally, a graphical user interface system, OntoMarker, is designed so that researchers can extract the important information contained in a single document or in a given topic through simple operations. The system was validated using medical literature on the CARD15/NOD2 gene in PubMed/MEDLINE; the results show that the approach is feasible and helpful to researchers.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T05:27:03Z (GMT). No. of bitstreams: 1 ntu-94-R92548028-1.pdf: 15637738 bytes, checksum: 61acebbb4a79692ab18912fb8d4fc861 (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	Table of Contents
Thesis Abstract i
Abstract iii
Acknowledgements v
Table of Contents vi
List of Tables viii
List of Figures ix
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Organization 4
Chapter 2 Related Work 5
2.1 Introduction to Ontology and Gene Ontology 5
2.1.1 Ontology 5
2.1.2 Gene Ontology 6
2.2 Document Mining Methods 7
2.2.1 Keyword Extraction 7
2.2.2 Information Retrieval 9
2.3 Information Systems for Analyzing Medical Literature 9
2.3.1 Sources of Medical Literature 10
2.3.2 Related Research on Analyzing Medical Literature 10
Chapter 3 Methods 17
3.1 Design Goals 17
3.2 System Architecture 19
3.3 Gene Ontology and Data Preprocessing 19
3.3.1 GO Node 19
3.3.2 Data Converter 27
3.3.3 Data Sources 28
3.3.4 Medical Document Processor 41
Chapter 4 Visual Data Presentation 49
4.1 GO Node Information 49
4.2 GO Hyperbolic Tree Viewer 50
4.3 GO Tree Viewer 50
4.4 Document Viewer 51
4.4.1 GO Table 52
4.4.2 BioName Table 53
4.4.3 GO Document Tree Viewer 54
4.4.4 GO Document Hyperbolic Tree 55
4.4.5 GO Document TouchGraph 55
4.5 Gene Document Batch Creator 56
4.6 Statistics 58
4.6.1 GO Node Information 58
4.6.2 Statistics - Gene Ontology Tree 59
4.6.3 Statistics - Hyperbolic Tree 60
4.7 OntoMarker Web-base 62
4.7.1 Introduction 62
4.7.2 Demo 63
Chapter 5 Results and Discussion 69
5.1 Results 69
5.1.1 OntoMarker 69
5.1.2 Cluster Analysis 73
5.2 General Discussion 78
5.2.1 Natural Language Processing in OntoMarker 79
5.2.2 What If Cycles Arise in Gene Ontology 81
Chapter 6 Conclusion 83
6.1 Future Work 83
6.2 Conclusion 84
Appendix 85
Part-of-speech 85
References 87
List of Tables
Table 1: The Three Main Branches of Gene Ontology 7
Table 2: The Three Main Branches of Gene Ontology 9
Table 3: Development Environment 18
Table 4: Gene Ontology Data Format 28
Table 5: Gene Ontology to OntoMarker 30
Table 6: POS Example 34
Table 7: POS Identifier 35
Table 8: Stop POS 35
Table 9: Common Word 36
Table 10: Example - Common Word 38
Table 11: The Same Prefix 39
Table 12: Example - goinfo Table 40
Table 13: Example - gowordinfo Table 40
Table 14: Example - biotokeninfo Table 41
Table 15: Tokenizer Example 43
Table 16: Check Inverted Index 44
Table 17: Mapped Ratio Example 46
Table 18: Mapped Example 47
Table 19: GO Table Description 53
Table 20: BioName Table Description 54
Table 21: Statistics - GO Node Information 58
Table 22: Statistics - Gene Ontology Tree 60
Table 23: TP, FP, FN, TN 70
Table 24: Precision, Recall 71
Table 25: False Negative 71
Table 26: False Positive 73
Table 27: Online Medical Literature Clustering Mechanism from PubMed - NOD2 75
Table 28: Online Medical Literature Clustering Mechanism from PubMed - NOD2 (2) 77
Table 29: NOD2/CARD15 Result 78
Table 30: Stop POS 80
List of Figures
Figure 1 3
Figure 2 3
Figure 3: AmiGO 11
Figure 4: MGI 12
Figure 5: QuickGO 13
Figure 6: DAVID 14
Figure 7: EASE 14
Figure 8: Gene Ontology Annotation 15
Figure 9: N-Tier 18
Figure 10: Architecture 19
Figure 11: GOp 20
Figure 12: GOp+, GOp* 21
Figure 13: Leafp 22
Figure 14: Minimal Length 23
Figure 15: Gene Ontology Structure (Part) 25
Figure 16: Cycle 26
Figure 17: Architecture - Data Converter 27
Figure 18: Gene Ontology to OntoMarker Data Converter 28
Figure 19: Gene Ontology PostgreSQL Diagram (Part) 29
Figure 20: NLP 33
Figure 21: Medical Document Processor 42
Figure 22: Medical Document Processor 42
Figure 23: GO Node Information 49
Figure 24: GO Hyperbolic Tree Viewer 50
Figure 25: GO Tree Viewer 51
Figure 26: Document Viewer Diagram 51
Figure 27: GO Table 52
Figure 28: BioName Table 53
Figure 29: GO Document Tree Viewer 54
Figure 30: GO Document Hyperbolic Tree 55
Figure 31: GO Document TouchGraph 55
Figure 32: Batch Creator - Source 57
Figure 33: Batch Creator - Status 57
Figure 34: Statistics - GO Node Information 59
Figure 35: Statistics - Gene Ontology Tree 60
Figure 36: Statistics - Hyperbolic Tree 61
Figure 37: Statistics - TouchGraph 62
Figure 38: Website - Introduction 63
Figure 39: Website - FAQ [REF 4-1] 63
Figure 40: Website - Demo 64
Figure 41: Website Demo - Gene Document 65
Figure 42: Website Demo - GO Table 65
Figure 43: Website Demo - GO Tree 66
Figure 44: Website Demo - Gene Document Hyperbolic Viewer 66
Figure 45: Website Demo - Gene Document TouchGraph 67
Figure 46: Online Medical Literature Clustering Mechanism from PubMed - NOD2 74
Figure 47: Online Medical Literature Clustering Mechanism from PubMed - NOD2 (2) 77
dc.language.iso	zh-TW
dc.subject	知識視覺化呈現	zh_TW
dc.subject	文件探勘	zh_TW
dc.subject	基因本體論	zh_TW
dc.subject	knowledge visualization	en
dc.subject	document retrieval	en
dc.subject	Gene Ontology	en
dc.title	生物醫學文獻知識視覺化呈現	zh_TW
dc.title	Knowledge Visualization in Biomedical Literatures	en
dc.type	Thesis
dc.date.schoolyear	93-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	蔣以仁(I-Jen Chiang),陳中明(Chung-Ming Chen)
dc.subject.keyword	基因本體論, 文件探勘, 知識視覺化呈現	zh_TW
dc.subject.keyword	Gene Ontology, document retrieval, knowledge visualization	en
dc.relation.page	90
dc.rights.note	Not authorized
dc.date.accepted	2005-07-19
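dc.contributor.author-college	College of Engineering (工學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Engineering (醫學工程學研究所)	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Engineering (醫學工程學研究所)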

Files in This Item:

File	Size	Format
ntu-94-1.pdf (not authorized for public access)	15.27 MB	Adobe PDF

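Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.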