Please cite this item using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28469
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 翁昭旼 (Jau-Min Wong)
dc.contributor.author: Chih-Wei Chen [en]
dc.contributor.author: 陳志偉 [zh_TW]
dc.date.accessioned: 2021-06-13T00:09:11Z
dc.date.available: 2009-07-31
dc.date.copyright: 2007-07-31
dc.date.issued: 2007
dc.date.submitted: 2007-07-26
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28469
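dc.description.abstract: Biomedical terms in the literature are affected by compound words, synonyms, idiomatic usages, and even new naming conventions, so the same term is not necessarily written consistently across documents. This makes automated integration of biomedical data very difficult. The earliest step, and the one with the most far-reaching effect on system performance, is correctly locating biomedical terms in the literature, that is, Biomedical Named Entity Recognition (Biomedical NER).
In this thesis we use a Hidden Markov Model to analyze the abstracts of articles, with the goal of finding the biomedical terms in them. Our method has four steps. First, tokens are clustered using five features of biomedical terms. Second, a Hidden Markov Model is built from the clustered training data. Third, the article supplied by the user is read in and its tokens are clustered by the same five features. Finally, a machine learning algorithm is used to tag the text in the input article that the system judges to be biomedical terms.
[zh_TW]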
dc.description.abstract: With the progress of biomedical science, text mining in the biomedical domain is becoming increasingly important. Biomedical literature contains many irregularities and ambiguous contexts, such as compound words, synonyms, and acronyms, and even naming conventions are not applied consistently. Correctly identifying biological terms in text is therefore a fundamental requirement for information extraction.
In this thesis we propose a biological term extractor based on Hidden Markov Models. The task is accomplished in four steps. First, the tokens in the training data are clustered by five features. Second, a Hidden Markov Model is trained on the clustered tokens. Third, the user's input is normalized and its tokens are clustered. Finally, the biological terms are annotated according to the machine learning algorithm.
[en]
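The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the orthographic token classes in `token_class`, the B/I/O tag set, the add-one smoothing, the toy training sentences, and the choice of Viterbi decoding for the final annotation step are all hypothetical. The record itself only says that terms are annotated "according to the machine learning algorithm".

```python
import math
import re
from collections import Counter, defaultdict


def token_class(token):
    """Steps 1 and 3: map a token to a coarse class (assumed feature set)."""
    if re.fullmatch(r"\d+", token):
        return "DIGITS"
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "ALNUM"
    if token.isupper():
        return "ALLCAPS"
    if token[:1].isupper():
        return "INITCAP"
    if not any(ch.isalnum() for ch in token):
        return "PUNCT"
    return "LOWER"


def train_hmm(tagged_sentences, smoothing=1.0):
    """Step 2: estimate smoothed log-probabilities by counting."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = None
        for token, tag in sentence:
            emit[tag][token_class(token)] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    tags = sorted(emit)
    symbols = sorted({obs for counts in emit.values() for obs in counts})

    def log_dist(counter, keys):
        total = sum(counter[k] for k in keys) + smoothing * len(keys)
        return {k: math.log((counter[k] + smoothing) / total) for k in keys}

    return {
        "tags": tags,
        "start": log_dist(start, tags),
        "trans": {t: log_dist(trans[t], tags) for t in tags},
        # "UNK" reserves probability mass for classes unseen with a tag.
        "emit": {t: log_dist(emit[t], symbols + ["UNK"]) for t in tags},
    }


def viterbi(model, tokens):
    """Step 4: return the most probable tag sequence for the tokens."""
    if not tokens:
        return []
    tags = model["tags"]

    def emission(tag, token):
        dist = model["emit"][tag]
        return dist.get(token_class(token), dist["UNK"])

    scores = [{t: model["start"][t] + emission(t, tokens[0]) for t in tags}]
    backpointers = []
    for token in tokens[1:]:
        prev_scores = scores[-1]
        row, pointers = {}, {}
        for t in tags:
            best = max(tags, key=lambda p: prev_scores[p] + model["trans"][p][t])
            row[t] = prev_scores[best] + model["trans"][best][t] + emission(t, token)
            pointers[t] = best
        scores.append(row)
        backpointers.append(pointers)
    path = [max(tags, key=lambda t: scores[-1][t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up training data: B = term start, I = inside a term, O = outside.
TOY_TRAINING = [
    [("The", "O"), ("IL-2", "B"), ("gene", "I"), ("is", "O"),
     ("expressed", "O"), (".", "O")],
    [("Activation", "O"), ("of", "O"), ("NF-kappaB", "B"), ("was", "O"),
     ("observed", "O"), (".", "O")],
]

if __name__ == "__main__":
    toy_model = train_hmm(TOY_TRAINING)
    sentence = ["Expression", "of", "IL-2", "was", "observed", "."]
    print(list(zip(sentence, viterbi(toy_model, sentence))))
```

Working in log space avoids numeric underflow on long sentences, and emitting the token class rather than the raw token keeps the observation alphabet small, which is one plausible reading of why the abstract clusters tokens before training.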
dc.description.provenance: Made available in DSpace on 2021-06-13T00:09:11Z (GMT). No. of bitstreams: 1
ntu-96-R94548059-1.pdf: 811285 bytes, checksum: f3a73ac12bc0d6d26dafc70a7269c0b0 (MD5)
Previous issue date: 2007
[en]
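dc.description.tableofcontents:
Oral Defense Committee Certification
Acknowledgments
Chinese Abstract
English Abstract
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Materials
1.3 Research Methods and Thesis Organization
Chapter 2 Related Work
2.1 Dictionary-Based Approaches
2.2 Rule-Base Filtering Approaches
2.3 Machine Learning Approaches
Chapter 3 Methods
3.1 System Architecture
3.2 Hidden Markov Models
3.3 Feature Set Used by the System
3.3.1 Part-of-speech feature
3.3.2 Orthographic features
3.3.3 Morphological feature
3.3.4 Head noun feature
3.3.5 Adjective feature
Chapter 4 Results and Discussion
4.1 Genia Corpus version 3.02
Chapter 5 Conclusions and Future Work
References
List of Figures and Tables
Figure 1.1 Ontology of the Genia Corpus
Figure 3.1 System architecture
Figure 3.2 State transition diagram for dice rolling
Figure 3.3 Illustration of the forward variable
Table 3.1 Punctuation tag descriptions
Table 3.2 Pattern symbol tag descriptions
Table 4.1 Performance before and after adding the adjective feature
Table 4.2 Performance comparison
Figure 4.1 MEDLINE:95280913
Figure 4.2 MEDLINE:95256242
Table 4.3 Record of discrepancies in the Genia Corpus
Table 4.4 System performance before and after modifying the Genia Corpus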
dc.language.iso: zh-TW
dc.subject: Text Mining [zh_TW]
dc.subject: Hidden Markov Models [zh_TW]
dc.subject: Machine Learning [zh_TW]
dc.subject: Biomedical Literature Mining [zh_TW]
dc.subject: Biomedical Named Entity Recognition [zh_TW]
dc.subject: Biomedical Term Extraction [en]
dc.subject: Text Mining [en]
dc.subject: Hidden Markov Models [en]
dc.subject: Machine Learning [en]
dc.subject: Biomedical Named Entity Recognition [en]
dc.title: 生醫詞彙辨識:利用隱藏式馬可夫模型 [zh_TW]
dc.title: Biological Terms Recognition: Using Hidden Markov Models [en]
dc.type: Thesis
dc.date.schoolyear: 95-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 蔣以仁 (I-Jen Chiang), 陳中明 (Chung-Ming Chen)
dc.subject.keyword: Hidden Markov Models, Machine Learning, Biomedical Literature Mining, Biomedical Named Entity Recognition, Text Mining [zh_TW]
dc.subject.keyword: Hidden Markov Models, Machine Learning, Biomedical Term Extraction, Biomedical Named Entity Recognition, Text Mining [en]
dc.relation.page: 42
dc.rights.note: Paid license
dc.date.accepted: 2007-07-30
dc.contributor.author-college: College of Engineering [zh_TW]
dc.contributor.author-dept: Graduate Institute of Biomedical Engineering [zh_TW]
Appears in collections: Graduate Institute of Biomedical Engineering

Files in this item:
File: ntu-96-1.pdf (not authorized for public access)
Size: 792.27 kB
Format: Adobe PDF