Biological Terms Recognition: Using Hidden Markov Models

Chih-Wei Chen; 陳志偉

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28469

Title:	Biological Terms Recognition: Using Hidden Markov Models (生醫詞彙辨識：利用隱藏式馬可夫模型)
Author:	Chih-Wei Chen (陳志偉)
Advisor:	Jau-Min Wong (翁昭旼)
Keywords:	Hidden Markov Models, Machine Learning, Biomedical Literature Mining, Biomedical Term Extraction, Biomedical Named Entity Recognition, Text Mining
Year of publication:	2007
Degree:	Master's
Abstract:	As biomedical science advances, text mining in the biomedical domain is becoming increasingly important. Biomedical terms in the literature involve compound words, synonyms, acronyms, idiomatic expressions, and even new naming conventions, so the same term is not necessarily written consistently across documents, which makes automated integration of biomedical data difficult. The first step, and the one with the greatest effect on system performance, is correctly identifying biomedical terms in text, known as Biomedical Named Entity Recognition (Biomedical NER). In this thesis we propose a biological term extractor based on Hidden Markov Models that analyzes the abstracts of articles, with the goal of finding the biomedical terms they contain. Our method has four steps. First, the tokens in the training data are clustered by five features of biomedical terms. Second, a Hidden Markov Model is trained on the clustered tokens. Third, the article supplied by the user is read in and normalized, and its tokens are clustered according to the features described above. Finally, a machine learning algorithm annotates the text that the system judges to be biological terms.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28469
Full-text license:	Paid license
Appears in department:	Institute of Biomedical Engineering
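
The four steps in the abstract can be illustrated with a minimal Python sketch. This record does not name the five features, the tag set, or the decoding algorithm, so everything below is an assumption made for illustration: the feature classes in `token_class`, the two tags `O` and `BIO`, the add-one smoothing, and the use of Viterbi decoding are hypothetical stand-ins, not the thesis's actual method.

```python
import math
from collections import defaultdict

# Hypothetical tag set: "O" = outside a term, "BIO" = part of a biomedical term.
STATES = ("O", "BIO")
# Hypothetical feature classes; the thesis's five features are not listed in this record.
CLASSES = ("ALNUM", "ALLCAPS", "HYPHEN", "SUFFIX", "OTHER")


def token_class(token):
    """Steps 1 and 3: map a token to a coarse feature class (assumed features)."""
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "ALNUM"    # mixed letters and digits, e.g. "p53"
    if token.isupper() and len(token) > 1:
        return "ALLCAPS"  # e.g. "DNA"
    if "-" in token:
        return "HYPHEN"   # e.g. "NF-kappa"
    if token.lower().endswith(("ase", "in", "gene")):
        return "SUFFIX"   # e.g. "kinase", "protein"
    return "OTHER"


def _log_norm(counts, keys):
    """Turn raw counts into add-one-smoothed log probabilities over `keys`."""
    total = sum(counts[k] + 1 for k in keys)
    return {k: math.log((counts[k] + 1) / total) for k in keys}


def train(tagged_sentences):
    """Step 2: estimate HMM parameters by counting over (token, tag) pairs."""
    start, trans, emit = defaultdict(int), defaultdict(int), defaultdict(int)
    for sentence in tagged_sentences:
        prev = None
        for token, tag in sentence:
            emit[(tag, token_class(token))] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[(prev, tag)] += 1
            prev = tag
    log_start = _log_norm(start, STATES)
    log_trans, log_emit = {}, {}
    for s in STATES:
        log_trans.update(_log_norm(trans, [(s, t) for t in STATES]))
        log_emit.update(_log_norm(emit, [(s, c) for c in CLASSES]))
    return log_start, log_trans, log_emit


def viterbi(tokens, model):
    """Step 4: return the most likely tag sequence (Viterbi is an assumed choice)."""
    log_start, log_trans, log_emit = model
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    score = [{s: log_start[s] + log_emit[(s, obs[0])] for s in STATES}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s at this position.
            best = max(STATES, key=lambda p: score[-1][p] + log_trans[(p, s)])
            row[s] = score[-1][best] + log_trans[(best, s)] + log_emit[(s, o)]
            ptr[s] = best
        score.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

To use the sketch, pass `train` a list of sentences, each a list of `(token, tag)` pairs, and then call `viterbi(tokens, model)` on a tokenized abstract; it returns one tag per input token. Working in log probabilities avoids numerical underflow on long token sequences.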

Files in this item:

File	Size	Format
ntu-96-1.pdf (not currently authorized for public access)	792.27 kB	Adobe PDF

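Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.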
