A Statistical Principle-Based Approach (SPBA) for Detecting Biological Relations from Biomedical Literature

Nai-Wen Chang; 張乃文

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15370

Title:	A Statistical Principle-Based Approach (SPBA) for Detecting Biological Relations from Biomedical Literature (基於統計準則式方法偵測生醫文獻中的生物關聯)
Authors:	Nai-Wen Chang 張乃文
Advisor:	Yen-Jen Oyang (歐陽彥正), Wen-Lian Hsu (許聞廉)
Keyword:	Statistical Principle-Based Approach, Biomedical Relation Extraction, miRNA-Target Interaction, Human Diseases, Protein-Protein Interaction
Publication Year :	2020
Degree:	Doctoral
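Abstract:	For biomedical researchers, the biomedical literature is not only a venue for publishing results. More importantly, it gives access to findings from researchers around the world that can be used to validate and advance one's own work. However, finding the right information quickly and accurately in a massive body of literature is a major challenge in the age of information explosion. In this thesis, we propose a statistics-based algorithm, the Statistical Principle-Based Approach (SPBA), which combines the strengths of rule-based and statistical models to help users more accurately extract important and common relations between biomedical molecules, such as protein-protein interactions (PPI) and miRNA-gene-disease associations (MGDA). SPBA has three steps. First, domain experts build an ontology/semantic map (ontology/MAP) for the relevant domain, which is used for semantic labeling of the corpus. The labeled output is called patterns, but these patterns are disorganized. Second, the principle generation step, combined with the selection of key pattern elements, consolidates them into representative principles. Finally, the principle matching step allows substitution, insertion, and deletion, which enables flexible matching that conventional regular expressions are too rigid to support. We also run four different experiments that compare SPBA with machine learning and deep learning algorithms, using precision, recall, and F-score as evaluation criteria. Across all experimental datasets, SPBA ranked first in precision 5 times, in recall 8 times, and in F-score 8 times. In extracting relations among miRNAs, genes, and diseases and in extracting protein-protein interactions, SPBA outperforms the currently most popular machine learning and deep learning algorithms. SPBA not only extracts relevant information from biomedical literature automatically but also produces human-readable data and rules, which should benefit future applications of natural language processing in biomedical literature mining.
Biomedical relations reported in the biological literature are indispensable for advancing research. In this thesis, we focus on several relation extraction tasks, including protein-protein interaction and miRNA-gene-disease association (MGDA) extraction. MicroRNAs (miRNAs) are small non-coding RNAs that negatively regulate gene expression at the post-transcriptional level. miRNAs are considered good candidates as biomarkers for early detection or prognosis of various diseases. Validated miRNA targets are usually reported in the literature, so researchers must manually screen related publications to keep up with novel findings. However, the amount of miRNA-related literature is increasing rapidly, which makes it difficult for researchers to stay up to date. Moreover, identifying interactions between proteins is important for understanding the underlying biological processes, yet extracting a protein-protein interaction (PPI) from raw text is very difficult.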
The proposed method, Statistical Principle-Based Approach (SPBA), consists of two major modules, concept labeling and the primary function of SPBA, both of which operate at the sentence level. Input sentences are labeled through several steps, such as named entity recognition (NER), event detection, and trigger word identification. The principle generation module in SPBA then takes over to produce high-confidence principles from the labeled sentences. In principle generation, a dominating algorithm is responsible for summarizing the most representative principles from the training data. SPBA outperforms the state-of-the-art methods both on well-known PPI datasets and on manually curated corpora for relation extraction between miRNAs, genes, and diseases.
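The abstract states that principle matching allows substitution, insertion, and deletion when a principle is compared with a labeled sentence. A minimal sketch of that idea follows, assuming that a principle and a sentence are both sequences of semantic labels and that they are aligned with a weighted edit distance. The label names, cost values, threshold, and function names are illustrative assumptions, not taken from the thesis, whose actual scoring scheme is not given in this record.

```python
from typing import Sequence


def match_cost(
    principle: Sequence[str],
    labels: Sequence[str],
    sub_cost: float = 1.0,  # assumed cost: slot aligned to a different label
    ins_cost: float = 0.5,  # assumed cost: extra label in the sentence
    del_cost: float = 1.0,  # assumed cost: principle slot missing from sentence
) -> float:
    """Return the minimum edit cost of aligning a principle with a sentence."""
    rows, cols = len(principle) + 1, len(labels) + 1
    cost = [[0.0] * cols for _ in range(rows)]
    # Aligning against an empty sequence needs only deletions or insertions.
    for i in range(1, rows):
        cost[i][0] = i * del_cost
    for j in range(1, cols):
        cost[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = principle[i - 1] == labels[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (0.0 if same else sub_cost),  # match/substitute
                cost[i][j - 1] + ins_cost,  # insertion
                cost[i - 1][j] + del_cost,  # deletion
            )
    return cost[-1][-1]


def matches(principle: Sequence[str], labels: Sequence[str],
            max_cost: float = 1.0) -> bool:
    """Accept the sentence when the alignment cost is within the threshold."""
    return match_cost(principle, labels) <= max_cost


# Hypothetical principle and labeled sentence, for illustration only.
principle = ["MIRNA", "REGULATE", "GENE"]
sentence = ["MIRNA", "ADVERB", "REGULATE", "GENE"]
print(match_cost(principle, sentence), matches(principle, sentence))
```

A strict regular expression over the label sequence would reject the sentence above because of the extra `ADVERB` label, whereas the alignment tolerates it at the cost of one insertion.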
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15370
DOI:	10.6342/NTU202001123
Fulltext Rights:	Not authorized
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics (生醫電子與資訊學研究所)

Files in This Item:

File	Size	Format
U0001-2306202018242400.pdf Restricted Access	9.67 MB	Adobe PDF
