Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8317
Title: | Representation Learning for Biomedical Relation Extraction with Dependency Parsing |
Author: | Yao-Chang Chu (朱瑤章) |
Advisor: | Chih-Ping Wei (魏志平) |
Keywords: | Relation extraction, Biomedical relation extraction, Relation classification, Deep learning, Unsupervised learning, Self-supervised learning |
Publication Year: | 2020 |
Degree: | Master |
Abstract: | Relation extraction is the task of learning and extracting relations between entities from text. In recent years, neural network models have been widely applied to relation extraction and have achieved state-of-the-art performance. However, neural networks require large amounts of training data. In the biomedical domain, labeled instances are expensive to acquire and training datasets are often small, so we further explore self-supervised learning methods that need only a small amount of labeled data for fine-tuning the model. Matching the Blanks (MTB) is a self-supervised relation extraction model. Under the assumption that two sentences mentioning the same entity pair are likely to express the same relation, MTB can learn a vector representation of the relation between any two entities. Unlike many earlier deep learning relation extraction models, MTB uses no natural language features beyond the text itself; we therefore believe that incorporating the dependency path between the two entities in a sentence gives MTB an opportunity to train better. In addition, because MTB's training signal is simply whether two entity pairs match, the selection of negative samples (non-identical entity pairs) is particularly important, and we believe that, beyond the two kinds of negative samples MTB proposes, there exist negative samples that make MTB training more effective.
Based on the MTB model, we therefore propose two directions for improvement: (1) four neural network modules that encode and embed the dependency relationship between an entity pair, and (2) inline negative samples, so that the MTB model cannot merely learn keyword matching but must truly learn context-based relation representations. Across experiments under various settings, we show that, compared with the original MTB architecture, both proposed improvements effectively raise relation extraction performance. We also explore which dependency modules are better suited to simple or complex syntactic relations between an entity pair, and show that under finer-grained directional relations our model still discriminates effectively and outperforms the original MTB architecture. |
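The MTB assumption described in the abstract, that sentences sharing the same entity pair form positive training pairs while non-identical entity pairs form negatives, can be illustrated with a minimal sketch. This is a hypothetical, simplified reconstruction from the abstract alone (the function name `build_training_pairs` and the toy sentences are assumptions, not the thesis's actual code), and it omits the encoder, dependency modules, and inline-negative sampling that the thesis adds:

```python
# Minimal sketch of MTB-style pair construction (hypothetical; not the
# thesis's actual implementation). Each example is a sentence together
# with its normalized entity pair.
from itertools import combinations

def build_training_pairs(examples):
    """Pair up sentences under the MTB assumption: two sentences that
    mention the same entity pair are a positive pair (label 1); sentences
    with non-identical entity pairs are a negative pair (label 0)."""
    pairs = []
    for (sent_a, ents_a), (sent_b, ents_b) in combinations(examples, 2):
        label = 1 if ents_a == ents_b else 0
        pairs.append((sent_a, sent_b, label))
    return pairs

examples = [
    ("Aspirin inhibits COX-1.", ("aspirin", "cox-1")),
    ("COX-1 is inhibited by aspirin in platelets.", ("aspirin", "cox-1")),
    ("Metformin activates AMPK.", ("metformin", "ampk")),
]

pairs = build_training_pairs(examples)
# The first two sentences share the entity pair (aspirin, cox-1), so they
# form the single positive pair; the other two pairings are negatives.
```

A purely surface-level model could solve such pairs by matching the entity strings themselves; the inline negatives proposed in the thesis are meant to block exactly that shortcut and force context-based relation representations.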
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8317 |
DOI: | 10.6342/NTU202002454 |
Full-text Access: | Access granted (open access worldwide) |
Appears in Collections: | Department of Information Management |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0508202013275700.pdf | 2.27 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.