Knowledge Graph Verification: Feature Engineering vs. Deep Learning Methods

Yu-Chen Her; 何昱辰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81935

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	魏志平(Chih-Ping Wei)
dc.contributor.author	Yu-Chen Her	en
dc.contributor.author	何昱辰	zh_TW
dc.date.accessioned	2022-11-25T03:06:58Z	-
dc.date.available	2023-08-16
dc.date.copyright	2021-11-02
dc.date.issued	2021
dc.date.submitted	2021-09-13
dc.identifier.citation	Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z. (2007). DBpedia: A Nucleus for a Web of Open Data. Proceedings of the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference, 722–735.
Bodenreider, O. (2004). The Unified Medical Language System (UMLS): Integrating Biomedical Terminology. Nucleic Acids Research, 32(Suppl_1), D267–D270.
Bohannon, P., Fan, W., Geerts, F., Jia, X., Kementsietsidis, A. (2007). Conditional Functional Dependencies for Data Cleaning. Proceedings of IEEE 23rd International Conference on Data Engineering, 746–755.
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J. (2008). Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1247–1250.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O. (2013). Translating Embeddings for Modeling Multi-relational Data. Advances in Neural Information Processing Systems, 26.
Bordes, A., Chopra, S., Weston, J. (2014). Question Answering with Subgraph Embeddings. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 615–620.
Chu, X., Morcos, J., Ilyas, I. F., Ouzzani, M., Papotti, P., Tang, N., Ye, Y. (2015). KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing. Proceedings of the ACM SIGMOD International Conference on Management of Data, 1247–1261.
Coumou, H. C. H., Meijman, F. J. (2006). How Do Primary Care Physicians Seek Answers to Clinical Questions? A Literature Review. Journal of the Medical Library Association, 94(1), 55–60.
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 1, 4171–4186.
Dunn, K., Marshall, J. G., Wells, A. L., Backus, J. E. B. (2017). Examining the Role of MEDLINE as A Patient Care Information Resource: An Analysis of Data from the Value of Libraries Study. Journal of the Medical Library Association, 105(4), 336–346.
Ge, C., Gao, Y., Weng, H., Zhang, C., Miao, X., Zheng, B. (2020). KGClean: An Embedding Powered Knowledge Graph Cleaning Framework. arXiv preprint arXiv:2004.14478.
He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
Hinton, G. E., Krizhevsky, A., Sutskever, I., Srivastava, N. (2016). System and Method for Addressing Overfitting in A Neural Network. US Patent 9,406,017.
Hochreiter, S., Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., Melo, G. D., Gutierrez, C., Kirrane, S., Gayo, J. E. L., Navigli, R., Neumaier, S., Ngomo, A. N., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J., Staab, S., Zimmermann, A. (2021). Knowledge Graphs. ACM Computing Surveys, 54(4), 1–37.
Kilicoglu, H., Fiszman, M., Rodriguez, A., Shin, D., Ripple, A., Rindflesch, T. C. (2008). Semantic MEDLINE: A Web Application for Managing the Results of PubMed Searches. Proceedings of the 3rd International Symposium for Semantic Mining in Biomedicine (Vol. 2008), 69–76.
Kilicoglu, H., Rosemblat, G., Fiszman, M., Shin, D. (2020). Broad-coverage Biomedical Relation Extraction with SemRep. BMC Bioinformatics, 21(1), 1–28.
Kilicoglu, H., Shin, D., Fiszman, M., Rosemblat, G., Rindflesch, T. C. (2012). SemMedDB: A PubMed-scale Repository of Biomedical Semantic Predications. Bioinformatics, 28(23), 3158–3160.
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., Kang, J. (2019). BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining. Bioinformatics, 36(4), 1234–1240.
Lindberg, D. A., Humphreys, B. L., McCray, A. T. (1993). The Unified Medical Language System. Methods of Information in Medicine, 32(4), 281–291.
Rindflesch, T. C., Fiszman, M. (2003). The Interaction of Domain Knowledge and Linguistic Structure in Natural Language Processing: Interpreting Hypernymic Propositions in Biomedical Text. Journal of Biomedical Informatics, 36(6), 462–477.
Rogers, F. B. (1964). The Development of MEDLARS. Bulletin of the Medical Library Association, 52(1), 150–151.
Schneider, E. W. (1973). Course Modularization Applied: The Interface System and Its Implications for Sequence Control and Data Analysis. Presented at Meeting of the Association for the Development of Instructional Systems (April 1972).
Schuster, M., Paliwal, K. K. (1997). Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(56), 1929–1958.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 5998–6008.
Veličković, P., Casanova, A., Liò, P., Cucurull, G., Romero, A., Bengio, Y. (2018). Graph Attention Networks. Proceedings of the 6th International Conference on Learning Representations (ICLR).
Wang, P., He, Y. (2019). Uni-detect: A Unified Approach to Automated Error Detection in Tables. Proceedings of the 2019 International Conference on Management of Data, 811–828.
Wang, Q., Mao, Z., Wang, B., Guo, L. (2017). Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering, 29(12), 2724–2743.
Wang, Z., Zhang, J., Feng, J., Chen, Z. (2014). Knowledge Graph Embedding by Translating on Hyperplanes. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1), 1112–1119.
Weston, J., Bordes, A., Yakhnenko, O., Usunier, N. (2013). Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction. Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 1366–1371.
Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., Leswing, K., Pande, V. (2018). MoleculeNet: A Benchmark for Molecular Machine Learning. Chemical Science, 9(2), 513–530.
Zeng, D., Liu, K., Chen, Y., Zhao, J. (2015). Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 1753–1762.
Zeng, D., Liu, K., Lai, S., Zhou, G., Zhao, J. (2014). Relation Classification via Convolutional Deep Neural Network. Proceedings of the 25th International Conference on Computational Linguistics (COLING): Technical Papers, 2335–2344.
Zhang, R., Hristovski, D., Schutte, D., Kastrin, A., Fiszman, M., Kilicoglu, H. (2021). Drug Repurposing for COVID-19 via Knowledge Graph Completion. Journal of Biomedical Informatics, 115, 103696.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/81935	-
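dc.description.abstract	Knowledge graph verification, also called knowledge graph cleaning, is the task of identifying whether the relations between entities in a knowledge graph are correct. Besides effectively improving the quality of the graph, a cleaned knowledge graph also improves the performance of downstream applications. In the biomedical domain, for example, removing incorrect relations can support applications such as drug repurposing. Several prior studies have addressed knowledge graph verification. Wang and He (2019) use rules such as functional dependencies as a metric and decide whether specific data should be removed by computing the difference in the metric before and after removing that data. Ge et al. (2020) first use graph neural network techniques to extract embeddings for each entity and relation from a clean knowledge graph, and then use those embeddings to train a classifier that judges whether a relation is correct. Although these methods perform well, they have limitations. First, formulating the rules that serve as the measurement criteria is time-consuming and requires domain knowledge, while using embeddings suffers from the out-of-vocabulary problem. In addition, a method that depends on an additional clean knowledge graph is less feasible in practice. In this research, we argue that information from all the relations between two entities can be used to judge whether a given relation is correct. Based on this idea, we propose a feature engineering method, a deep learning method, and a hybrid of the two to verify a biomedical knowledge graph. The experimental results show that the other relations between a pair of entities do help in judging whether a relation is correct. Moreover, the hybrid method achieves better accuracy, precision, recall, and F1 score, indicating that well-designed features can effectively improve the performance of deep learning models.	zh_TW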
dc.description.provenance	Made available in DSpace on 2022-11-25T03:06:58Z (GMT). No. of bitstreams: 1 U0001-0809202100134000.pdf: 2843688 bytes, checksum: 0adf4abdbadc51af82c3cfa46652c786 (MD5) Previous issue date: 2021	en
dc.description.tableofcontents	Acknowledgements i
Abstract (in Chinese) ii
Abstract iii
Table of Contents v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1. Background 1
1.2. Research Motivation 5
1.3. Research Objective 7
Chapter 2 Literature Review 9
2.1. Biomedical Literature Resource 9
2.2. Knowledge Graph Cleaning 13
2.2.1. Knowledge Graph 13
2.2.2. Rule-based Methods 14
2.2.3. Knowledge Graph-based Methods 17
Chapter 3 Our Proposed Methods 21
3.1. Term Definition and Problem Statement 21
3.1.1. Term Definition 21
3.1.2. Problem Statement 24
3.2. Data Collection 24
3.3. Knowledge Graph Verification Methods 27
3.3.1. BI (Bibliographic Information) Method 28
3.3.2. SA (Sibling Attention) Method 34
3.3.3. SABI (Sibling Attention with Bibliographic Information) Method 42
3.4. Objective Function 44
3.5. Training Strategies 44
Chapter 4 Experiments 47
4.1. Dataset and Evaluation Metrics 47
4.2. Experimental Settings 50
4.3. Performance Comparison 52
4.4. Detailed Analysis of the Proposed BI Method 54
4.5. Detailed Analysis of the Proposed SA Method 56
4.5.1. Effects of BioBERT Fine-tuning Strategy 57
4.5.2. Effects of Use of Residual Block 58
4.5.3. Effects of Use of Feature Concatenation Layer 59
4.6. Performance on Different Category Sets 60
Chapter 5 Conclusion 64
5.1. Contributions 64
5.2. Future Works 65
References 67
dc.language.iso	en
dc.subject	深度學習	zh_TW
dc.subject	預訓練語言模型	zh_TW
dc.subject	特徵工程	zh_TW
dc.subject	生醫知識圖譜驗證	zh_TW
dc.subject	知識圖譜驗證	zh_TW
dc.subject	自我注意力機制	zh_TW
dc.subject	Pretrained language model	en
dc.subject	Feature engineering	en
dc.subject	Biomedical knowledge graph verification	en
dc.subject	Self-attention mechanism	en
dc.subject	Knowledge graph verification	en
dc.subject	Deep learning	en
dc.title	知識圖譜驗證：特徵工程與深度學習方法	zh_TW
dc.title	Knowledge Graph Verification: Feature Engineering vs. Deep Learning Methods	en
dc.date.schoolyear	109-2
dc.description.degree	Master's degree
dc.contributor.advisor-orcid	魏志平(0000-0003-4150-3926)
dc.contributor.oralexamcommittee	楊錦生(Hsin-Tsai Liu),吳家齊(Chih-Yang Tseng)
dc.subject.keyword	知識圖譜驗證, 生醫知識圖譜驗證, 特徵工程, 深度學習, 預訓練語言模型, 自我注意力機制	zh_TW
dc.subject.keyword	Knowledge graph verification, Biomedical knowledge graph verification, Feature engineering, Deep learning, Pretrained language model, Self-attention mechanism	en
dc.relation.page	73
dc.identifier.doi	10.6342/NTU202103048
dc.rights.note	Authorization granted (open access worldwide)
dc.date.accepted	2021-09-14
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
dc.date.embargo-lift	2023-08-16	-
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
U0001-0809202100134000.pdf	2.78 MB	Adobe PDF


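Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.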