A Statistical Principle-Based Approach (SPBA) for Detecting Biological Relations from Biomedical Literature

Nai-Wen Chang; 張乃文

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15370

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	歐陽彥正(Yen-Jen Oyang),許聞廉(Wen-Lian Hsu)
dc.contributor.author	Nai-Wen Chang	en
dc.contributor.author	張乃文	zh_TW
dc.date.accessioned	2021-06-07T17:33:24Z	-
dc.date.copyright	2020-07-02
dc.date.issued	2020
dc.date.submitted	2020-06-30
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15370	-
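dc.description.abstract	For biomedical researchers, the biomedical literature is not only a venue for publishing results; more importantly, it lets them draw on the findings of researchers worldwide to validate and advance their own work. However, finding the right information quickly and accurately in a vast body of literature is a major challenge in the age of information explosion. In this thesis, we propose an algorithm grounded in statistical principles, the Statistical Principle-based approach (SPBA), which draws on the strengths of both rule-based and statistical models to help users more accurately identify important and common relations between biomedical molecules, such as protein-protein interactions (PPI) and miRNA-gene-disease associations (MGDA). SPBA has three steps. First, domain experts construct an ontology/semantic map (ontology/MAP) for the relevant domain, and the corpus is semantically labeled; the labeled data are called patterns, but these patterns are unorganized. Second, the principle generation step of SPBA combines the principles with a selection of key elements and consolidates them into representative principles. Finally, the principle matching step permits substitution, insertion, and deletion, providing the flexible matching that conventional regular expressions are too rigid to support. We also conduct four experiments that compare our algorithm with machine learning and deep learning algorithms, using precision, recall, and F-score as the evaluation criteria. Across all experimental datasets, SPBA ranked first in precision 5 times, in recall 8 times, and in F-score 8 times. In extracting relations among miRNAs, genes, and diseases and in extracting protein-protein interactions, SPBA outperformed the currently most popular machine learning and deep learning algorithms. SPBA not only extracts relevant information from biomedical literature automatically but also produces human-readable data and rules, which should benefit future applications of natural language processing in biomedical literature mining.	en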
dc.description.abstract	Biomedical relations reported in the biological literature are indispensable for advancing research. In this thesis, we focus on extracting several such relations, including protein-protein interactions (PPI) and miRNA-gene-disease associations (MGDA). MicroRNAs (miRNAs) are small non-coding RNAs that negatively regulate gene expression at the post-transcriptional level. miRNAs are considered good candidate biomarkers for the early detection or prognosis of various diseases. Validated miRNA targets are usually reported in the literature, so researchers must manually screen related publications to keep up with novel findings. However, the volume of miRNA-related literature is growing rapidly, which makes this difficult. Identifying interactions between proteins is likewise important for understanding the underlying biological processes, yet extracting a PPI from raw text is very difficult. The proposed method, the Statistical Principle-Based Approach (SPBA), consists of two major modules, concept labeling and the core SPBA function, both of which operate at the sentence level. Input sentences are labeled in several steps, such as named entity recognition (NER), event detection, and trigger word identification. The principle generation module in SPBA then derives high-confidence principles from the labeled sentences. In principle generation, a dominating algorithm produces the most representative summary of principles from the training data. SPBA outperforms state-of-the-art methods both on well-known PPI datasets and on manually curated corpora for relation extraction among miRNAs, genes, and diseases.	en
dc.description.provenance	Made available in DSpace on 2021-06-07T17:33:24Z (GMT). No. of bitstreams: 1 U0001-2306202018242400.pdf: 9898641 bytes, checksum: fa29d6448a432939a969caa2e64e96f4 (MD5) Previous issue date: 2020	en
dc.description.tableofcontents	Content: Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; Content; List of Figures; List of Tables; 1. Introduction; 2. Specific Aims; 3. Material and Method; 3.1. Workflow of SPBA; 3.2. Preprocessing; 3.3. Construction of the Entity Knowledge Base (EKB); 3.3.1. The Entity Knowledge Base (EKB) of miRNA; 3.3.2. The Entity Knowledge Base (EKB) of Gene; 3.4. Construction of the Relation Knowledge Base (RKB); 3.4.1. The Relation Knowledge Base (RKB) of miRNA-Target Interaction (MTI); 3.5. Principle Generation; 3.6. Principle Matching; 3.7. An Example of the relation knowledge base construction for Gene-Disease Association (GDA) Detection; 3.7.1. Concept Labeling module; 3.7.2. Statistical Principle-Based Approach (SPBA) for Gene-Disease Association (GDA) Extraction; 3.8. The Pseudo Code of Two-Step algorithm: Statistical Principle-Based Approach (SPBA); 3.8.1. Step 1. Principle Generation; 3.8.2. Step 2. Principle Matching; 4. Experiments and Results; 4.1. Five well-known protein-protein interaction (PPI) datasets: AIMed, BioInfer, HPRD50, LLL and IEPA; 4.2. A manually curated protein-protein interaction (PPI) dataset for detecting protein event: Protein Event Detection Dataset (PEDD); 4.3. MTI Dataset for training and testing; 4.4. GDA Datasets for training and testing; 4.5. Evaluation Metrics; 4.6. Summary; 5. Discussions; 5.1. Error Analysis of Five Well-Known Protein-Protein Interaction (PPI) Datasets; 5.2. Error Analysis of MTI dataset; 6. Conclusion; Reference; Appendix I. PEDD Annotation Guideline; Appendix II. PubMed Query for PEDD; Appendix III. Potential Trigger in PEDD
dc.language.iso	en
dc.subject	蛋白質交互作用	zh_TW
dc.subject	微小核醣核酸與其標靶基因之交互作用	zh_TW
dc.subject	生醫關聯擷取	zh_TW
dc.subject	統計準則式方法	zh_TW
dc.subject	人類疾病	zh_TW
dc.subject	Biomedical Relation Extraction	en
dc.subject	Protein-Protein Interaction	en
dc.subject	Human diseases	en
dc.subject	miRNA-Target interaction	en
dc.subject	Statistical Principle-Based Approach	en
dc.title	基於統計準則式方法偵測生醫文獻中的生物關聯	zh_TW
dc.title	A Statistical Principle-Based Approach (SPBA) for Detecting Biological Relations from Biomedical Literature	en
dc.type	Thesis
dc.date.schoolyear	108-2
dc.description.degree	Doctoral
dc.contributor.oralexamcommittee	賴飛羆(Fei-Pei Lai),陳信希(Hsin-Hsi Chen),馬偉雲(Wei-Yun Ma),戴鴻傑(Hong-Jie Dai)
dc.subject.keyword	統計準則式方法,生醫關聯擷取,微小核醣核酸與其標靶基因之交互作用,人類疾病,蛋白質交互作用	zh_TW
dc.subject.keyword	Statistical Principle-Based Approach,Biomedical Relation Extraction,miRNA-Target interaction,Human diseases,Protein-Protein Interaction	en
dc.relation.page	87
dc.identifier.doi	10.6342/NTU202001123
dc.rights.note	Not authorized
dc.date.accepted	2020-06-30
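dc.contributor.author-college	College of Electrical Engineering and Computer Science	en
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	en
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics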
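
Illustrative note on principle matching: the abstract describes a matching step that tolerates substitution, insertion, and deletion, in contrast to rigid regular expressions. One common way to realize that kind of tolerance is an edit-distance alignment between a principle (a sequence of semantic labels) and a labeled sentence. The sketch below is only an assumption about how such matching could look; the label names, the unit costs, and the `max_cost` threshold are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def edit_cost(principle: Sequence[str], labels: Sequence[str],
              sub: int = 1, ins: int = 1, dele: int = 1) -> int:
    """Minimum total cost of substitutions, insertions, and deletions
    needed to align `principle` with `labels` (standard edit distance)."""
    rows, cols = len(principle) + 1, len(labels) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * dele          # drop every principle slot
    for j in range(1, cols):
        dp[0][j] = j * ins           # every sentence label is extra
    for i in range(1, rows):
        for j in range(1, cols):
            same = principle[i - 1] == labels[j - 1]
            dp[i][j] = min(
                dp[i - 1][j - 1] + (0 if same else sub),  # match or substitute
                dp[i - 1][j] + dele,                      # slot missing in sentence
                dp[i][j - 1] + ins,                       # extra label in sentence
            )
    return dp[-1][-1]


def matches(principle: Sequence[str], labels: Sequence[str],
            max_cost: int = 1) -> bool:
    """Hypothetical acceptance rule: match if the alignment cost is small."""
    return edit_cost(principle, labels) <= max_cost


# Hypothetical labels, for illustration only.
principle = ["[miRNA]", "[Regulation]", "[Gene]"]
sentence = ["[miRNA]", "[Adverb]", "[Regulation]", "[Gene]"]
print(matches(principle, sentence))
```

Here the sentence contains one label that the principle lacks, so an exact pattern would fail while the tolerant alignment accepts it with a single insertion. A real system would likely weight slots differently (for example, penalizing a missing entity slot more than a missing modifier); uniform costs are used here only to keep the sketch short.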

Files in This Item:

File	Size	Format
U0001-2306202018242400.pdf Restricted Access	9.67 MB	Adobe PDF
