Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88142

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 賴飛羆 | zh_TW |
| dc.contributor.advisor | Fei-Pei Lai | en |
| dc.contributor.author | 鄭佳芳 | zh_TW |
| dc.contributor.author | Chia-Fang Cheng | en |
| dc.date.accessioned | 2023-08-08T16:29:15Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-08 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-07-17 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88142 | - |
| dc.description.abstract | 本研究提出了一項對於預測人類粒線體核糖體DNA變異致病性的綜合分析。我們提出了一種基於機器學習極限梯度提升加上特徵整合的新方法,該方法集成了多個因素,包括同質性、異質性、等位基因頻率、異質性程度、變異導致的良性或致病性變化率,通過核苷酸突變熵計算的核苷酸突變的可變性與複雜性,以及核苷酸突變導致的序列信息改變(例如結構變化、酮基氨基存在等),並通過SHAP找出模型預測致病性所判定的特徵重要度。目前尚未發表任何針對人類粒線體核糖體DNA的預測方法,我們的方法是第一個且在評估數據集上取得了0.9886的F1分數。通過利用機器學習的力量並考慮粒線體核糖體DNA的獨特特徵,我們的方法為準確預測粒線體核糖體DNA變異的致病性提供了一個有價值的工具。 | zh_TW |
| dc.description.abstract | This study presents a comprehensive analysis for predicting the pathogenicity of human mitochondrial ribosomal DNA (mt-rDNA) variations. We introduce a novel approach based on an XGB model with feature integration, which combines multiple factors: homogeneity, heterogeneity, allele frequency, heteroplasmy level, the variation-induced benign or pathogenic rate of change, the variability and complexity of nucleotide mutations calculated through nucleotide mutation entropy, and sequence information alterations caused by nucleotide mutations (such as structural changes and the presence of keto or amino bases). Additionally, we use SHAP (Shapley Additive Explanations) to identify the feature importance behind the pathogenicity predicted by the model. No prediction method specifically targeting human mt-rDNA variations has been published so far; XGB with feature integration is the first, and it achieves an F1 score of 0.9886 on the evaluation dataset. By applying machine learning and accounting for the unique characteristics of mt-rDNA, our approach provides a valuable tool for accurately predicting the pathogenicity of mitochondrial ribosomal DNA variations. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-08T16:29:15Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-08T16:29:15Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification; Acknowledgment (Chinese); Acknowledgment; Abstract (Chinese); Abstract; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; 1.1 Background and motivation of the study; 1.2 Research Objectives and Significance; 1.3 Research Questions and Hypotheses; 1.4 Brief Overview of Research Methods; Chapter 2 Mitochondrial DNA and Pathogenicity; 2.1 What Is Mitochondrial DNA; 2.2 The Functions of Different Segments of Mitochondrial DNA; 2.3 Why and How Mutation of Mitochondrial DNA Will Cause Disease; 2.4 Mutation Types of Mitochondrial DNA; 2.5 Our Target in this Research; Chapter 3 Machine Learning Techniques; 3.1 What Is Machine Learning?; 3.2 The Typical Procedure of Machine Learning; 3.3 Preprocessing; 3.4 Model Training; 3.4.1 Model types; 3.4.2 Hyperparameter tuning; 3.5 Evaluation; 3.5.1 Cross Validation; 3.5.2 Accuracy, Precision, Recall, F1 Score; 3.5.3 AUC Curve; 3.5.4 SHAP (Shapley Additive Explanations) and feature importance; Chapter 4 Related Works: ML and Pathogenicity Prediction; Chapter 5 Proposed Method; 5.1 Method Overview; 5.2 Dataset Collection; 5.2.1 Database; 5.3 Dataset Preprocessing and Merge; 5.3.1 Feature Selection; 5.3.2 Imputation; 5.3.3 Oversampling; 5.3.4 One-Hot Encoding Transformation; 5.3.5 Dataset Merge; 5.4 Feature Integration; 5.5 ML Techniques; 5.5.1 Randomized Search; 5.5.2 XGB Model; 5.6 Model Evaluation; 5.7 Make Prediction on Unknown Label Data; Chapter 6 Experimental Results; 6.1 Baselines; 6.2 Numerical Result; 6.3 SHAP values and feature importance; Chapter 7 Conclusion; REFERENCE; Appendices | - |
| dc.language.iso | en | - |
| dc.subject | 極限梯度提升 | zh_TW |
| dc.subject | 人類粒線體核糖體DNA | zh_TW |
| dc.subject | 機器學習 | zh_TW |
| dc.subject | 特徵整合 | zh_TW |
| dc.subject | 致病預測 | zh_TW |
| dc.subject | pathogenicity predictor | en |
| dc.subject | XGB | en |
| dc.subject | Human mitochondrial ribosomal DNA | en |
| dc.subject | machine learning | en |
| dc.subject | feature integration | en |
| dc.title | 以機器學習方法預測人類粒線體核糖體DNA變異之致病性 | zh_TW |
| dc.title | Using machine learning methods to predict the pathogenicity of human mitochondrial ribosomal DNA mutations | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.coadvisor | 李妮鍾 | zh_TW |
| dc.contributor.coadvisor | Ni-Chung Lee | en |
| dc.contributor.oralexamcommittee | 莊志明;曾新育;張哲瑋 | zh_TW |
| dc.contributor.oralexamcommittee | Jyh-Ming Juang;Hsin-Yu Tseng;Che-Wei Chang | en |
| dc.subject.keyword | 人類粒線體核糖體DNA,機器學習,極限梯度提升,特徵整合,致病預測 | zh_TW |
| dc.subject.keyword | Human mitochondrial ribosomal DNA, machine learning, XGB, feature integration, pathogenicity predictor | en |
| dc.relation.page | 62 | - |
| dc.identifier.doi | 10.6342/NTU202301584 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2023-07-17 | - |
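| dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | - |
| dc.contributor.author-dept | 生醫電子與資訊學研究所 (Graduate Institute of Biomedical Electronics and Bioinformatics) | - |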
| Appears in Collections: | 生醫電子與資訊學研究所 (Graduate Institute of Biomedical Electronics and Bioinformatics) | |

Files in this item:

| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf (not authorized for public access) | 3.95 MB | Adobe PDF |
