Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88142
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 賴飛羆 [zh_TW]
dc.contributor.advisor: Fei-Pei Lai [en]
dc.contributor.author: 鄭佳芳 [zh_TW]
dc.contributor.author: Chia-Fang Cheng [en]
dc.date.accessioned: 2023-08-08T16:29:15Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-08-08
dc.date.issued: 2023
dc.date.submitted: 2023-07-17
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88142
dc.description.abstract: This study presents a comprehensive analysis for predicting the pathogenicity of human mitochondrial ribosomal DNA (mt-rDNA) variants. We introduce a novel approach based on an extreme gradient boosting (XGB) model with feature integration. It combines multiple factors: homogeneity, heterogeneity, allele frequency, heteroplasmy level, the rate of benign or pathogenic change induced by a variant, the variability and complexity of nucleotide mutations as measured by nucleotide mutation entropy, and changes in sequence information caused by nucleotide mutations (such as structural changes and the presence of keto or amino bases). We also use SHAP (Shapley Additive Explanations) to identify which features matter most to the model's pathogenicity predictions. No prediction method specifically targeting human mt-rDNA variants has been published so far; ours is the first, and it achieves an F1 score of 0.9886 on the evaluation dataset. By applying machine learning and accounting for the distinctive characteristics of mt-rDNA, the approach offers a useful tool for predicting the pathogenicity of mt-rDNA variants. [en]
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-08T16:29:15Z. No. of bitstreams: 0 [en]
dc.description.provenance: Made available in DSpace on 2023-08-08T16:29:15Z (GMT). No. of bitstreams: 0 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification
Acknowledgments (Chinese)
Acknowledgment
Chinese Abstract
Abstract
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background and Motivation of the Study
1.2 Research Objectives and Significance
1.3 Research Questions and Hypotheses
1.4 Brief Overview of Research Methods
Chapter 2 Mitochondrial DNA and Pathogenicity
2.1 What Is Mitochondrial DNA
2.2 The Functions of Different Segments of Mitochondrial DNA
2.3 Why and How Mutation of Mitochondrial DNA Will Cause Disease
2.4 Mutation Types of Mitochondrial DNA
2.5 Our Target in this Research
Chapter 3 Machine Learning Techniques
3.1 What Is Machine Learning?
3.2 The Typical Procedure of Machine Learning
3.3 Preprocessing
3.4 Model Training
3.4.1 Model Types
3.4.2 Hyperparameter Tuning
3.5 Evaluation
3.5.1 Cross Validation
3.5.2 Accuracy, Precision, Recall, F1 Score
3.5.3 AUC Curve
3.5.4 SHAP (Shapley Additive Explanations) and Feature Importance
Chapter 4 Related Works: ML and Pathogenicity Prediction
Chapter 5 Proposed Method
5.1 Method Overview
5.2 Dataset Collection
5.2.1 Database
5.3 Dataset Preprocessing and Merge
5.3.1 Feature Selection
5.3.2 Imputation
5.3.3 Oversampling
5.3.4 One-Hot Encoding Transformation
5.3.5 Dataset Merge
5.4 Feature Integration
5.5 ML Techniques
5.5.1 Randomized Search
5.5.2 XGB Model
5.6 Model Evaluation
5.7 Make Prediction on Unknown Label Data
Chapter 6 Experimental Results
6.1 Baselines
6.2 Numerical Result
6.3 SHAP Values and Feature Importance
Chapter 7 Conclusion
REFERENCE
Appendices
dc.language.iso: en
dc.subject: 極限梯度提升 [zh_TW]
dc.subject: 人類粒線體核糖體DNA [zh_TW]
dc.subject: 機器學習 [zh_TW]
dc.subject: 特徵整合 [zh_TW]
dc.subject: 致病預測 [zh_TW]
dc.subject: pathogenicity predictor [en]
dc.subject: XGB [en]
dc.subject: Human mitochondrial ribosomal DNA [en]
dc.subject: machine learning [en]
dc.subject: feature integration [en]
dc.title: 以機器學習方法預測人類粒線體核糖體DNA變異之致病性 [zh_TW]
dc.title: Using machine learning methods to predict the pathogenicity of human mitochondrial ribosomal DNA mutations [en]
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: Master's
dc.contributor.coadvisor: 李妮鍾 [zh_TW]
dc.contributor.coadvisor: Ni-Chung Lee [en]
dc.contributor.oralexamcommittee: 莊志明; 曾新育; 張哲瑋 [zh_TW]
dc.contributor.oralexamcommittee: Jyh-Ming Juang; Hsin-Yu Tseng; Che-Wei Chang [en]
dc.subject.keyword: 人類粒線體核糖體DNA, 機器學習, 極限梯度提升, 特徵整合, 致病預測 [zh_TW]
dc.subject.keyword: Human mitochondrial ribosomal DNA, machine learning, XGB, feature integration, pathogenicity predictor [en]
dc.relation.page: 62
dc.identifier.doi: 10.6342/NTU202301584
dc.rights.note: Not authorized
dc.date.accepted: 2023-07-17
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Graduate Institute of Biomedical Electronics and Bioinformatics
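
The abstract in this record names two quantities that can be made concrete: nucleotide mutation entropy and the F1 score. The following is a minimal Python sketch, not the thesis's implementation. It assumes the entropy is the Shannon entropy of the alleles observed at one position (the record does not give the exact definition), and the function names `shannon_entropy`, `substitution_entropy`, and `f1_score` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy in bits of a discrete distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    # H = sum over outcomes of p * log2(1 / p)
    return sum(p * math.log2(1 / p) for p in probs)


def substitution_entropy(observed_alleles):
    """Assumed feature: entropy of the alleles observed at a single position."""
    return shannon_entropy(Counter(observed_alleles).values())


def f1_score(true_labels, predicted_labels, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this assumed definition, a position where only one allele is ever observed has zero entropy, while a position where all four nucleotides are equally common has the maximum of two bits.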
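Appears in Collections: Graduate Institute of Biomedical Electronics and Bioinformatics

Files in This Item:
File | Size | Format
ntu-111-2.pdf (not authorized for public access) | 3.95 MB | Adobe PDF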
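Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.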
