Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7174
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 歐陽彥正 | |
dc.contributor.author | Yu-Chuan Chang | en |
dc.contributor.author | 張育銓 | zh_TW |
dc.date.accessioned | 2021-05-19T17:39:59Z | - |
dc.date.available | 2025-03-02 | |
dc.date.available | 2021-05-19T17:39:59Z | - |
dc.date.copyright | 2020-03-02 | |
dc.date.issued | 2020 | |
dc.date.submitted | 2020-02-27 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7174 | - |
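dc.description.abstract | A genome-wide association study (GWAS) is a common method for detecting associations between genetic variants and phenotypes. However, GWAS has limited ability to detect associations between phenotypes and interactions among genetic variants, known as epistasis. We believe that developing an effective and efficient GWAS method for detecting epistasis will help unravel the pathogenic mechanisms of complex diseases such as Alzheimer’s disease. This study therefore develops an algorithm, GenEpi, which uses machine learning to detect associations between phenotypes and interactions among variants. Because the gene is the smallest functional unit in biology, the core idea of GenEpi is to partition the genome by gene regions and to extract features in two stages, in order to address both the excessive computational complexity of genome-wide epistasis detection and the reduced statistical reliability caused by multiple testing. The two stages of GenEpi are within-gene epistasis detection and cross-gene epistasis detection. In both stages we use two-element combinatorial encoding to generate features that represent epistasis, and we use L1-regularized regression with stability selection to select features and build models. This study applies GenEpi to an Alzheimer’s disease dataset, predicting either whether a sample is an Alzheimer’s disease patient or how quickly the disease progresses, in order to validate the algorithm and in the hope of uncovering more potential pathogenic factors of Alzheimer’s disease. On both simulated data and real Alzheimer’s disease data, the results show that GenEpi outperforms existing algorithms such as FastEpistasis, BOOST and ReliefF in both prediction accuracy and computation time. GenEpi should therefore benefit other genome-wide association analyses, especially studies of complex diseases, and is expected to give biomedical researchers more useful reference information when designing experiments.
Availability: GenEpi is an open-source Python package licensed for non-commercial use by academic researchers. The source code is available on the PyPI package index and on GitHub (https://github.com/Chester75321/GenEpi). | zh_TW |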
dc.description.abstract | Genome-wide association studies (GWAS) provide a powerful means of identifying associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to uncovering complex pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD). To this end, this study presents GenEpi, a computational package that uses a machine learning approach to uncover epistasis associated with phenotypes, and illustrates its application to predicting the diagnosis and progression of AD. The key concept of GenEpi is a two-stage feature extraction process based on gene structure. Since a gene is the minimal physical and functional unit of heredity, GenEpi treats each gene as a unit from which to retrieve genetic variants as features. GenEpi adopts two-element combinatorial encoding to produce features and constructs prediction models by L1-regularized regression with stability selection. In the first stage, features are generated by combinatorial encoding and then selected by L1-regularized regression with stability selection to detect epistasis within a single gene. In the second stage, the features selected for each gene are pooled to identify cross-gene epistasis, again using L1-regularized regression with stability selection. This study compared GenEpi with several commonly used algorithms for detecting epistasis, including FastEpistasis, BOOST and ReliefF. On simulated data, GenEpi outperformed the other methods at ranking the true epistasis first. On real data, the results suggest that the epistasis selected by GenEpi has the best predictive power for two major phenotypes in the AD dataset: diagnosis of AD and disease progression. 
For diagnosis, the proposed model for predicting AD contains three clinical features (Age, Gender and Education) and 14 genetic features comprising 24 SNPs from 12 genes, including the well-known causal gene APOE. The 2-fold cross-validation (CV) and leave-one-out CV (LOO CV) accuracies of this model are 0.83 and 0.81, respectively. For predicting progression, the proposed model contains eight clinical features (Age, Gender, Education, Cognitively Normal (CN), Early Mild Cognitive Impairment (MCI), Late MCI, AD, and MMSE at baseline) and four genetic features comprising seven SNPs from six genes; all four genetic features are cross-gene epistasis with significant p-values (< 10⁻¹¹). The averages of the Pearson and Spearman correlations for 2-fold CV and LOO CV are 0.52 and 0.53, respectively. The results on AD demonstrate GenEpi’s ability to find disease-related variants and epistasis that have both biological meaning and predictive power. The released package can be generalized to facilitate studies of many other complex diseases.
Availability: GenEpi is an open-source Python package, available free of charge to non-commercial users. The package has been published on the Python Package Index (PyPI) and on GitHub (https://github.com/Chester75321/GenEpi). | en |
dc.description.provenance | Made available in DSpace on 2021-05-19T17:39:59Z (GMT). No. of bitstreams: 1 ntu-109-D00945009-1.pdf: 2598518 bytes, checksum: 0c7bd8f450a12de482cd846c34ea8369 (MD5) Previous issue date: 2020 | en |
dc.description.tableofcontents | Chinese Abstract; Abstract; List of Figures; List of Tables; 1. Introduction; 2. Related Works; 2.1. Introduction of Alzheimer’s Disease; 2.2. Association of genetic variants and phenotypes; 2.3. Interaction of genetic variants - Epistasis; 2.4. The main challenges of detecting epistasis; 2.5. Statistical and linear-base algorithms; 2.6. Machine learning approaches for detecting epistasis; 3. Materials and Methods; 3.1. Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset; 3.2. The architecture of GenEpi; 3.3. University of California Santa Cruz (UCSC) database; 3.4. Estimation of linkage disequilibrium; 3.5. Discovery of within-gene epistasis; 3.6. Discovery of cross-gene epistasis; 4. Results; 4.1. Experiments on simulation data; 4.2. Classifying AD patients; 4.3. Predicting the progression of AD; 4.4. Comparison with different algorithms; 5. Discussion; 5.1. The cross-gene epistasis of AD selected by GenEpi; 5.2. The single-gene epistasis of AD selected by GenEpi; 6. Conclusions; References | |
dc.language.iso | en | |
dc.title | Developing an Epistasis Detection Algorithm Based on Machine Learning and Gene Information | zh_TW |
dc.title | Gene-based Epistasis Discovery Using Machine Learning | en |
dc.type | Thesis | |
dc.date.schoolyear | 108-1 | |
dc.description.degree | Doctoral | |
dc.contributor.coadvisor | 陳倩瑜 | |
dc.contributor.oralexamcommittee | 賴飛羆,吳君泰,蔡懷寬 | |
dc.subject.keyword | GenEpi, machine learning, epistasis, genome-wide association study, Alzheimer’s disease | zh_TW |
dc.subject.keyword | GenEpi, Machine Learning, Epistasis, GWAS, Alzheimer’s disease | en |
dc.relation.page | 90 | |
dc.identifier.doi | 10.6342/NTU202000613 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2020-02-27 | |
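dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Biomedical Electronics and Bioinformatics | zh_TW |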
dc.date.embargo-lift | 2025-03-02 | - |
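
The two-element combinatorial encoding named in the abstract can be illustrated with a minimal Python sketch. It assumes that genotypes are coded 0/1/2, that each SNP is expanded into three indicator features, and that an epistasis feature is the product of two indicators taken from different SNPs. These details, and the function names `one_hot` and `two_element_encoding`, are hypothetical illustrations and not the GenEpi package's API; the thesis's exact encoding may differ. The L1-regularized regression and stability selection steps are not shown.

```python
from itertools import combinations

GENOTYPES = (0, 1, 2)  # assumed coding: number of copies of one allele


def one_hot(genotype_matrix, snp_names):
    """Expand each SNP column into three 0/1 indicator columns."""
    names = [f"{snp}_{g}" for snp in snp_names for g in GENOTYPES]
    rows = [
        [int(value == g) for value in sample for g in GENOTYPES]
        for sample in genotype_matrix
    ]
    return names, rows


def two_element_encoding(names, rows):
    """Append the product of every pair of indicators from different SNPs."""
    pairs = [
        (i, j)
        for i, j in combinations(range(len(names)), 2)
        # skip pairs from the same SNP: their product is always zero
        if names[i].rsplit("_", 1)[0] != names[j].rsplit("_", 1)[0]
    ]
    pair_names = [f"{names[i]}*{names[j]}" for i, j in pairs]
    encoded = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return names + pair_names, encoded


# Hypothetical toy data: three samples, two SNPs within one gene.
genotypes = [[0, 2], [1, 1], [2, 0]]
base_names, base_rows = one_hot(genotypes, ["rs1", "rs2"])
feature_names, features = two_element_encoding(base_names, base_rows)
```

With two SNPs this produces 6 single-SNP indicators plus 9 pairwise features. Under the two-stage design described in the abstract, features like these would be built per gene first and then again across the features retained from each gene.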
Appears in Collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-109-1.pdf (publicly available online after 2025-03-02) | 2.54 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.