An Algorithm for Gene-Gene Interaction via Deviance of Independence

Pin-Cian Wang; 王品蒨

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63749

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	高成炎,莊曜宇,陳佩君
dc.contributor.author	Pin-Cian Wang	en
dc.contributor.author	王品蒨	zh_TW
dc.date.accessioned	2021-06-16T17:18:08Z	-
dc.date.available	2014-08-20
dc.date.copyright	2012-08-20
dc.date.issued	2012
dc.date.submitted	2012-08-17
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63749	-
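dc.description.abstract	Genome-wide association studies (GWAS) are a typical study design in genetic epidemiology, used to detect genetic factors associated with disease. The genetic data mostly come from single nucleotide polymorphisms (SNPs) measured with microarray technology. For analysis, statistical methods compare how case and control sample counts are distributed across the genotypes of each SNP, in order to find SNPs that may be associated with the disease. GWAS have mostly examined the association between a single SNP and disease, but complex diseases usually result from interactions between genes or between genes and environmental factors. Most existing methods for detecting interactions among SNPs are exhaustive searches, such as Multifactor Dimensionality Reduction (MDR), which compute over every possible combination and are therefore suited only to exploring interactions among a small number of SNPs. The aim of this study is to establish a filtering mechanism that selects, from a large amount of SNP data, a candidate SNP set whose SNPs have a higher chance of carrying interactions that affect the disease. The method is built on probabilistic independence: from the frequencies with which two individual SNPs occur in the samples, it computes the expected frequency with which the two SNPs occur together in the samples under the assumption that they are independent. It also computes the actual frequency with which the two SNPs occur together in the samples. From the deviance between the actual and expected values, a Deviance of Independence (DOI) value is established for each SNP pair, which partly reflects the degree of interaction of that pair. The DOI algorithm is designed mainly for GWAS SNP data that show no marginal effect. It is used to select SNPs that are not significant in first-order (single-SNP) tests but that, once combined, discriminate the disease status of the samples better. Testing with simulation data and a real data application, we found that significant SNP combinations can be identified after filtering with the DOI algorithm. On the simulation data, the DOI algorithm showed good and stable predictive performance. In the real data application, meaningful SNP combinations may be found in the SNP set that remains after DOI filtering. The DOI algorithm is therefore an effective method for filtering gene-gene interactions.	zh_TW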
dc.description.abstract	Genome-wide association studies (GWAS) are a commonly used study design in genetic epidemiology for identifying genetic factors associated with diseases. Most GWAS adopt a single-locus strategy, analyzing the association between an individual single nucleotide polymorphism (SNP) and a disease. However, complex diseases may be caused not by a single gene but by gene-gene or gene-environment interactions. Exhaustive search methods, such as multifactor dimensionality reduction (MDR), are popular for detecting gene-gene interactions. Such methods require enormous computation and are therefore feasible only for a small number of SNPs. This study therefore aims to construct a filtering criterion, called the deviance of independence (DOI) and based on the independence of SNPs, for selecting a candidate SNP set from a large number of SNPs. We apply DOI to GWAS data to select SNPs that have no marginal effect individually but discriminate better between cases and controls when combined. We use simulated and real data to examine the performance of DOI. The simulation results show that SNPs with interactions tend to have higher DOI values. In addition, 2-way and 3-way gene-gene interactions in a real data set are examined. The results demonstrate that possible interactions can be identified after using the DOI value as a filtering criterion. In sum, the DOI algorithm is a powerful tool for filtering a candidate gene set for further interaction analysis.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T17:18:08Z (GMT). No. of bitstreams: 1 ntu-101-R99945017-1.pdf: 999891 bytes, checksum: 06ab7dca89eba021de33499bd64d5739 (MD5) Previous issue date: 2012	en
dc.description.tableofcontents	Oral Examination Committee Certification # Acknowledgements i Chinese Abstract ii ABSTRACT iv CONTENTS v LIST OF FIGURES vii LIST OF TABLES viii Chapter 1 Introduction 1 Chapter 2 Methods 10 2.1 MDR 10 2.2 Data source 11 2.3 Principle of Deviance of Independence (DOI) algorithm 13 Chapter 3 Simulation 18 3.1 Parameters 18 3.2 Preliminary test 19 3.3 Performance evaluation 22 Chapter 4 Real data application 26 4.1 Using DOI to filter a candidate set for interaction identification 26 4.1.1 DOI and chi-square test 26 4.1.2 DOI and MDR 36 4.2 High-order interaction identification 40 Chapter 5 Discussion 43 REFERENCES 47
dc.language.iso	en
dc.subject	單一核苷酸多型性	zh_TW
dc.subject	獨立性偏差	zh_TW
dc.subject	基因間交互作用	zh_TW
dc.subject	全基因組關聯研究	zh_TW
dc.subject	Single nucleotide polymorphism	en
dc.subject	genome-wide association study	en
dc.subject	gene-gene interaction	en
dc.subject	deviance of independence	en
dc.title	以獨立性偏差篩選基因交互作用之演算法	zh_TW
dc.title	An Algorithm for Gene-Gene Interaction via Deviance of Independence	en
dc.type	Thesis
dc.date.schoolyear	100-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	賴亮全,蔡孟勳
dc.subject.keyword	單一核苷酸多型性,全基因組關聯研究,基因間交互作用,獨立性偏差	zh_TW
dc.subject.keyword	Single nucleotide polymorphism,genome-wide association study,gene-gene interaction,deviance of independence	en
dc.relation.page	49
dc.rights.note	Paid license
dc.date.accepted	2012-08-18
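dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics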
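
The abstract in the record above describes the DOI only in words: compare the observed joint frequency of two SNPs with the frequency expected if they were independent, and score each SNP pair by the deviation. This record does not give the thesis's actual formula, so the following is only a minimal sketch under stated assumptions: genotypes are coded as small integers (for example 0/1/2), the deviation is taken as the sum of absolute differences over all genotype combinations, and case/control labels are ignored. The names `pair_deviation` and `rank_pairs` are hypothetical and do not come from the thesis.

```python
from collections import Counter
from itertools import combinations


def pair_deviation(snp_a, snp_b):
    """Sum of |observed joint frequency - product of marginal frequencies|
    over all genotype combinations of two SNPs (an assumed scoring rule)."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("both SNPs need the same non-zero number of samples")
    count_a = Counter(snp_a)             # marginal genotype counts of SNP A
    count_b = Counter(snp_b)             # marginal genotype counts of SNP B
    joint = Counter(zip(snp_a, snp_b))   # observed joint genotype counts
    total = 0.0
    for ga, ca in count_a.items():
        for gb, cb in count_b.items():
            expected = (ca / n) * (cb / n)         # frequency under independence
            observed = joint.get((ga, gb), 0) / n  # frequency actually seen
            total += abs(observed - expected)
    return total


def rank_pairs(genotypes):
    """Score every SNP pair and return [((name_a, name_b), score), ...],
    highest deviation first. `genotypes` maps SNP name -> list of codes."""
    scores = {
        (a, b): pair_deviation(genotypes[a], genotypes[b])
        for a, b in combinations(sorted(genotypes), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, a candidate SNP set could be formed by keeping the SNPs that appear in the top-ranked pairs and passing only those to a more expensive method such as MDR. Scoring pairs costs far less than an exhaustive interaction search, which is the motivation the abstract gives for filtering first.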

Files in This Item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	976.46 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.