Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51822
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蕭朱杏(Chuhsing Kate Hsiao) | |
| dc.contributor.author | Charlotte Wang | en |
| dc.contributor.author | 王彥雯 | zh_TW |
| dc.date.accessioned | 2021-06-15T13:51:41Z | - |
| dc.date.available | 2021-02-26 | |
| dc.date.copyright | 2016-02-26 | |
| dc.date.issued | 2015 | |
| dc.date.submitted | 2015-09-26 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51822 | - |
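| dc.description.abstract | With the rapid development of biotechnology, many researchers use genetic studies to look for the innate individual factors behind disease onset, in order to explain why disease occurrence still differs between people exposed to similar environmental factors. In the common case-control design, a genetic association study examines differences in single nucleotide polymorphisms (SNPs) between the case group and the control group to investigate disease-related markers or causal SNPs, seeking genetic loci that may be related to disease onset. For such studies, current statistical work focuses on marker-set analysis, such as SNP-set analysis, because these methods have better power and can also account for interactions among a group of SNPs. Most existing methods of this type, however, either focus on sets specified in advance or use a sliding window to split the whole chromosome into small segments for testing; no method yet defines the sets first and then tests them when prior information is unavailable. This study therefore takes the SNP-set perspective: it first defines SNP sets through cluster analysis and then uses those sets in a genetic association test.
Assuming the researcher has no prior knowledge, we propose a hierarchical clustering algorithm that uses Hamming distance as the similarity measure between SNPs and, combined with a selection of the dendrogram nodes of maximal dissimilarity, defines the SNP sets. Next, again using Hamming distance, we compare the case and control groups on each SNP set and build a new test statistic from the difference between the within-group and between-group distributions of this distance, giving the Hamming distance-based association test (HDAT). Because common variants and rare variants behave differently, we propose two separate test statistics for the association test. Both are U-statistics, and we derive their statistical properties and large-sample theory. In real-data applications, the proposed clustering algorithm recovers appropriate clusters and is more efficient than the other methods compared, producing clustering results in less time. It applies not only to genetic data but also to categorical data in general. In simulations, the clustering algorithm combined with the node-selection rule correctly groups correlated SNPs together and excludes uncorrelated ones. For common variants, simulations show that HDAT retains good power regardless of the signal-to-noise ratio (the ratio of disease-associated SNPs to SNPs unrelated to the disease), the sample size, and whether the SNPs in a set have consistent effects on the disease, while the type I error stays within a controlled range. We also analyzed the coronary artery disease (CAD) data from the WTCCC study: clustering followed by association testing identified one SNP set associated with the disease, and the four SNPs in this set have previously been reported in the literature as associated with CAD. For rare variants, simulations likewise show good power regardless of the signal-to-noise ratio, the sample size, and the case-to-control ratio, again with the type I error within a controlled range. The proposed clustering algorithm uncovers the latent cluster structure among SNPs, and running HDAT on the resulting sets yields better power; in both simulations and real-data applications the proposed methods perform well and stably. The study also discusses the statistical properties of Hamming distance further. However, when the case and control sample sizes are unequal, HDAT for common variants performs worse than other statistical methods, so the test statistic needs some refinement; how to incorporate other, non-categorical disease-related factors is another important open issue. | zh_TW |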
| dc.description.abstract | With advances in biotechnology, many researchers try to identify disease-associated markers through genetic association studies. Recent genetic association studies face two major challenges: reducing the intractably large number of genetic variants in genomic data to a computationally manageable number, and increasing the power of the statistical tests used. A marker-set approach such as SNP-set analysis can address both efficiently, and it can also evaluate the joint effect of grouped SNPs in a pre-specified genomic region. Most current association tests, however, either test pre-specified SNP sets or scan the whole genome with a sliding window. There appears to be no combined procedure that first defines SNP sets and then tests the association between those sets and the disease of interest.
To construct SNP sets, we first propose a clustering algorithm that uses Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs should be clustered. We also recommend a rule of thumb for determining the number of clusters once a dendrogram has been produced. With the SNP sets obtained, we next develop an association test to examine susceptibility to the disease of interest. For common variants, the proposed test assesses, based on Hamming distance, whether the similarity in genotypes between a diseased and a normal individual differs from the similarity between two individuals with the same disease status. For rare variants, the proposed test evaluates whether the similarity in genotypes within the case group differs from the similarity within the control group. These two statistics are $U$-statistics, and their statistical properties and limiting behaviors are also discussed. Simulation studies and real data applications were conducted to demonstrate the performance of the proposed methods. The results show that the Hamming distance-based clustering algorithm identifies correct clustering patterns and is computationally efficient. The method can be applied not only to genetic data but also to categorical data in general. For common variants, the Hamming distance-based association test (HDAT) works well regardless of the sample size, the effects of SNPs within the given set, and the signal-to-noise ratio (the ratio of the number of disease-associated SNPs to the number of neutral SNPs). Moreover, for genotyping data on coronary artery disease (CAD) from the WTCCC, the proposed methods found one SNP set of four SNPs that is associated with the disease. These four SNPs have been reported in the literature. For rare variants, the numerical results demonstrate that the HDAT works well regardless of the sample size, the case-to-control ratio, and the signal-to-noise ratio.
To conclude, the proposed clustering algorithm and association test are illustrated with simulations and a genome-wide association study, and the results indicate reliable and satisfactory performance. The proposed methodology requires no inference of haplotypes, and the SNPs under consideration do not need to be linked. In particular, the test works well for a SNP set containing both SNPs with a deleterious effect and SNPs with a protective effect, and for a set containing many neutral SNPs. The statistical properties of the proposed methods are also discussed. Some issues remain unsolved, however. First, for common variants, extensions of the HDAT to unequal case and control group sizes need to be studied. Second, even though categorical disease-related factors can be considered pseudo genetic markers, how to incorporate disease-related factors such as environmental factors and personal characteristics still needs to be studied. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-15T13:51:41Z (GMT). No. of bitstreams: 1 ntu-104-D99849002-1.pdf: 1859640 bytes, checksum: cde420cbaf35f8306e7ff472313eab3b (MD5) Previous issue date: 2015 | en |
| dc.description.tableofcontents | Chinese Abstract
Abstract
1 Introduction
1.1 Genetic Association Study
1.2 Cluster Analysis for Genetic Sequencing Data
1.3 Similarity Measure
1.3.1 Distance Metrics for Categorical Data
1.3.2 Distance Metrics for Genetic Data
1.4 Motivation and Aim of the Study
1.5 Data Structure
1.6 Real Data Applications
2 Statistical Properties of Hamming Distance Statistic
2.1 HD Statistic for Clustering
2.2 HD Statistic for Paired Individuals
2.3 Asymptotic Distribution of HDAT
3 Clustering SNP-sets with Hamming Distance
3.1 Hamming Distance-based Clustering Algorithm
3.2 Modified Merging Procedure and Ties
3.3 Select SNP Sets for Association Studies
3.4 Results
3.4.1 Indices for Cluster Validation
3.4.2 Simulation Studies
3.4.3 Real Data Applications
4 SNP-set Association Test via Hamming Distance for CV
4.1 Hamming Distance-based Association Test
4.2 Results
4.2.1 Simulation Studies for Association Tests
4.2.2 Simulation Studies for Combined Procedures
4.2.3 Real Data Applications
5 SNP-set Association Test via Hamming Distance for RV
5.1 HDAT for Rare Variants
5.2 Numerical Results
6 Discussion and Future Work
6.1 Other Applications of HD-Cluster
6.2 Comparison with LD Pattern and Haplotype Analysis
6.3 Future Work
Bibliography
A Illustration of Selecting SNP Sets
B Theory of U-statistic
C Proof of Statistical Properties of HDAT | |
| dc.language.iso | en | |
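| dc.subject | 常見變異 (common variants) | zh_TW |
| dc.subject | 罕見變異 (rare variants) | zh_TW |
| dc.subject | 樹狀圖 (dendrogram) | zh_TW |
| dc.subject | 漢明距離 (Hamming distance) | zh_TW |
| dc.subject | 單一核苷酸多型性之集合 (SNP set) | zh_TW |
| dc.subject | 相似度 (similarity) | zh_TW |
| dc.subject | 遺傳相關性檢定 (genetic association test) | zh_TW |
| dc.subject | 群聚分析 (cluster analysis) | zh_TW |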
| dc.subject | SNP set | en |
| dc.subject | Association test | en |
| dc.subject | clustering analysis | en |
| dc.subject | common variants | en |
| dc.subject | dendrogram | en |
| dc.subject | Hamming distance | en |
| dc.subject | rare variants | en |
| dc.subject | similarity | en |
| dc.title | 利用漢明距離偵測單一核苷酸多型性之群聚與單一核苷酸多型性集合之相關性檢定 | zh_TW |
| dc.title | SNP-set Detection and Association Test with Hamming Distance Information | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 104-1 | |
| dc.description.degree | Doctoral | |
| dc.contributor.oralexamcommittee | 黃冠華(Guan-Hua Huang),程毅豪(Yi-Hau Chen),陳為堅(Wei J. Chen),杜裕康(Yu-Kang Tu),洪弘(Hung Hung) | |
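| dc.subject.keyword | 遺傳相關性檢定 (genetic association test), 群聚分析 (cluster analysis), 常見變異 (common variants), 樹狀圖 (dendrogram), 漢明距離 (Hamming distance), 罕見變異 (rare variants), 相似度 (similarity), 單一核苷酸多型性之集合 (SNP set) | zh_TW |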
| dc.subject.keyword | Association test, clustering analysis, common variants, dendrogram, Hamming distance, rare variants, similarity, SNP set | en |
| dc.relation.page | 139 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2015-09-30 | |
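| dc.contributor.author-college | College of Public Health | zh_TW |
| dc.contributor.author-dept | Institute of Epidemiology and Preventive Medicine | zh_TW |
| Appears in Collections: | Institute of Epidemiology and Preventive Medicine | |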
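
The English abstract in the record above describes Hamming distance being used in two ways: between SNPs, as the similarity measure for clustering, and between individuals, to compare genotype similarity across and within disease groups. A minimal Python sketch of both uses follows. It rests on assumptions that are not part of this record: genotypes are coded 0/1/2, the toy data are made up, and the function names `snp_distance_matrix` and `between_minus_within` are hypothetical. The contrast computed at the end only illustrates the idea of comparing between-group and within-group distances; it is not the thesis's HDAT statistic, whose exact form this record does not give.

```python
from itertools import combinations, product


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def snp_distance_matrix(genotypes):
    """Pairwise Hamming distances between SNPs (columns).

    genotypes: list of rows, one row per individual, one entry per SNP.
    The 0/1/2 coding used below is an assumption for illustration.
    """
    columns = list(zip(*genotypes))  # transpose: one tuple per SNP
    n = len(columns)
    return [[hamming(columns[i], columns[j]) for j in range(n)]
            for i in range(n)]


def between_minus_within(cases, controls):
    """Illustrative contrast only (not the thesis's HDAT statistic):
    mean case-control distance minus mean same-group distance."""
    between = [hamming(a, b) for a, b in product(cases, controls)]
    within = [hamming(a, b)
              for group in (cases, controls)
              for a, b in combinations(group, 2)]
    return sum(between) / len(between) - sum(within) / len(within)


# Hypothetical toy data: 3 cases and 3 controls genotyped at 4 SNPs.
cases = [(2, 2, 0, 1), (2, 2, 0, 0), (2, 1, 0, 1)]
controls = [(0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1)]

dist = snp_distance_matrix(cases + controls)
contrast = between_minus_within(cases, controls)

for row in dist:
    print(row)
print(round(contrast, 4))
```

A matrix like `dist` is the kind of input a hierarchical clustering routine would take; a positive `contrast` means individuals from different groups differ more, on average, than individuals from the same group.
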
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
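| ntu-104-1.pdf (not authorized for public access) | 1.82 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.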
