Pinpointing Rare Causal Variants with the Adaptive Combination of Q-values Method

Jen-Yi Li; 李貞儀

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/3772

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林菀俞
dc.contributor.author	Jen-Yi Li	en
dc.contributor.author	李貞儀	zh_TW
dc.date.accessioned	2021-05-13T08:36:37Z	-
dc.date.available	2016-08-26
dc.date.available	2021-05-13T08:36:37Z	-
dc.date.copyright	2016-08-26
dc.date.issued	2016
dc.date.submitted	2016-08-10
dc.identifier.citation	Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. 2010. A map of human genome variation from population-scale sequencing. Nature 467(7319):1061-73.
Almasy L, Dyer TD, Peralta JM, Kent JW, Jr., Charlesworth JC, Curran JE, Blangero J. 2011. Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc 5 Suppl 9:S2.
Basu S, Pan W. 2011. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol 35(7):606-19.
Benaglia T, Chauveau D, Hunter DR. 2009. An EM-like algorithm for semi- and non-parametric estimation in multivariate mixtures. Journal of Computational and Graphical Statistics 18:505-526.
Benjamini Y, Hochberg Y. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. B. 57:289-300.
Bertram L, Tanzi RE. 2009. Genome-wide association studies in Alzheimer’s disease. Hum Mol Genet 18(R2):R137-45.
Besag J, Clifford P. 1991. Sequential Monte Carlo p-values. Biometrika 78:301-304.
Byrnes AE, Wu MC, Wright FA, Li M, Li Y. 2013. The value of statistical or bioinformatics annotation for rare variant association with quantitative trait. Genet Epidemiol 37(7):666-74.
Cheung YH, Wang G, Leal SM, Wang S. 2012. A fast and noise-resilient approach to detect rare-variant associations with deep sequencing data for complex disorders. Genet Epidemiol 36(7):675-85.
Cirulli ET, Goldstein DB. 2010. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11(6):415-25.
Davies R. 1980. The distribution of a linear combination of chi-square random variables. J. R. Stat. Soc. Ser. C Appl. Stat. 29:323-333.
Derkach A, Lawless JF, Sun L. 2013. Robust and powerful tests for rare variants using Fisher's method to combine evidence of association from two or more complementary tests. Genet Epidemiol 37(1):110-21.
Fisher RA. 1922. On the interpretation of χ2 from contingency tables, and the calculation of P. J. R. Stat. Soc. 85:87-94.
Fisher RA. 1932. Statistical methods for research workers. London: Oliver and Boyd.
Han F, Pan W. 2010. A data-adaptive sum test for disease association with multiple common or rare variants. Hum Hered 70(1):42-54.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18(2):337-8.
Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. 2014. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet 10(12):e1004729.
Lee S, Wu MC, Lin X. 2012. Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13(4):762-75.
Li B, Leal SM. 2008. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83(3):311-21.
Li C, Li M, Lange EM, Watanabe RM. 2008. Prioritized subset analysis: improving power in genome-wide association studies. Hum Hered 65(3):129-41.
Li Y, Byrnes AE, Li M. 2010. To identify associations with rare variants, just WHaIT: Weighted haplotype and imputation-based tests. Am J Hum Genet 87(5):728-35.
Lin WY. 2014a. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 9(12):e115971.
Lin WY. 2014b. Association testing of clustered rare causal variants in case-control studies. PLoS One 9(4):e94337.
Lin WY. 2016. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep 6:21824.
Lin WY, Lee WC. 2012. Improving power of genome-wide association studies with weighted false discovery rate control and prioritized subset analysis. PLoS One 7(4):e33716.
Lin WY, Liang YC. 2016. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 6:28389.
Lin WY, Lou XY, Gao G, Liu N. 2014. Rare Variant Association Testing by Adaptive Combination of P-values. PLoS One 9(1):e85728.
Lin WY, Yi N, Lou XY, Zhi D, Zhang K, Gao G, Tiwari HK, Liu N. 2013. Haplotype kernel association test as a powerful method to identify chromosomal regions harboring uncommon causal variants. Genet Epidemiol 37(6):560-70.
Lin WY, Yi N, Zhi D, Zhang K, Gao G, Tiwari HK, Liu N. 2012. Haplotype-based methods for detecting uncommon causal variants with common SNPs. Genet Epidemiol 36(6):572-82.
Lin WY, Zhang B, Yi N, Gao G, Liu N. 2011. Evaluation of pooled association tests for rare variant identification. BMC Proc 5 Suppl 9:S118.
Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5(2):e1000384.
Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res 615(1-2):28-56.
Morris AP, Zeggini E. 2010. An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34(2):188-93.
Peng B. 2015. Reproducible simulations of realistic samples for next-generation sequencing studies using Variant Simulation Tools. Genet Epidemiol 39(1):45-52.
Price AL, Kryukov GV, de Bakker PI, Purcell SM, Staples J, Wei LJ, Sunyaev SR. 2010. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet 86(6):832-8.
Ramensky V, Bork P, Sunyaev S. 2002. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 30(17):3894-900.
Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. 2005. Calibrating a coalescent simulation of human genome sequence variation. Genome Res 15(11):1576-83.
Schizophrenia Working Group of the Psychiatric Genomics Consortium. 2014. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510):421-427.
Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ and others. 2009. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature 460(7256):753-7.
Sullivan PF, Daly MJ, O'Donovan M. 2012. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat Rev Genet 13(8):537-51.
Tintle N, Aschard H, Hu IC, Nock N, Wang HT, Pugh E. 2011. Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17. Genet Epidemiol 35:S56-S60.
Wang GT, Zhang D, He Z, Hang D, Li B, Leal S. 2015. Pitfalls in development of statistical methods for rare variant association studies. Presented at the 65th Annual Meeting of The American Society of Human Genetics, October 7, 2015, Baltimore, MD.
Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. 2011. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89(1):82-93.
Yang HC, Chen CW. 2011. Region-based and pathway-based QTL mapping using a p-value combination method. BMC Proc 5 Suppl 9:S43.
Zaykin DV, Zhivotovsky LA, Westfall PH, Weir BS. 2002. Truncated product method for combining P-values. Genet Epidemiol 22(2):170-85.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/3772	-
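dc.description.abstract	In the past decade, genome-wide association studies (GWAS) have identified thousands of single-nucleotide polymorphisms (SNPs) associated with complex diseases. Advances in next-generation sequencing (NGS) technology let geneticists observe more genetic information on human chromosomes, making it increasingly feasible to search for rare causal variants with a minor allele frequency (MAF) below 1%. Statistical methods have been developed to pinpoint rare causal variants among a large number of variants, for example the backward elimination procedure (BE) and the adaptive combination of P-values method (ADA), and previous work has reported that variants identified by ADA have a higher signal-to-noise ratio than those identified by BE. This thesis proposes the adaptive combination of Q-values method (ADAQ) to further increase the probability of detecting causal variants. When the variants carry synonymous / non-synonymous annotations, we first divide all variants into a synonymous group and a non-synonymous group, use the Benjamini-Hochberg procedure to convert the P-values within each group into Q-values (B-H Q-values), and then remove the variants with larger Q-values, because they are more likely to be truly neutral variants. Simulations show that ADAQ has a higher positive predictive value than ADA. We also applied ADAQ to the Genetic Analysis Workshop 17 (GAW 17) data and found that ADAQ controls the number of false positives more effectively than ADA and yields a higher positive predictive value. We therefore recommend ADAQ for pinpointing individual rare causal variants when synonymous / non-synonymous annotations are available for the variants under study.	zh_TW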
dc.description.abstract	In the past decade, genome-wide association studies have identified thousands of single-nucleotide polymorphisms (SNPs) associated with complex diseases. With improvements in next-generation sequencing technology, geneticists can observe more genetic information on human chromosomes, and searching for rare causal variants (minor allele frequency < 1%) has gradually become possible. To pinpoint rare causal variants among a large number of variants, statistical approaches such as the backward elimination (BE) procedure and the adaptive combination of P-values (ADA) method have been developed. Variants identified by ADA have been shown to have a higher signal-to-noise ratio than those identified by BE. In this study, we propose the adaptive combination of Q-values (ADAQ) method to further increase the probability that a finding is genuine. When synonymous / non-synonymous annotations are available, we first allocate all variants into a non-synonymous group and a synonymous group, and transform the per-site P-values within each group separately into Benjamini-Hochberg (B-H) Q-values. We then remove the variants with larger Q-values, because they are more likely to be neutral. Comprehensive simulations show that ADAQ has an even larger positive predictive value than ADA. When applied to the Genetic Analysis Workshop 17 (GAW 17) data sets, ADAQ also controls the number of false positives more effectively and yields a larger positive predictive value than ADA. We therefore recommend ADAQ for pinpointing individual rare causal variants when synonymous / non-synonymous annotations for the variants are available.	en
dc.description.provenance	Made available in DSpace on 2021-05-13T08:36:37Z (GMT). No. of bitstreams: 1 ntu-105-R03849031-1.pdf: 1898290 bytes, checksum: 03d7b13d95a765a81baaa30f06c13625 (MD5) Previous issue date: 2016	en
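dc.description.tableofcontents	Oral Examination Committee Approval i; Acknowledgements ii; Chinese Abstract iii; English Abstract iv; Table of Contents v; List of Figures vi; List of Tables vii; Chapter 1 Introduction 1; Chapter 2 Literature Review 4; 2.1 Gene-based association tests 4; 2.2 Methods for pinpointing individual rare causal variants 6; Chapter 3 Materials and Methods 9; Chapter 4 Simulations 13; 4.1 Simulation design 13; 4.2 Method comparison 16; 4.3 Simulation results 18; Chapter 5 Application to Genetic Analysis Workshop 17 Data 22; Chapter 6 Conclusion and Discussion 24; References 47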
dc.language.iso	zh-TW
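dc.subject	causal variants (致病變異)	zh_TW
dc.subject	rare variants (罕見變異)	zh_TW
dc.subject	neutral variants (中立變異)	zh_TW
dc.subject	non-synonymous variants (非同義變異)	zh_TW
dc.subject	next-generation sequencing (次世代定序)	zh_TW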
dc.subject	next-generation sequencing	en
dc.subject	neutral variants	en
dc.subject	rare variants	en
dc.subject	causal variants	en
dc.subject	non-synonymous variants	en
dc.title	以Q值適性結合法來指出罕見致病變異	zh_TW
dc.title	Pinpointing Rare Causal Variants with the Adaptive Combination of Q-values Method	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	李文宗,范盛娟,邱燕楓
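dc.subject.keyword	neutral variants (中立變異),rare variants (罕見變異),causal variants (致病變異),non-synonymous variants (非同義變異),next-generation sequencing (次世代定序)	zh_TW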
dc.subject.keyword	neutral variants,rare variants,causal variants,non-synonymous variants,next-generation sequencing	en
dc.relation.page	49
dc.identifier.doi	10.6342/NTU201602253
dc.rights.note	Authorization granted (worldwide open access)
dc.date.accepted	2016-08-10
dc.contributor.author-college	College of Public Health	zh_TW
dc.contributor.author-dept	Institute of Epidemiology and Preventive Medicine	zh_TW
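The filtering step summarized in the abstract (split variants by synonymous / non-synonymous annotation, convert per-site P-values into Benjamini-Hochberg Q-values within each group, drop variants with larger Q-values) and the positive predictive value used to compare ADAQ with ADA can be illustrated with a short Python sketch. This is not the thesis's implementation: the abstract gives no Q-value cutoff and does not say exactly how the retained variants are then passed to ADA, so the cutoff `q_cutoff`, the `Variant` record and every function name below are hypothetical, and ADA itself is not reproduced. The B-H step uses the standard step-up adjustment q_(i) = min over j ≥ i of m·p_(j)/j, capped at 1.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one variant (not the thesis's data format)."""
    name: str
    p_value: float          # per-site association P-value
    nonsynonymous: bool     # synonymous / non-synonymous annotation
    causal: bool = False    # truth label, known only in simulated data


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P-values (Q-values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every Q-value at 1
    # Walk from the largest P-value down so that q_(i) = min_{j >= i} m * p_(j) / j.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q


def filter_by_group_q(variants, q_cutoff):
    """Apply B-H separately within the non-synonymous and synonymous groups,
    then keep only variants whose Q-value is <= q_cutoff (hypothetical rule)."""
    kept = []
    for flag in (True, False):  # non-synonymous group first, then synonymous
        group = [v for v in variants if v.nonsynonymous == flag]
        if not group:
            continue
        qs = bh_q_values([v.p_value for v in group])
        kept.extend(v for v, q in zip(group, qs) if q <= q_cutoff)
    return kept


def positive_predictive_value(selected):
    """PPV = true positives / (true positives + false positives)."""
    if not selected:
        return float("nan")  # undefined when nothing is selected
    return sum(v.causal for v in selected) / len(selected)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    toy = [
        Variant("v1", 0.001, True, causal=True),
        Variant("v2", 0.020, True, causal=True),
        Variant("v3", 0.400, True),
        Variant("v4", 0.030, False),
        Variant("v5", 0.700, False),
    ]
    kept = filter_by_group_q(toy, q_cutoff=0.05)
    print([v.name for v in kept], positive_predictive_value(kept))
    # prints ['v1', 'v2'] 1.0
```

Running B-H within each annotation group, rather than once over all variants, gives each group its own multiple-testing adjustment, so the number of tests in one group does not change the Q-values in the other. In the toy data, the synonymous variant v4 (P = 0.03) receives a within-group Q-value of 0.06 and is dropped at the illustrative cutoff of 0.05.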
Appears in Collections:	Institute of Epidemiology and Preventive Medicine

Files in This Item:

File	Size	Format
ntu-105-1.pdf	1.85 MB	Adobe PDF


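Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.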