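Development of analytic strategies to improve association testing with copy number variation (CNV) data (拷貝數變異資料關聯性檢定之分析策略)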

Yi-Hsuan Chen; 陳逸萱

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60691

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪弘(Hung Hung),郭柏秀(Po-Hsiu Kuo)
dc.contributor.author	Yi-Hsuan Chen	en
dc.contributor.author	陳逸萱	zh_TW
dc.date.accessioned	2021-06-16T10:26:15Z	-
dc.date.available	2016-09-24
dc.date.copyright	2013-09-24
dc.date.issued	2013
dc.date.submitted	2013-08-15
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/60691	-
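dc.description.abstract	Copy number variation (CNV) is a structural variation of DNA, and many recent studies have reported its association with a number of complex diseases. Array-based technology allows large numbers of CNV signals to be scanned quickly, and many newly developed statistical methods attempt to estimate copy number from the experimentally measured signal values. The main difficulty for these methods is that the discrete copy number must be estimated from the continuous signal values read from a series of markers; beyond that, we also want to perform association testing to identify the relationship between CNVs and disease. In the first stage of CNV analysis, we usually search genome-wide signals for CNV segments related to disease. Because CNVs are rare DNA variants with relatively small effects, it is difficult to compare patients with non-patients. In recent years, many studies have adopted pooled-sample genome-wide scans to reduce cost; however, because of the complexity of CNVs, applying this strategy to CNV detection poses an even greater challenge. In this study, we aim to develop a procedure that uses pooled samples to identify the relationship between CNVs and disease. We established a series of screening methods to filter out likely false-positive results and applied the procedure to CNV data on bipolar disorder. We first defined the CNV regions of each pool, then selected the CNV regions whose distributions differed between the case and control groups, and finally explored the association between CNVs and bipolar disorder by integrating the gene functions mapped to these regions and comparing them with previously published studies. In the second stage of CNV analysis, copy numbers can be estimated by cluster analysis of the validation signal values obtained from a specific segment. However, because CNV data show a weak clustering tendency and contain outliers, it is difficult to find the correct classification. γ-SUP is a newly developed method that can address these problems and does not require the number of clusters to be specified in advance. γ-SUP requires choosing a parameter τ that affects the number of clusters, but it is unclear how the subjective selection rule suggested by its authors relates to the quality of the results. In this study, we aim to develop a method for selecting the γ-SUP parameter based on the concept of stability. A stable cluster analysis is one whose clustering results can be reproduced many times, so we use resampling to measure an indicator of stability. Simulation analyses show that the proposed method can find an appropriate parameter, and we further applied the method to CNV data on autism.	zh_TW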
dc.description.abstract	Copy number variation (CNV) is a type of structural variation in DNA segments that has been reported to be associated with a number of complex diseases. Array-based technology enables fast scanning of large numbers of CNVs, and many statistical strategies have been developed to estimate copy number from experimental data. The challenge lies in estimating discrete copy numbers from the continuous signals called from a set of markers. A further complexity lies in simultaneously performing association testing between CNVs and diseases. At the first stage of CNV analysis, CNV regions related to the trait of interest can be searched for in genome-wide data. Because CNVs are rare and have small effect sizes, it is generally difficult to compare their frequency between cases and controls using traditional statistical methods. Recently, DNA pooling strategies have been adopted to reduce genotyping cost. However, CNV detection is even more challenging with pooled data. The first aim of this study is to develop a series of procedures to detect associations between CNVs and a trait of interest using a pooling strategy. We set a series of criteria to filter out noise in the data and reduce false-positive findings, and applied our procedures to an empirical CNV dataset of bipolar disorder. First, we defined CNV regions for every pool. Second, we selected CNV regions with different patterns between case and control pools. Finally, we integrated our findings with the mapped gene functions and the results of previous studies to explore the associations between CNVs and bipolar disorder. At the second stage of CNV analysis, we apply a clustering procedure to estimate copy numbers from the validated signals of a specified CNV region. When clustering quality is poor and the CNV data contain outliers, identifying the correct clusters is more challenging. The γ-self-updating process (γ-SUP) is a newly developed method that can overcome these problems and does not require the number of classes to be predetermined. The performance of γ-SUP relies on the selection of a tuning parameter τ. However, the relationship between the subjective selection rule and the performance of the final clustering output is unclear. The second aim of this study is to develop a selection procedure for τ in γ-SUP based on the idea of stability. In our method, stability is defined as the reproducibility of clustering results, and a measure of instability is constructed using a resampling scheme. Simulation studies show that the proposed selection criterion provides an adequate value of τ. Furthermore, we applied this method to an empirical CNV dataset of autism.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T10:26:15Z (GMT). No. of bitstreams: 1 ntu-102-R00849026-1.pdf: 1326386 bytes, checksum: 023896653c53654c25174701eea9ed0c (MD5) Previous issue date: 2013	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 2 Methods for Strategy I: CNV analysis procedures for pooling data; 2.1 Subjects, DNA pooling construction and genotyping; 2.2 Criteria for CNV analysis; 2.3 Exploring the CNV regions related to bipolar disorder; 3 Methods for Strategy II: CNV clustering; 3.1 Review of γ-SUP; 3.2 Stability; 3.3 Procedure for τ selection; 4 Simulation; 4.1 Simulation with p = 1; 4.2 Simulation with p = 2; 4.3 Simulation with large p; 5 Real data application I: BPD CNV dataset; 5.1 Exploring the CNV regions related to BPD; 6 Real data application II: Autism CNV dataset; 6.1 Pre-procedure and post-procedure for CNV data clustering; 6.2 Numeric analysis; 7 Discussion; References
dc.language.iso	en
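dc.subject	filtering procedures (過濾程序)	zh_TW
dc.subject	stability (穩定性)	zh_TW
dc.subject	DNA pooling (混合DNA)	zh_TW
dc.subject	γ-SUP	zh_TW
dc.subject	association testing (關聯性檢定)	zh_TW
dc.subject	copy number variation (拷貝數變異)	zh_TW
dc.subject	clustering (集群分析)	zh_TW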
dc.subject	stability	en
dc.subject	association testing	en
dc.subject	DNA pooling	en
dc.subject	filtering procedures	en
dc.subject	clustering	en
dc.subject	γ-SUP	en
dc.subject	copy number variation (CNV)	en
dc.title	拷貝數變異資料關聯性檢定之分析策略	zh_TW
dc.title	Development of analytic strategies to improve association testing with copy number variation (CNV) data	en
dc.type	Thesis
dc.date.schoolyear	101-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	蕭朱杏(Chuhsing Kate Hsiao),高淑芬(Shur-Fen Gau)
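dc.subject.keyword	copy number variation (拷貝數變異),association testing (關聯性檢定),DNA pooling (混合DNA),filtering procedures (過濾程序),clustering (集群分析),γ-SUP,stability (穩定性)	zh_TW
dc.subject.keyword	copy number variation (CNV),association testing,DNA pooling,filtering procedures,clustering,γ-SUP,stability	en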
dc.relation.page	66
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2013-08-15
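dc.contributor.author-college	College of Public Health	zh_TW
dc.contributor.author-dept	Institute of Epidemiology and Preventive Medicine	zh_TW
Appears in Collections:	Institute of Epidemiology and Preventive Medicine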
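
The abstract's second aim, choosing τ by the reproducibility of clustering results under resampling, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `threshold_clusters` is a simple one-dimensional gap-threshold clusterer standing in for γ-SUP (which this record does not specify), and `instability` is one plausible pairwise-disagreement measure, not necessarily the measure or selection rule used in the thesis.

```python
import random
from itertools import combinations


def threshold_clusters(points, tau):
    """Hypothetical stand-in for gamma-SUP: sort 1-D points and start a
    new cluster whenever the gap to the previous point exceeds tau."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > tau:
            current += 1
        labels[nxt] = current
    return labels


def instability(points, tau, n_pairs=20, frac=0.8, seed=0):
    """Average co-membership disagreement between clusterings of two
    random subsamples, evaluated on the points they share."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    total, used = 0.0, 0
    for _ in range(n_pairs):
        a = rng.sample(range(n), m)
        b = rng.sample(range(n), m)
        la = dict(zip(a, threshold_clusters([points[i] for i in a], tau)))
        lb = dict(zip(b, threshold_clusters([points[i] for i in b], tau)))
        common = sorted(set(a) & set(b))
        pairs = list(combinations(common, 2))
        if not pairs:  # too little overlap to compare; skip this draw
            continue
        # A pair disagrees if it is co-clustered in one subsample only.
        disagree = sum((la[i] == la[j]) != (lb[i] == lb[j]) for i, j in pairs)
        total += disagree / len(pairs)
        used += 1
    return total / used if used else 0.0


# Two well-separated groups: 0.0..0.9 and 10.0..10.9 in steps of 0.1.
points = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
for tau in (0.15, 2.0):
    print(tau, instability(points, tau))
```

In this toy setting a τ near the within-group spacing makes the partition depend on which points happen to be subsampled, while a τ between the within-group and between-group gaps gives the same two clusters every time. Note that very large or very small τ can also look stable for trivial reasons (one cluster, or all singletons), so a practical selection rule needs to account for that.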

Files in this item:

File	Size	Format
ntu-102-1.pdf (not authorized for public access)	1.3 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.