Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90340
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 許書睿 | zh_TW |
dc.contributor.advisor | Shu-Jui Hsu | en |
dc.contributor.author | 陳品瑄 | zh_TW |
dc.contributor.author | Pin-Xuan Chen | en |
dc.date.accessioned | 2023-09-26T16:20:33Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-09-26 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-04 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90340 | - |
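dc.description.abstract | Germline structural variations (SVs) are defined as large genomic alterations longer than 50 bases. Compared with single-nucleotide variants (SNVs), they show more complex variant types and affect gene function over a broader range. In recent years, many large genome sequencing projects, such as the Genome Aggregation Database (gnomAD), have used large multi-population cohorts to detect structural variants and to investigate their population allele frequencies and functional impact. At the sequencing-analysis level, however, there is still no single high-confidence, accurate method for evaluating the detection performance of different algorithms; at the population-genomics level, no large-scale structural variant detection study has yet been carried out for the Taiwanese population. In this study, we used the large deletion and insertion sites on the genome of the international reference sample HG002, released by the Genome in a Bottle Consortium (GIAB), to evaluate the performance of nine tools built on different algorithms. This thesis compares the detection capability and characteristics of the tools and suggests that merging the results of all tools gives the best detection sensitivity, with recall reaching 82% for deletions and 67% for insertions. We further applied the analysis strategy to 1,484 whole-genome-sequenced samples from the Taiwan Biobank, detecting a total of 81,268 deletions and 81,235 insertions, with each individual carrying on average 7,526 large deletions and insertions. In terms of allele frequency, 84% were rare variants with an allele frequency below 1%, and 48% were singletons carried by only one individual. By computing population allele frequencies and comparing them with the variant frequencies of the East Asian population in gnomAD, we verified that the allele frequencies of the two datasets were highly consistent, with correlation coefficients of 0.93 for deletions and 0.89 for insertions. In addition, by analyzing the effects of variants on genes, we found that some variants fall in genes on the ACMG secondary findings list that carry potential disease risk, and we estimated the carrier rate of alpha thalassemia, a common genetic disease among Taiwanese, at about 5.12%, similar to previous studies.
This study established a high-confidence benchmarking workflow and evaluated an optimized detection strategy to help build a structural variant database for the Taiwanese population, while also computing population allele frequencies. The variant dataset and allele frequency information will be released in the future for use in clinical molecular diagnosis and related research. | zh_TW |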
dc.description.abstract | Structural variants (SVs) are defined as genomic changes larger than 50 bp and have a functional impact on human genomes. Large public databases such as the Genome Aggregation Database (gnomAD) have constructed population SV profiles using different detection methods. However, there has been no standard method for evaluating SV calling tools based on different algorithms, and no SV profile constructed specifically for the Taiwanese population. We used a benchmark truth set of insertion and deletion variants (NIST_SV0.6) on HG002, released by the Genome in a Bottle (GIAB) Consortium, to evaluate nine reputable SV callers. Our results suggest that combining multiple callers was the most sensitive strategy, raising overall recall to at most 82% for deletions and 67% for insertions. A total of 81,268 deletions and 81,235 insertions were discovered in 1,484 short-read whole-genome-sequenced Taiwan Biobank samples, with 7,526 SVs per individual on average. Based on cohort frequency, 84% of SVs were rare (allele frequency below 1%) and 48% were singletons among Taiwanese. Moreover, the population SV allele frequencies were compared with those of the East Asian population in the gnomAD-SV database, giving correlation coefficients of 0.93 for deletions and 0.89 for insertions. Finally, we annotated the SV call set and found that some SVs overlapped genes on the ACMG secondary findings list. In addition, we estimated a carrier rate of 5.12% for alpha thalassemia in Taiwanese, similar to previous estimates.
In conclusion, this study used a robust benchmarking process and constructed an optimized SV detection workflow for Taiwan Biobank WGS data. Using the cohort SV profile, we characterized the allele frequency distribution in the general population. The cohort SV call set and the population allele frequencies will be released to facilitate molecular diagnosis and human genetics research for the Taiwanese population. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-26T16:20:33Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-09-26T16:20:33Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Oral Examination Committee Certification
Acknowledgements
Chinese Abstract
English Abstract
List of Abbreviations
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Study Materials & Methods
2.1 Study Materials
2.1.1 GIAB NIST_SV0.6 truth set
2.1.2 International standard sample input
2.1.3 Taiwan Biobank sample information
2.2 Study Methods
2.2.1 Sequence preprocessing and variant calling
2.2.2 Variant filtration and VCF format change
2.2.3 SV merging and accuracy evaluation
2.2.4 Functional annotation and correlation of allele frequency
Chapter 3 Results
3.1 Truth set composition and filtration
3.1.1 The identification of truth subset with 8,103 SVs for benchmarking
3.1.2 The call-set composition and size distribution in the truth subset
3.2 Benchmarking of structural variation caller performance
3.2.1 DRAGEN had the best performance among individual tools
3.2.2 SURVIVOR was the optimized tool to merge callers
3.2.3 Merging all callers had higher recall than any single caller
3.2.4 The multiple limitations of insertions caused high false-negatives
3.3 Structural variation profiles in Taiwan Biobank samples
3.3.1 The optimized method to joint variants from multiple samples
3.3.2 Statistics of 164 high-quality and 1,484 Taiwanese
3.3.3 The individual-level SVs showed a higher overlapping consistency at the joint calling level
3.3.4 Transforming from sample frequency to genotype allele frequency
3.4 Functional annotation in Taiwan Biobank samples
3.4.1 Most of SVs caused effects in non-coding or intergenic regions
3.4.2 The prediction of SV pathogenicity using VEP and AnnotSV
3.4.3 The allele frequency correlation between Taiwanese and gnomAD-SV database
3.4.4 Thalassemia carrier rate and secondary finding list in Taiwan Biobank
3.5 The final release for the Taiwan Biobank SV joint call set
Chapter 4 Discussion
Figures
Tables
References
Appendix | - |
dc.language.iso | en | - |
dc.title | 台灣人體生物資料庫1,484人全基因體定序偵測大片段缺失與插入變異資訊 | zh_TW |
dc.title | Germline large deletions and insertions in 1,484 Taiwan Biobank whole genome sequence data | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 陳沛隆;陳倩瑜;郭柏秀 | zh_TW |
dc.contributor.oralexamcommittee | Pei-Lung Chen;Chien-Yu Chen;Po-Hsiu Kuo | en |
dc.subject.keyword | bioinformatics, structural variation, tool performance benchmarking, population genomics, Taiwan Biobank | zh_TW |
dc.subject.keyword | bioinformatics, structural variation, benchmarking, population genetics, Taiwan Biobank | en |
dc.relation.page | 90 | - |
dc.identifier.doi | 10.6342/NTU202302413 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2023-08-04 | - |
dc.contributor.author-college | College of Medicine | - |
dc.contributor.author-dept | Graduate Institute of Medical Genomics and Proteomics | - |
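
The abstract reports two kinds of summary statistics: recall of SV callers against a truth set, and the share of rare and singleton variants in the cohort. The Python sketch below illustrates how such figures could be computed. It is not the thesis workflow: the `Deletion` class, the function names, and the 50% reciprocal-overlap matching rule are all assumptions made for illustration, and it covers deletions only, since insertions need a different matching rule. The 1% cutoff for rare variants and the one-carrier definition of a singleton follow the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deletion:
    """Hypothetical minimal deletion record (assumes end > start)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: Deletion, b: Deletion) -> float:
    """Smaller of the two overlap fractions; 0.0 when the intervals are disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def recall(truth, calls, min_overlap=0.5):
    """Fraction of truth deletions matched by at least one call.

    The 0.5 threshold is an assumed matching rule, not one taken from the thesis.
    """
    if not truth:
        return 0.0
    hits = sum(
        1 for t in truth
        if any(reciprocal_overlap(t, c) >= min_overlap for c in calls)
    )
    return hits / len(truth)


def frequency_summary(alt_counts, rare_cutoff=0.01):
    """Summarise one variant; alt_counts holds 0, 1 or 2 alternate alleles per diploid sample."""
    if not alt_counts:
        raise ValueError("at least one sample is required")
    allele_frequency = sum(alt_counts) / (2 * len(alt_counts))
    carriers = sum(1 for g in alt_counts if g > 0)
    return {
        "allele_frequency": allele_frequency,
        "is_rare": allele_frequency < rare_cutoff,
        "is_singleton": carriers == 1,
    }


# Toy data: merging two call sets can only keep or raise recall.
truth = [Deletion("chr1", 100, 200), Deletion("chr1", 1000, 1500), Deletion("chr2", 50, 400)]
caller_a = [Deletion("chr1", 110, 205)]
caller_b = [Deletion("chr2", 60, 390), Deletion("chr1", 1400, 2400)]
print(recall(truth, caller_a), recall(truth, caller_a + caller_b))
```

Taking the union of call sets is the simplest possible merge. It explains why combined recall is at least as high as that of any single caller, but it says nothing about precision, which a real benchmark would report alongside recall.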
Appears in Collections: | Graduate Institute of Medical Genomics and Proteomics |
Files in This Item:
File | Size | Format | |
---|---|---|---|
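ntu-111-2.pdf Currently not authorized for public access | 4.91 MB | Adobe PDF |
Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.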