Read-Based Haplotyping Method Using Hierarchical Assembly (基於階層式組裝之短序列單倍體定相方法)

Yu-Yu Lin; 林祐榆

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71669

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	歐陽彥正
dc.contributor.author	Yu-Yu Lin	en
dc.contributor.author	林祐榆	zh_TW
dc.date.accessioned	2021-06-17T06:06:06Z	-
dc.date.available	2024-01-29
dc.date.copyright	2019-01-29
dc.date.issued	2018
dc.date.submitted	2019-01-16
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71669	-
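dc.description.abstract	Genotyping determines the alleles present at a variant locus, such as A/A (homozygous) or G/A (heterozygous), whereas haplotyping describes the process of determining which alleles at different loci lie together on the same chromosome, correctly revealing the relationship between alleles at multiple sites. In other words, haplotyping determines how multiple heterozygous variants co-occur on the two chromosome copies. Haplotyping is therefore also called phasing, and it expresses the cis arrangement of alleles at different loci on the same chromosome. Studies have shown that haplotype phase affects gene expression, signal transduction, and protein function, so it is of great importance. As sequencing technology advances, read-based phasing methods have received increasing attention. Minimum error correction (MEC) is the predominant model among read-based phasing methods; it predicts the most probable pair of haplotypes by reducing conflicts in the SNP-fragment matrix. However, researchers have widely observed that the optimal MEC solution may not be the true haplotypes: because MEC considers all variant loci together, numerous dense errors within certain genomic regions can mislead the choice of corrections. To address this problem, we propose a method based on hierarchical assembly that resolves local conflicts progressively. This thesis presents a new phasing algorithm based on hierarchical assembly. The method starts from high-confidence pairs of variant loci and builds up complete haplotypes step by step. Compared with other MEC-based methods, it achieves better phasing error rates on both real whole-genome short-read datasets and simulated data. The thesis also compares the number of error corrections on real data against other methods and finds that this method likewise reaches solutions with quite few corrections, which shows that, although it does not take MEC as its sole starting point, it can still find solutions close to the MEC optimum. Finally, simulated data are used to test the method's performance under different sequencing conditions, demonstrating its applicability in practice.	zh_TW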
dc.description.abstract	Genotyping determines the alleles present at a variant locus, such as A/A (homozygous) or G/A (heterozygous), whereas haplotyping determines which alleles are co-located on the same chromosome, revealing the correct relationship between alleles at multiple loci. In other words, haplotyping is the procedure of phasing two heterozygous variants, for example G/A and C/T, to produce the haplotypes G-T (one copy of the chromosome) and A-C (the other copy). Phasing, an alternative name for haplotyping, thus clarifies cis relationships on chromosomes, which are essential in association research and clinical genetics. The need for read-based phasing has grown with advances in sequencing technologies. The minimum error correction (MEC) approach is the primary model for resolving haplotypes; it works by reducing conflicts in a SNP-fragment matrix. However, it is frequently observed that the solution with the optimal MEC score might not be the real haplotypes, because MEC methods consider all positions together and the discordances in noisy regions can mislead the selection of corrections. To tackle this problem, a hierarchical assembly-based method was designed to resolve local conflicts progressively. This thesis presents HAHap, a new phasing algorithm based on hierarchical assembly. HAHap leverages high-confidence variant pairs to build haplotypes progressively. On both real and simulated data, HAHap achieved better phasing error rates than other MEC-based methods when constructing haplotypes from whole-genome short-read sequencing. The study also compared the number of error corrections on real data with other methods, revealing the ability of HAHap to predict haplotypes with a lower number of error corrections. Simulated data were used to investigate the behavior of HAHap under different sequencing conditions, highlighting the applicability of HAHap in certain situations.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T06:06:06Z (GMT). No. of bitstreams: 1 ntu-107-D99945022-1.pdf: 2084119 bytes, checksum: 6bffb398b873f0416d21d290863fa720 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Doctoral Dissertation Oral Examination Committee Certification i; ACKNOWLEDGMENTS ii; Chinese Abstract iii; ABSTRACT iv; TABLE OF CONTENTS v; LIST OF FIGURES vii; LIST OF TABLES xi; CHAPTER 1 Introduction 1; 1.1 Experimental methods 2; 1.2 Computational methods 3; 1.3 Thesis structure 5; CHAPTER 2 Related Works 6; 2.1 SNP-fragment matrix 6; 2.2 Minimum error correction model 8; 2.3 Challenges of read-based phasing 12; CHAPTER 3 Methods 13; 3.1 Haplotype-informative reads and variant blocks 14; 3.2 Hierarchical assembly 15; 3.3 Multinomial distribution metric 18; 3.4 Local phasing using MEC 21; 3.4.1 Embedded merging 22; 3.4.2 Only ambiguous variant pairs remain 23; 3.5 Examples of choices between heuristic and local MEC-based search 24; CHAPTER 4 Experiment Design 27; 4.1 Evaluation measurement 27; 4.2 Real Datasets 29; 4.3 Simulated Datasets 31; CHAPTER 5 Experimental Results 33; 5.1 Evaluation on real data 33; 5.2 Evaluation on simulated data 36; 5.3 Evaluation on sequencing skewness 38; 5.4 Comparison of number of error corrections 41; 5.5 Comparison of Running time 43; CHAPTER 6 Application on ABO Blood Type 45; CHAPTER 7 Discussions 47; CHAPTER 8 Conclusion and Future Works 49; REFERENCE 50; Appendix 53
dc.language.iso	en
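dc.subject	Haplotyping (單倍體分型)	zh_TW
dc.subject	Read-based haplotype phasing method (短序列單倍體定相方法)	zh_TW
dc.subject	Hierarchical assembly (階層式組裝)	zh_TW
dc.subject	Probability score (機率分數)	zh_TW
dc.subject	Minimum error correction (最小錯誤修正法)	zh_TW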
dc.subject	Probability score	en
dc.subject	Read-based phasing	en
dc.subject	Hierarchical assembly	en
dc.subject	Minimum Error Correction	en
dc.subject	Haplotype	en
dc.title	Read-based haplotyping method using hierarchical assembly (基於階層式組裝之短序列單倍體定相方法)	zh_TW
dc.title	Read-based haplotyping method using hierarchical assembly	en
dc.type	Thesis
dc.date.schoolyear	107-1
dc.description.degree	Doctoral
dc.contributor.coadvisor	陳倩瑜
dc.contributor.oralexamcommittee	陳沛隆,趙坤茂,蔡懷寬
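dc.subject.keyword	Haplotyping (單倍體分型),Read-based haplotype phasing method (短序列單倍體定相方法),Hierarchical assembly (階層式組裝),Probability score (機率分數),Minimum error correction (最小錯誤修正法)	zh_TW
dc.subject.keyword	Haplotype,Read-based phasing,Hierarchical assembly,Probability score,Minimum Error Correction	en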
dc.relation.page	59
dc.identifier.doi	10.6342/NTU201900074
dc.rights.note	Paid license
dc.date.accepted	2019-01-16
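dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in collections:	Graduate Institute of Biomedical Electronics and Bioinformatics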
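
The abstract describes the minimum error correction (MEC) model as resolving haplotypes by reducing conflicts in a SNP-fragment matrix. The following is a minimal sketch of that scoring idea only, not HAHap's implementation. The function names `mec_score` and `brute_force_mec`, the `'0'`/`'1'`/`'-'` fragment encoding, and the toy reads are all assumptions made for illustration.

```python
from itertools import product


def mec_score(fragments, haplotype):
    """Count the allele corrections needed so that every fragment agrees
    with either `haplotype` or its complement.

    fragments: strings over {'0', '1', '-'}; '-' marks a site the read
               does not cover (hypothetical encoding).
    haplotype: string over {'0', '1'}, one character per heterozygous site.
    """
    complement = "".join("1" if a == "0" else "0" for a in haplotype)
    total = 0
    for frag in fragments:
        # Mismatches against each haplotype copy, ignoring uncovered sites.
        d1 = sum(f != h for f, h in zip(frag, haplotype) if f != "-")
        d2 = sum(f != h for f, h in zip(frag, complement) if f != "-")
        total += min(d1, d2)  # assign the read to the closer copy
    return total


def brute_force_mec(fragments):
    """Exhaustively search for the haplotype with the lowest MEC score.

    Exponential in the number of sites, so only usable on toy inputs.
    """
    n = len(fragments[0])
    # Fix the first allele to '0': a haplotype and its complement score the same.
    candidates = ("0" + "".join(rest) for rest in product("01", repeat=n - 1))
    return min(candidates, key=lambda h: mec_score(fragments, h))


# Toy SNP-fragment matrix: 7 reads over 4 heterozygous sites.
# The last read conflicts with the two reads before it.
fragments = ["00--", "11--", "-01-", "-10-", "--11", "--11", "--01"]

print(brute_force_mec(fragments))      # 0011
print(mec_score(fragments, "0011"))    # 1
```

In this toy matrix the single conflicting read is outvoted, so the MEC optimum is the expected phasing. The concern raised in the abstract is the opposite case, where dense errors in a noisy region pull the global optimum away from the true haplotypes.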

Files in this item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	2.04 MB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.