Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99957
| Title: | 臺灣人體生物資料庫全基因體定序資料集之基因體結構變異特徵分析 Genomic Structural Variation Characterization of the Taiwan Biobank Whole Genome Sequencing Cohort |
| Author: | 李婕瑜 Chieh-Yu Lee |
| Advisor: | 許書睿 Shu-Jui Hsu |
| Keywords: | structural variation, benchmarking, bioinformatics, population structure |
| Year of publication: | 2025 |
| Degree: | Master's |
| Abstract: | Structural variation (SV) refers to inherited genomic variation longer than 50 nucleotides. This study used the SV benchmark set released by the Genome in a Bottle (GIAB) consortium, hosted by the U.S. National Institute of Standards and Technology (NIST), to evaluate multiple sequencing technologies and SV detection workflows. Version 0.6 of the GIAB SV benchmark set contains sequence information for 4,114 large deletions and 5,623 large insertions in the high-confidence regions of the germline genome of the reference sample HG002. To evaluate SV detection workflows more comprehensively, this study analyzed short-read and long-read whole genome sequencing (WGS) data from HG002 and 40 Human Pangenome Reference Consortium (HPRC) samples and cross-validated the results. For long-read WGS data, alignment-based and assembly-based approaches were compared for their SV detection performance. For short-read WGS data, seven different detection tools were integrated. The evaluation showed that long-read WGS outperformed short-read WGS in SV detection. The results also showed that, with multiple SV detection tools integrated, performance on short-read WGS data was quite consistent across samples from different populations. The workflow was therefore applied to 1,480 Taiwan Biobank samples with short-read WGS data, detecting a total of 90,578 large deletions and 85,577 large insertions. Of these SVs, 85% were rare variants with a population allele frequency below 1%, and most were located in non-coding regions. Population structure analysis using 1000 Genomes Project samples confirmed that the population structure of SVs resembles that of single nucleotide variants. The study further estimated SV carrier frequencies in disease-associated recessive genes: 4.9% of individuals carry a large deletion in the thalassemia genes HBA1/HBA2. Structural variants (SVs) are genomic alterations spanning more than 50 nucleotides. We benchmarked SV detection performance across multiple sequencing technologies and bioinformatics algorithms using the expert-validated SV benchmark set released by the Genome in a Bottle (GIAB) consortium (version 0.6). The benchmark set includes 4,114 large deletions and 5,623 insertions within the high-confidence regions of HG002, the son in an Ashkenazim trio. To comprehensively evaluate SV detection methods, we analyzed short-read and long-read whole genome sequencing (WGS) data from HG002 and 40 samples from the Human Pangenome Reference Consortium (HPRC). For short-read WGS, we integrated results from seven short-read SV detection tools (Delly, DRAGEN, GRIDSS, Lumpy, Manta, SvABA, and Wham), as well as MELT for mobile element insertions (MEIs). For long-read WGS, we employed both alignment-based and assembly-based approaches. Benchmarking results demonstrated that long-read methods outperformed short-read methods. Our results also indicate that the short-read SV detection workflow performed robustly across individuals from diverse populations. We applied this short-read SV detection workflow to short-read WGS data from 1,480 Taiwan Biobank (TWB) samples and identified a total of 90,425 large deletions, 85,390 large insertions, and 11,434 MEIs. Notably, approximately 85% of SVs were rare variants, and most were located in non-coding regions. We estimated the alpha-thalassemia carrier frequency in the Taiwanese population to be about 5% based on the identified pathogenic deletions. Finally, we used the SV callsets from the 1000 Genomes Project to analyze population structure and confirmed that the population structure revealed by SVs was similar to that identified using single nucleotide variants (SNVs). |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99957 |
| DOI: | 10.6342/NTU202504136 |
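| Full-text authorization: | Authorized (campus access only) |
| Electronic full-text release date: | 2027-07-31 |
| Appears in department: | Graduate Institute of Medical Genomics and Proteomics |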
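
The abstract defines rare SVs as those with a population allele frequency below 1%. As a rough illustration of what that threshold means for a cohort of 1,480 samples, the following minimal sketch computes allele frequency from an allele count and applies the cutoff. It assumes diploid autosomal sites; the function names and the example allele counts are hypothetical and are not taken from the thesis.

```python
def allele_frequency(allele_count: int, n_samples: int, ploidy: int = 2) -> float:
    """Return the fraction of sampled chromosomes that carry the variant allele."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    total_alleles = ploidy * n_samples
    if not 0 <= allele_count <= total_alleles:
        raise ValueError("allele_count must lie between 0 and ploidy * n_samples")
    return allele_count / total_alleles


def is_rare(allele_count: int, n_samples: int, threshold: float = 0.01) -> bool:
    """Apply the 'allele frequency below 1%' definition of a rare variant."""
    return allele_frequency(allele_count, n_samples) < threshold


N_SAMPLES = 1480  # cohort size stated in the abstract

# Illustrative allele counts only (assumed values, not results from the thesis).
for ac in (1, 29, 30, 296):
    af = allele_frequency(ac, N_SAMPLES)
    label = "rare" if is_rare(ac, N_SAMPLES) else "not rare"
    print(f"allele count {ac:>3}: AF = {af:.4f} ({label})")
```

Under these assumptions the cohort contributes 2,960 chromosomes, so a variant seen on 29 or fewer of them falls below the 1% cutoff, while one seen on 30 or more does not.
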
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 9.63 MB | Adobe PDF |
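All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.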
