Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90340
Title: | 台灣人體生物資料庫1,484人全基因體定序偵測大片段缺失與插入變異資訊 Germline large deletions and insertions in 1,484 Taiwan Biobank whole genome sequence data |
Authors: | 陳品瑄 Pin-Xuan Chen |
Advisor: | 許書睿 Shu-Jui Hsu |
Keyword: | bioinformatics, structural variation, benchmarking, population genetics, Taiwan Biobank |
Publication Year : | 2023 |
Degree: | Master's |
Abstract: | Germline structural variants (SVs) are defined as genomic alterations longer than 50 bp. Compared with single-nucleotide variants (SNVs), they take more complex forms and have a broader impact on gene function. Large sequencing projects such as the Genome Aggregation Database (gnomAD) have profiled SVs in large multi-population cohorts and characterized their population allele frequencies and functional impact. However, at the analysis level there is still no single high-confidence, accurate method for evaluating SV callers built on different algorithms, and at the population level no large-scale SV survey has been carried out specifically for the Taiwanese population. We used the benchmark truth set of deletions and insertions (NIST_SV0.6) on the reference sample HG002, released by the Genome in a Bottle (GIAB) Consortium, to evaluate nine SV callers based on different algorithms. We compared the callers' detection ability and characteristics and found that merging the results of all callers gave the best sensitivity, with recall reaching up to 82% for deletions and 67% for insertions. We then applied this strategy to 1,484 short-read whole-genome sequencing samples from the Taiwan Biobank and detected 81,268 deletions and 81,235 insertions in total, with an average of 7,526 SVs (deletions and insertions) per individual. By allele frequency, 84% of SVs were rare variants (allele frequency below 1%) and 48% were singletons carried by only one individual. The cohort allele frequencies agreed closely with those of the East Asian population in the gnomAD-SV database, with correlation coefficients of 0.93 for deletions and 0.89 for insertions. Annotation of the call set showed that some SVs overlapped genes on the ACMG secondary findings list, which carry potential disease risk. We also estimated the carrier rate of alpha thalassemia, a common hereditary disease in Taiwan, at about 5.12%, similar to previous estimates.
In conclusion, this study established a robust benchmarking process and an optimized SV detection workflow for Taiwan Biobank WGS data, and characterized the SV allele frequency distribution in the general population. The cohort SV call set and population allele frequencies will be released to support molecular diagnosis and human genetics research in the Taiwanese population. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90340 |
DOI: | 10.6342/NTU202302413 |
Fulltext Rights: | Not authorized |
Appears in Collections: | Graduate Institute of Medical Genomics and Proteomics (基因體暨蛋白體醫學研究所) |
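The abstract classifies SVs as rare (allele frequency below 1%) or singleton (carried by only one individual). The following is a minimal sketch of that classification, not the thesis's actual pipeline. It assumes diploid autosomal sites with no missing genotypes, so the allele number is twice the sample count. The names `SiteCount`, `allele_frequency`, and `classify_sites` are hypothetical, and the counts in the usage example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteCount:
    """Per-site cohort counts (hypothetical structure, for illustration only)."""
    site_id: str
    alt_allele_count: int  # alternate alleles observed across the cohort
    carrier_count: int     # individuals carrying at least one alternate allele


def allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """Allele frequency, assuming every sample is diploid and fully genotyped."""
    return alt_allele_count / (2 * n_samples)


def classify_sites(sites, n_samples, rare_threshold=0.01):
    """Flag each site as rare (AF below the threshold) and/or singleton (one carrier)."""
    rows = []
    for site in sites:
        af = allele_frequency(site.alt_allele_count, n_samples)
        rows.append({
            "site_id": site.site_id,
            "af": af,
            "rare": af < rare_threshold,
            "singleton": site.carrier_count == 1,
        })
    return rows


# Usage with made-up counts for a cohort of 1,484 samples (2,968 alleles).
example_sites = [
    SiteCount("DEL_example_1", alt_allele_count=1, carrier_count=1),
    SiteCount("INS_example_2", alt_allele_count=29, carrier_count=28),
    SiteCount("DEL_example_3", alt_allele_count=600, carrier_count=520),
]
for row in classify_sites(example_sites, n_samples=1484):
    print(row)
```

With 1,484 samples, a variant carried once on a single chromosome has a frequency of 1/2,968, so under these assumptions a heterozygous singleton also counts as rare.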
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf Restricted Access | 4.91 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.