Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83531
Title: | 應用全基因體定序資料鑑定人類白血球抗原基因型 Human Leukocyte Antigen Genotyping Using Whole Genome Sequencing Data |
Authors: | Yi-Hsuan Tseng 曾宜萱 |
Advisor: | 許書睿(Shu-Jui Hsu) |
Keyword: | Human Leukocyte Antigen (HLA), Whole-Genome Sequencing (WGS), Genotyping, Algorithm |
Publication Year: | 2022 |
Degree: | Master's |
Abstract: | Human leukocyte antigen (HLA) genotyping is critical for clinical applications such as organ transplantation, disease diagnosis, and prediction of drug hypersensitivity reactions. With the rapid progress of next-generation sequencing (NGS) technology, many whole-genome sequencing (WGS)-based HLA genotyping algorithms have been developed, including HLA-VBSeq, Kourami, and HISAT-genotype. This study aims to compare these algorithms, establish the best available WGS-based HLA genotyping workflow for the Taiwanese population, and accurately predict the HLA genotypes of 613 samples in the Taiwan Biobank (TWB) that have no HLA genotyping data.
We investigated the patterns and possible causes of miscalls using 883 TWB individuals for whom both WGS data and HLA genotypes are available. Reads were mapped to the human reference genomes hs37d5 and hs38DH, and we observed that local realignment of reads combined with quality recalibration was necessary to achieve higher HLA genotyping sensitivity. At G-group resolution, Kourami showed the highest sensitivity for HLA class I and II alleles (97%), followed by HISAT-genotype (96%) and HLA-VBSeq (93%). At two-field resolution, HISAT-genotype showed the highest sensitivity for HLA-A (96%, 1632/1706), HLA-B (97%, 1676/1722), HLA-DPA1 (96%, 1610/1682), HLA-DPB1 (95%, 1234/1296), HLA-DQA1 (98%, 1628/1666), and HLA-DQB1 (98%, 1652/1682). In contrast, HLA-VBSeq showed the highest sensitivity for HLA-C (98%, 1710/1746) and HLA-DRB1 (84%, 1040/1236). These results indicate that HISAT-genotype and HLA-VBSeq are complementary for HLA genotyping. We found that HISAT-genotype and HLA-VBSeq can achieve at least 95% sensitivity at two-field resolution at all loci except HLA-DRB1. Overall, different algorithms have advantages for different HLA genes; we therefore recommend using complementary algorithms for WGS-based HLA genotyping.
In addition, we analyzed specific discordant patterns in the HISAT-genotype results at two-field resolution and used the five most frequent discordant patterns as checkpoints to filter and correct miscalled HLA genotypes. After recalibration, the sensitivity of HISAT-genotype improved at HLA-C (98%, 1694/1722), HLA-DPA1 (99.8%, 1680/1682), HLA-DPB1 (97%, 1252/1296), and HLA-DRB1 (86%, 1058/1226). This gave HISAT-genotype the highest sensitivity at all evaluable HLA loci, with an overall sensitivity above 96% for HLA class I and above 97% for HLA class II (except HLA-DRB1, at 86%). This is the highest HLA genotyping sensitivity achievable with our established WGS-based HLA genotyping workflow.
In conclusion, we suggest performing HLA genotyping with HISAT-genotype first. The resulting genotypes should then be recalibrated using our checkpoints, and the recalibrated results reconfirmed against the HLA-VBSeq genotypes at HLA-C and HLA-DPA1. We will use this workflow to accurately report HLA genotypes for the prediction samples in TWB. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/83531 |
DOI: | 10.6342/NTU202202955 |
Fulltext Rights: | Not authorized |
Appears in Collections: | Graduate Institute of Medical Genomics and Proteomics (基因體暨蛋白體醫學研究所) |
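The per-locus sensitivities quoted in the abstract can be reproduced from the concordant/total counts given alongside them. The sketch below assumes that sensitivity means concordant allele calls divided by all evaluated allele calls, which the abstract implies but does not state outright. The names `REPORTED_COUNTS`, `sensitivity`, and `summarize` are illustrative and do not come from the thesis.

```python
# Counts (concordant, total) as reported in the abstract for HISAT-genotype
# at two-field resolution, before checkpoint recalibration.
REPORTED_COUNTS = {
    "HLA-A": (1632, 1706),
    "HLA-B": (1676, 1722),
    "HLA-DPA1": (1610, 1682),
    "HLA-DPB1": (1234, 1296),
    "HLA-DQA1": (1628, 1666),
    "HLA-DQB1": (1652, 1682),
}


def sensitivity(concordant: int, total: int) -> float:
    """Return sensitivity in percent.

    Assumption: sensitivity = concordant allele calls / evaluated allele calls.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of allele calls")
    if not 0 <= concordant <= total:
        raise ValueError("concordant must lie between 0 and total")
    return 100.0 * concordant / total


def summarize(counts: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Round each locus's sensitivity to the nearest whole percent."""
    return {locus: round(sensitivity(c, t)) for locus, (c, t) in counts.items()}


if __name__ == "__main__":
    for locus, pct in summarize(REPORTED_COUNTS).items():
        print(f"{locus}: {pct}%")
```

Run on these counts, the rounded values match the percentages in the abstract for the six loci listed.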
Files in This Item:
File | Size | Format |
---|---|---|
U0001-3008202200425800.pdf (Restricted Access) | 46.65 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.