Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94434
Title: | Evaluation of Graph-KIR on Real Data and its Application to Taiwanese Population (Graph-KIR 在真實資料上的評估及在台灣族群之應用) |
Author: | Ting-Jian Wang (王亭堅) |
Advisor: | Chien-Yu Chen (陳倩瑜) |
Keywords: | Killer Immunoglobulin-like Receptors, Next generation sequencing, Copy number, Allele, Taiwanese |
Publication Year: | 2024 |
Degree: | Master's |
Abstract: | Killer Immunoglobulin-like Receptor (KIR) genes play a crucial role in the immune system and are closely associated with organ transplantation, autoimmune diseases, and cancer. The high polymorphism of the KIR gene family makes individual allele genotyping from high-throughput next-generation sequencing (NGS) short reads a challenging task. Graph-KIR, previously developed in our laboratory, has been shown to be the best-performing NGS short-read KIR allele typing tool to date, outperforming its main competitor, PING, in validation on both simulated and real data. However, Graph-KIR shows a performance gap between simulated and real data: even when the evaluation is restricted to non-novel alleles in real data, its performance still falls short of that on simulated data. To elucidate the challenges Graph-KIR faces on real data, this study conducted an in-depth analysis of Graph-KIR's earlier erroneous predictions on real data from the Human Pangenome Reference Consortium (HPRC). 
The HPRC analysis indicates that Graph-KIR aligns short reads poorly in regions containing deletion variants; these variants are missed or misidentified, so the correct allele cannot be output. In addition, analysis of the copy number misprediction cases revealed that the tool has difficulty predicting the copy number of merged genes. This study explored possible algorithm optimizations based on these common error cases and made preliminary updates and corrections, and the latest version of Graph-KIR performs significantly better on real data. The optimized algorithm was applied to KIR genotyping of 1492 samples from the Taiwan Biobank to test its practicality on a large sample set. Preliminary results showed abnormal copy number distributions in a small number of samples, because the model placed no constraints on the depth intervals used for copy number prediction. Diploid reference information was then incorporated into the model to limit the fitting interval to 2 copy numbers, improving the automation and reliability of Graph-KIR copy number prediction and yielding more reasonable results. Finally, this study summarized the 7-digit allele typing results from the whole-genome sequencing data of 1492 Taiwanese individuals, calculating the common copy number distributions within individuals and the high-frequency allele types in the population, and providing important reference data for subsequent KIR studies in the Taiwanese population. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94434 |
DOI: | 10.6342/NTU202403277 |
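Full-Text License: | Authorized (open access worldwide) |
Full-Text Release Date: | 2026-08-31 |
Appears in Collections: | Department of Biomechatronics Engineering |
Files in This Item:
File | Size | Format | |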
---|---|---|---|
ntu-112-2.pdf (available online after 2026-08-31) | 1.93 MB | Adobe PDF |
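Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.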