Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90558

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳倩瑜 | zh_TW |
| dc.contributor.advisor | Chien-Yu Chen | en |
| dc.contributor.author | 蔡毓璁 | zh_TW |
| dc.contributor.author | Yu-Tsung Tsai | en |
| dc.date.accessioned | 2023-10-03T16:37:35Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-10-03 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-07 | - |
| dc.identifier.citation | Beck, S., & Trowsdale, J. (2000). The human major histocompatibility complex: lessons from the DNA sequence. Annual review of genomics and human genetics, 1(1), 117-137. Chen, P.-C., Tsai, H., Bhojanapalli, S., Chung, H. W., Chang, Y.-W., & Ferng, C.-S. (2021). A Simple and Effective Positional Encoding for Transformers. Paper presented at the Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Choo, S. Y. (2007). The HLA system: genetics, immunology, clinical testing, and clinical implications. Yonsei medical journal, 48(1), 11-23. Delaneau, O., Marchini, J., & Zagury, J.-F. (2012). A linear complexity phasing method for thousands of genomes. Nature methods, 9(2), 179-181. Graves, A. (2012). Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37-45. Jia, X., Han, B., Onengut-Gumuscu, S., Chen, W.-M., Concannon, P. J., Rich, S. S., . . . de Bakker, P. I. (2013). Imputing amino acid polymorphisms in human leukocyte antigens. PloS one, 8(6), e64683. Medsker, L. R., & Jain, L. (2001). Recurrent neural networks. Design and Applications, 5, 64-67. Naito, T., Suzuki, K., Hirata, J., Kamatani, Y., Matsuda, K., Toda, T., & Okada, Y. (2021). A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes. Nature communications, 12(1), 1639. Ng, S. K., Krishnan, T., & McLachlan, G. J. (2012). The EM algorithm. Handbook of computational statistics: concepts and methods, 139-172. Robinson, J., Halliwell, J. A., McWilliam, H., Lopez, R., Parham, P., & Marsh, S. G. (2012). The IMGT/HLA database. Nucleic acids research, 41(D1), D1222-D1227. Shiina, T., Hosomichi, K., Inoko, H., & Kulski, J. K. (2009). The HLA genomic loci map: expression, interaction, diversity and disease. Journal of human genetics, 54(1), 15-39. Slatkin, M. (2008). Linkage disequilibrium: understanding the evolutionary past and mapping the medical future. Nature Reviews Genetics, 9(6), 477-485. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., . . . Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30. Voita, E., Talbot, D., Moiseev, F., Sennrich, R., & Titov, I. (2019). Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418. Zheng, X., Shen, J., Cox, C., Wakefield, J. C., Ehm, M. G., Nelson, M. R., & Weir, B. S. (2014). HIBAG: HLA genotype imputation with attribute bagging. The Pharmacogenomics Journal, 14(2), 192-200. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90558 | - |
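| dc.description.abstract | Human leukocyte antigen (HLA) genes lie within the major histocompatibility complex (MHC) on human chromosome 6. They are highly complex, are associated with infectious, immune, and cancer-related diseases, and interact with many immunological drugs. Current HLA typing methods still demand substantial time and resources for disease research and immune-related analysis and are expensive, which limits their use in clinical diagnosis and research. Imputation methods that predict HLA from single nucleotide polymorphisms (SNPs) are therefore an attractive alternative. HLA prediction tools achieve reasonably high concordance and accuracy for most loci, but some loci, such as HLA-B, are harder to predict accurately. Because HLA is highly relevant to research on human health and disease, and in order to improve overall training performance, this thesis develops a machine learning model that predicts HLA types quickly and accurately. The study uses data from Taiwan Biobank 2.0 with the Transformer model from natural language processing as the core training model, adjusted and optimized to better fit the characteristics of HLA sequences, and builds TW2HLA (Transformer With TaiWan HLA data), a model that predicts HLA genotypes from SNP sites specific to the Taiwanese population. TW2HLA outperforms traditional HLA prediction models, with marked improvements in both efficiency and accuracy. TW2HLA is also optimized for prediction accuracy on rare HLA genotypes. Because low-frequency HLA genotypes are rare, existing models often perform poorly on them. To address this, we trained a model that performs well in predicting rare HLA genotypes. In addition, this study compares TW2HLA with the imputation model SNP2HLA, the prediction model HIBAG, and the machine learning model DEEPHLA, using accuracy and sensitivity as the evaluation criteria, with attention to training performance on low-frequency HLA genotypes, and also discusses and compares training dataset sizes. In summary, this thesis develops a Transformer-based HLA prediction model that performs adequately in predicting HLA types and can play an important role in large-scale HLA testing. The model will help improve the efficiency of clinical diagnosis and research and is expected to serve as a foundation for future HLA-related studies. | zh_TW |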
| dc.description.abstract | The human leukocyte antigen (HLA) system is the human version of the major histocompatibility complex (MHC), a group of genes located on chromosome 6. HLA is extraordinarily complex and is associated with infectious, immune, and cancer-related diseases. Additionally, HLA interacts strongly with immunomodulatory drugs. Currently, HLA genotyping methods require a significant amount of time and resources, making them costly and limiting their use in clinical diagnosis and immunology research. Therefore, using a single-nucleotide polymorphism (SNP)-based imputation method to predict HLA alleles is an attractive alternative. HLA prediction tools have high consistency and accuracy for most loci, but certain loci, such as HLA-B, are more difficult to predict. Given the high relevance of HLA to human health and disease, this study aims to develop a machine learning model that can swiftly and accurately predict HLA types. The proposed method uses a transformer model, an architecture from natural language processing, as the core training model, and was applied to data from Taiwan Biobank 2.0. The model, TW2HLA, was adjusted and optimized to better adapt to the characteristics of HLA sequences. TW2HLA outperformed traditional HLA prediction models in terms of efficiency and accuracy. TW2HLA was also optimized for prediction accuracy on rare HLA genotypes. Because low-frequency HLA genotypes are rare, existing models often perform poorly when predicting them. To address this issue, this study trained a model that performs well in predicting rare HLA genotypes. In addition, this study compared the results with the imputation model SNP2HLA, the prediction model HIBAG, and the machine learning model DEEPHLA, using accuracy and sensitivity as the evaluation metrics. Special attention was given to the training performance for low-frequency HLA genotypes. Furthermore, the study discussed and compared the sizes of the training datasets used by the models. In summary, this thesis developed an HLA prediction model based on the transformer architecture, which performs well in predicting HLA allele types and can play a significant role in large-scale HLA testing. This method will help improve the efficiency of clinical diagnosis and research, and it is expected to become a foundation for future HLA-related studies. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-10-03T16:37:35Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-10-03T16:37:35Z (GMT). No. of bitstreams: 0 | en |
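| dc.description.tableofcontents | Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; English Abstract; List of Figures; List of Tables; Chapter 1 Research Objectives; Chapter 2 Literature Review; 2.1 HLA; 2.2 IMGT-HLA Database; 2.3 Transformer; 2.4 SNP2HLA; 2.5 DEEPHLA; 2.6 HIBAG; Chapter 3 Methods; 3.1 Experimental Data; 3.1.1 Taiwan Biobank Data; 3.1.2 HLA Locus Selection; 3.1.3 SNP Selection; 3.1.4 Haplotype Construction; 3.1.5 Data Preprocessing; 3.2 Model Training; 3.2.1 Training Data Selection; 3.2.2 Model Architecture; 3.2.3 Training Procedure and Results; Chapter 4 Results and Discussion; 4.1 Training Results; 4.1.1 Accuracy; 4.1.2 Sensitivity; 4.1.3 BEAGLE and SHAPEIT; 4.1.4 SNP2HLA, DEEPHLA, and HIBAG; 4.2 Analysis of Results; 4.2.1 Effect of Frequency; 4.2.2 Effect of the Validation Set; 4.2.3 Effect of Extending SNP Positions; 4.2.4 Per-Sample Comparison with HIBAG; 4.2.5 Comparison of HLA Model Results; Chapter 5 Conclusion; Chapter 6 References; Appendix A; Appendix B | - |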
| dc.language.iso | zh_TW | - |
| dc.subject | Single Nucleotide Polymorphism | zh_TW |
| dc.subject | Transformer | zh_TW |
| dc.subject | Machine learning | zh_TW |
| dc.subject | Human leukocyte antigen | zh_TW |
| dc.subject | Human leukocyte antigen | en |
| dc.subject | Machine learning | en |
| dc.subject | Transformer | en |
| dc.subject | Single Nucleotide Polymorphism | en |
| dc.title | Applying deep learning to SNP microarray data to predict HLA allele types | zh_TW |
| dc.title | Applying deep learning on SNP array data to predict HLA alleles | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳沛隆;楊雅倩;許書睿;許家郎 | zh_TW |
| dc.contributor.oralexamcommittee | Pei-Lung Chen; Ya-Chian Yang; Shu-Jui Hsu; Chia-Lang Hsu | en |
| dc.subject.keyword | Human leukocyte antigen, Machine learning, Transformer, Single Nucleotide Polymorphism | zh_TW |
| dc.subject.keyword | Human leukocyte antigen, Machine learning, Transformer, Single Nucleotide Polymorphism | en |
| dc.relation.page | 57 | - |
| dc.identifier.doi | 10.6342/NTU202302754 | - |
| dc.rights.note | Authorization granted (access restricted to campus) | - |
| dc.date.accepted | 2023-08-09 | - |
| dc.contributor.author-college | College of Bioresources and Agriculture | - |
| dc.contributor.author-dept | Department of Biomechatronics Engineering | - |
| dc.date.embargo-lift | 2025-12-31 | - |
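| Appears in Collections: | Department of Biomechatronics Engineering | |

Files in This Item:

| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf (not authorized for public access) | 2.12 MB | Adobe PDF | View/Open |

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.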
