Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85876
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 陳倩瑜 (CHIEN-YU CHEN) |
dc.contributor.author | YU-WEI LIU | en
dc.contributor.author | 劉又瑋 | zh_TW
dc.date.accessioned | 2023-03-19T23:27:19Z |
dc.date.copyright | 2022-09-26 |
dc.date.issued | 2022 |
dc.date.submitted | 2022-09-23 |
dc.identifier.citation |
Devlin, J., M.-W. Chang, K. Lee and K. Toutanova (2018). 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.' arXiv:1810.04805.
Elnaggar, A., M. Heinzinger, C. Dallago, G. Rihawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, D. Bhowmik and B. Rost (2020). 'ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing.' arXiv:2007.06225.
Henikoff, S. and J. G. Henikoff (1992). 'Amino acid substitution matrices from protein blocks.' Proc Natl Acad Sci U S A 89(22): 10915-10919.
Krogsgaard, M. and M. M. Davis (2005). 'How T cells "see" antigen.' Nature Immunology 6(3): 239-245.
Lan, Z., M. Chen, S. Goodman, K. Gimpel, P. Sharma and R. Soricut (2019). 'ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.' arXiv:1909.11942.
Lee, K.-H., Y.-C. Chang, T.-F. Chen, H.-F. Juan, H.-K. Tsai and C.-Y. Chen (2021). 'Connecting MHC-I-binding motifs with HLA alleles via deep learning.' Communications Biology 4(1): 1194.
Lu, T., Z. Zhang, J. Zhu, Y. Wang, P. Jiang, X. Xiao, C. Bernatchez, J. V. Heymach, D. L. Gibbons, J. Wang, L. Xu, A. Reuben and T. Wang (2021). 'Deep learning-based prediction of the T cell receptor–antigen binding specificity.' Nature Machine Intelligence 3(10): 864-875.
Montemurro, A., V. Schuster, H. R. Povlsen, A. K. Bentzen, V. Jurtz, W. D. Chronister, A. Crinklaw, S. R. Hadrup, O. Winther, B. Peters, L. E. Jessen and M. Nielsen (2021). 'NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data.' Communications Biology 4(1): 1060.
Nielsen, M., C. Lundegaard, T. Blicher, K. Lamberth, M. Harndahl, S. Justesen, G. Røder, B. Peters, A. Sette, O. Lund and S. Buus (2007). 'NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence.' PLOS ONE 2(8): e796.
Raffel, C., N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li and P. J. Liu (2019). 'Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.' arXiv:1910.10683.
Rao, R., J. Meier, T. Sercu, S. Ovchinnikov and A. Rives (2020). 'Transformer protein language models are unsupervised structure learners.' bioRxiv 2020.12.15.422761.
Reynisson, B., B. Alvarez, S. Paul, B. Peters and M. Nielsen (2020). 'NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data.' Nucleic Acids Research 48(W1): W449-W454.
Shugay, M., D. V. Bagaev, I. V. Zvyagin, R. M. Vroomans, J. C. Crawford, G. Dolton, E. A. Komech, A. L. Sycheva, A. E. Koneva, E. S. Egorov, A. V. Eliseev, E. Van Dyk, P. Dash, M. Attaf, C. Rius, K. Ladell, J. E. McLaren, K. K. Matthews, E. B. Clemens, D. C. Douek, F. Luciani, D. van Baarle, K. Kedzierska, C. Kesmir, P. G. Thomas, D. A. Price, A. K. Sewell and D. M. Chudakov (2017). 'VDJdb: a curated database of T-cell receptor sequences with known antigen specificity.' Nucleic Acids Research 46(D1): D419-D427.
Springer, I., N. Tickotsky and Y. Louzoun (2021). 'Contribution of T Cell Receptor Alpha and Beta CDR3, MHC Typing, V and J Genes to Peptide Binding Prediction.' Frontiers in Immunology 12.
Tickotsky, N., T. Sagiv, J. Prilusky, E. Shifrut and N. Friedman (2017). 'McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences.' Bioinformatics 33(18): 2924-2929.
Vita, R., S. Mahajan, J. A. Overton, S. K. Dhanda, S. Martini, J. R. Cantrell, D. K. Wheeler, A. Sette and B. Peters (2019). 'The Immune Epitope Database (IEDB): 2018 update.' Nucleic Acids Research 47(D1): D339-D343. |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85876 |
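dc.description.abstract | Predicting the interaction of T cell receptors (TCRs) with complexes of major histocompatibility complex (MHC) molecules and peptides remains a highly challenging computational problem. The challenge stems mainly from three factors: the accuracy of experimental data, data scarcity, and the high complexity of the problem itself. One of the unresolved fundamental questions about neoantigens and antigen biology is why not all neoantigens or antigens elicit a T cell response. Accurately predicting the interactions between neoantigens/antigens and TCRs would therefore be critical for research on cancer progression, prognosis, and response to immunotherapy. Meanwhile, many recent natural language processing (NLP) studies have shown that protein sequences can be treated as sentences and amino acids as words, and many studies have accordingly begun to apply NLP-like techniques to extract useful biological information from protein sequence databases. Several pre-trained protein language models have recently been released publicly and have been shown to benefit a variety of downstream prediction tasks. This study therefore builds a prediction model that uses the protein language model ProtBert as its encoder to predict the TCR binding specificity of neoantigens and general T cell antigens presented by MHC class I. The study addresses two prediction problems, MHC-I-peptide binding and TCR-peptide-MHC (pMHC) binding, and compares different encoding methods; the results show that the protein language model improves prediction accuracy on both problems. Finally, the study incorporates ensemble learning to further improve the accuracy of the ProtBert-based prediction model, with the aim of strengthening downstream applications that predict TCR-antigen binding specificity. | zh_TW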
dc.description.abstract | Predicting the interaction of T cell receptors (TCRs) with peptide-major histocompatibility complexes (pMHC) remains challenging. The challenge involves three main issues: the accuracy of the data, data sparsity, and the complexity of the problem. One of the fundamental unanswered questions about neoantigens and antigens is why not all antigens elicit T cell responses, even though the peptide may be presented on the cell surface by MHC. Accurate and comprehensive characterization of the interactions between neoantigens/antigens and TCRs is critical for understanding cancer progression, prognosis, and response to immunotherapy. Meanwhile, many recent NLP studies have shown that protein sequences can be regarded as sentences and amino acids as words, so natural language processing techniques can be used to extract biological information from protein sequence databases. Several successful pre-trained protein language models have recently been made publicly available. This study therefore developed a prediction model based on the protein language model ProtBert to predict the TCR binding specificity of neoantigens/antigens presented by major histocompatibility complex class I (MHC-I). The results demonstrate that using a protein language model improves prediction accuracy on both problems: MHC-peptide binding and TCR-pMHC binding. Moreover, this study integrated ensemble learning to further improve prediction accuracy. The ProtBert-based ensemble model is expected to facilitate immunogenomics studies related to TCR binding in the near future. | en
dc.description.provenance | Made available in DSpace on 2023-03-19T23:27:19Z (GMT). No. of bitstreams: 1. U0001-2109202223334500.pdf: 1690073 bytes, checksum: 4bac6996d181da674d394497973d341a (MD5). Previous issue date: 2022 | en
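dc.description.tableofcontents |
Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables
Chapter 1 Research Objectives
Chapter 2 Literature Review: 2.1 TCR binding to peptide-MHC complexes; 2.2 Natural language processing; 2.3 Protein language models; 2.4 TCR-pMHC databases (2.4.1 VDJdb; 2.4.2 McPAS-TCR); 2.5 TCR-pMHC binding prediction tools (2.5.1 NetTCR; 2.5.2 ERGO-II; 2.5.3 pMTnet)
Chapter 3 Methods: 3.1 Data (3.1.1 NetMHCpan data collection; 3.1.2 pMTnet data collection); 3.2 Experimental model; 3.3 Experimental workflow (3.3.1 Effect of different protein encoding tools on prediction for the NetMHCpan data; 3.3.2 Effect of different protein encoding tools on prediction for the pMTnet data; 3.3.3 Effect of different MHC-I lengths on prediction for the pMTnet data; 3.3.4 Effect of different padding methods on prediction for the pMTnet data; 3.3.5 Effect on the test set of removing specific allele groups during training; 3.3.6 Evaluation of training results)
Chapter 4 Results and Discussion: 4.1 NetMHCpan data analysis; 4.2 pMTnet data analysis; 4.3 Effect of MHC length on AUC; 4.4 Effect of different padding methods on AUC; 4.5 Effect of ensembling ProtBert-based models on AUC; 4.6 Effect on prediction of removing specific allele groups during training; 4.7 Effect of adding TCR information on AUC
Chapter 5 Conclusion
Chapter 6 References |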
dc.language.iso | zh-TW |
dc.title | Using deep learning to predict the specificity of T cell receptor binding to antigens | zh_TW
dc.title | Using deep learning to predict antigen binding specificity of T-cell receptors | en
dc.type | Thesis |
dc.date.schoolyear | 110-2 |
dc.description.degree | Master's |
dc.contributor.oralexamcommittee | 陳沛隆 (PEI-LUNG CHEN), 許書睿 (SHU-RUEI SHIU), 許家郎 (JIA-LANG SHIU), 楊雅倩 (YA-CHIAN YANG) |
dc.subject.keyword | T cell receptor, major histocompatibility complex class I, peptide | zh_TW
dc.subject.keyword | TCR, TCR-pMHC, MHC-I, peptide | en
dc.relation.page | 38 |
dc.identifier.doi | 10.6342/NTU202203779 |
dc.rights.note | Authorized (open access worldwide) |
dc.date.accepted | 2022-09-25 |
dc.contributor.author-college | College of Bioresources and Agriculture | zh_TW
dc.contributor.author-dept | Department of Biomechatronics Engineering | zh_TW
dc.date.embargo-lift | 2022-09-26 |
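The abstract describes treating protein sequences as sentences and amino acids as words, with ProtBert as the encoder. The sketch below shows the input formatting this implies. It assumes the preprocessing used in the public ProtTrans examples: residues are separated by single spaces, and the rare amino acids U, Z, O and B are mapped to X. This record does not give the thesis's own preprocessing, and `to_protbert_input` is a hypothetical helper name.

```python
import re


def to_protbert_input(sequence: str) -> str:
    """Format a protein sequence as a ProtBert-style 'sentence'.

    Assumption: follows the ProtTrans example preprocessing, not
    necessarily the thesis's own pipeline.
    """
    # Normalize whitespace and case, since each residue is one uppercase "word"
    cleaned = sequence.strip().upper()
    # Map rare/ambiguous amino acids (U, Z, O, B) to the unknown residue X
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Separate residues with single spaces so the tokenizer sees one token per amino acid
    return " ".join(cleaned)


if __name__ == "__main__":
    print(to_protbert_input("GILGFVFTL"))  # G I L G F V F T L
```

The resulting string could then be passed to a BERT tokenizer, for example the Hugging Face `BertTokenizer` loaded from the `Rostlab/prot_bert` checkpoint with `do_lower_case=False`. Whether the thesis used that exact checkpoint is an assumption.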
Appears in collections: Department of Biomechatronics Engineering
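The abstract reports that ensemble learning further improves the ProtBert-based model, and the outline uses AUC as the evaluation metric. The sketch below shows one common form of ensembling, soft voting, which averages each sample's predicted binding probability across models, together with a pairwise ROC AUC. It assumes binary binding labels (1 = binds, 0 = does not). This record does not describe the thesis's actual ensemble scheme, and `ensemble_average` and `roc_auc` are hypothetical names.

```python
from typing import List, Sequence


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Soft-voting ensemble (assumed scheme): mean predicted probability per sample."""
    if not prob_lists:
        raise ValueError("need at least one model's predictions")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same samples")
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is O(P*N), which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative samples")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    model_a = [0.9, 0.6, 0.4, 0.3]
    model_b = [0.7, 0.2, 0.8, 0.5]
    print(roc_auc(labels, model_a))  # 0.75
    print(roc_auc(labels, model_b))  # 1.0
    print(roc_auc(labels, ensemble_average([model_a, model_b])))  # 1.0
```

Averaging probabilities is only one option. Weighted averaging or stacking are common alternatives, and which of these the thesis used is not stated here.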

Files in this item:
File | Size | Format
U0001-2109202223334500.pdf | 1.65 MB | Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.

© NTU Library All Rights Reserved