Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85876
Title: Using deep learning to predict antigen binding specificity of T-cell receptors
(利用深度學習預測T細胞受體與抗原結合的特異性)
Authors: YU-WEI LIU (劉又瑋)
Advisor: CHIEN-YU CHEN (陳倩瑜)
Keyword: T-cell receptor (TCR), TCR-pMHC, major histocompatibility complex class I (MHC-I), peptide
Publication Year: 2022
Degree: Master's
Abstract: Predicting the interaction between T cell receptors (TCRs) and complexes of peptide and major histocompatibility complex (pMHC) remains a highly challenging computational problem. The challenge stems from three main factors: the accuracy of experimental data, the scarcity of that data, and the high complexity of the problem itself. One of the fundamental unanswered questions in neoantigen and antigen biology is why not all neoantigens or antigens elicit T cell responses, even though the peptide may have been presented by MHC on the cell surface. Accurate prediction of the interactions between neoantigens/antigens and TCRs is therefore critical for research on cancer progression, prognosis, and response to immunotherapy. Meanwhile, many recent natural language processing (NLP) studies have shown that protein sequences can be regarded as sentences and amino acids as words, so researchers have begun applying NLP-like techniques to extract biological information from protein sequence databases. Several pre-trained protein language models have recently been released publicly and have been shown to help a variety of downstream prediction tasks. This study therefore developed a prediction model that uses the protein language model ProtBert as its encoder to predict the TCR binding specificity of neoantigens and general T cell antigens presented by MHC class I. Two prediction problems were addressed, MHC-I-peptide binding and TCR-pMHC binding, and different encoding methods were compared. The results show that the protein language model improves prediction accuracy on both problems. Finally, this study combined the ProtBert-based model with ensemble learning to further improve its accuracy. The ProtBert-based ensemble model is expected to facilitate immunogenomics studies related to TCR binding.
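The "sequences as sentences, amino acids as words" idea and the ensemble step can be illustrated with a minimal sketch. It is not the thesis implementation. The space-separated residue format and the mapping of the rare residues U, Z, O, and B to X follow the preprocessing described on the public ProtBert model card. The function names, the example sequences, and the choice of soft voting (averaging predicted probabilities) as the ensemble scheme are assumptions made here for illustration; the abstract does not say which ensemble method was used.

```python
import re
from typing import List


def to_protein_sentence(sequence: str) -> str:
    """Format an amino-acid sequence as a space-separated 'sentence'.

    Each residue becomes one 'word'. Rare residues U, Z, O, B are mapped
    to X, as in the ProtBert model card's preprocessing example.
    """
    cleaned = re.sub(r"\s+", "", sequence).upper()
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    return " ".join(cleaned)


def soft_vote(member_probs: List[List[float]]) -> List[float]:
    """Average per-example binding probabilities across ensemble members.

    Soft voting is an assumed ensemble scheme, chosen only for illustration.
    member_probs[i][j] is model i's predicted probability for example j.
    """
    if not member_probs:
        raise ValueError("need predictions from at least one model")
    n_examples = len(member_probs[0])
    if any(len(p) != n_examples for p in member_probs):
        raise ValueError("all models must score the same examples")
    return [sum(col) / len(member_probs) for col in zip(*member_probs)]


# Hypothetical inputs: an illustrative peptide and CDR3 sequence.
peptide_sentence = to_protein_sentence("GILGFVFTL")
tcr_sentence = to_protein_sentence("cassirssyeqyf")

# Hypothetical probabilities from two models for the same two examples.
ensemble = soft_vote([[0.75, 0.25], [0.25, 0.5]])
```

The formatted strings are what a ProtBert-style tokenizer would receive as input; the averaged probabilities would then be thresholded to give a binding or non-binding call.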
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85876
DOI: 10.6342/NTU202203779
Fulltext Rights: Authorized (open access worldwide)
Embargo lift date: 2022-09-26
Appears in Collections: Department of Biomechatronics Engineering (生物機電工程學系)

Files in This Item:
File: U0001-2109202223334500.pdf
Size: 1.65 MB
Format: Adobe PDF

