Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining

Chien-Chieh Lin; 林千捷

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40340

Title:	Prediction of Paired Binding Regions in Protein-Protein Interactions by Sequential Pattern Mining
Authors:	Chien-Chieh Lin 林千捷
Advisor:	陳倩瑜
Keyword:	protein-protein interaction, biological sequences
Publication Year:	2008
Degree:	Master's
Abstract:	Recent advances in genome sequencing have made a huge amount of sequence information accessible. This raises a great challenge: detecting the interface residues that participate in protein-protein interactions directly from primary structure, that is, from amino acid sequences. Such predictions could help biologists build accurate regulatory networks and metabolic pathways. To address the problem, we propose a two-phase sequential pattern mining method that predicts the interacting regions of a pair of proteins already known to interact physically. The method is based on the co-occurrence of conserved residues in a set of concatenated protein homologues: once valid training data have been prepared, the interacting regions can potentially be recognized by patterns that span both proteins.
In this thesis, we apply the proposed approach to 41 interacting protein pairs from three different data sets. Performance is evaluated in existing structural complexes by measuring the distance between each predicted pair of interacting regions on different protein chains; a prediction counts as correct when the two regions lie within 10 Å of each other. The first mining phase identified 128 highly conserved regions likely to be binding regions, and 60 of them found potential partners among the cross-protein patterns derived in the second phase. Thirty-three of these 60 predicted interacting pairs lie within 10 Å in the available complexes, an accuracy of 55% (33/60).
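The evaluation criterion described above, under which a predicted pair of regions counts as correct if the two regions lie within 10 Å of each other in a known complex, can be sketched as follows. The abstract does not define how the region-to-region distance is measured, so this sketch assumes the minimum Euclidean distance between any coordinate of one region and any coordinate of the other. The function names (`min_region_distance`, `is_correct_prediction`, `accuracy`) are hypothetical, and the coordinates are toy values, not data from the thesis.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Region = Sequence[Coord]

def min_region_distance(region_a: Region, region_b: Region) -> float:
    """Smallest Euclidean distance (Å) between any coordinate in region_a and any in region_b."""
    return min(math.dist(a, b) for a in region_a for b in region_b)

def is_correct_prediction(region_a: Region, region_b: Region, cutoff: float = 10.0) -> bool:
    """Assumed success criterion: the two regions come within `cutoff` Å of each other."""
    return min_region_distance(region_a, region_b) <= cutoff

def accuracy(predicted_pairs: List[Tuple[Region, Region]], cutoff: float = 10.0) -> float:
    """Fraction of predicted region pairs that satisfy the distance criterion."""
    hits = sum(is_correct_prediction(a, b, cutoff) for a, b in predicted_pairs)
    return hits / len(predicted_pairs)

# Toy coordinates for illustration only (not data from the thesis).
region_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
near_b = [(3.8, 6.0, 0.0)]   # 6.0 Å from the second residue of region_a
far_b = [(30.0, 0.0, 0.0)]   # 26.2 Å from the closest residue of region_a
pairs = [(region_a, near_b), (region_a, far_b)]
print(accuracy(pairs))  # 0.5
```

Whether the thesis used all atoms, Cα atoms only, or another distance definition would change `min_region_distance` but not the shape of the accuracy calculation.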
If we consider only the 33 of these 60 predictions that come from protein pairs with more similar evolutionary rates, the accuracy rises to 73% (24/33). These results show that, once the data required for two-phase mining are available, cross-protein pattern mining can point to likely protein-protein binding regions. How to incorporate other useful information to narrow down and refine the current predictions deserves further study.
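The general idea of mining patterns that span both proteins in concatenated homologues can be illustrated with a deliberately simplified, hypothetical sketch. It is not the thesis's algorithm, which mines conserved regions rather than single columns. The assumptions are: the homologues are already aligned and concatenated (protein A's columns first, then protein B's); phase 1 keeps alignment columns whose most frequent residue reaches a frequency threshold; and phase 2 keeps pairs of conserved columns, one from each protein, whose residues co-occur in at least the same fraction of sequences. The names `conserved_columns`, `cross_patterns`, `split`, and `min_freq` are illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

def conserved_columns(seqs: List[str], min_freq: float) -> Dict[int, str]:
    """Phase 1 (simplified): map each alignment column whose most common residue
    (gaps '-' excluded) occurs in at least `min_freq` of all sequences to that residue."""
    n = len(seqs)
    conserved = {}
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs if s[col] != "-")
        if counts:
            residue, freq = counts.most_common(1)[0]
            if freq / n >= min_freq:
                conserved[col] = residue
    return conserved

def cross_patterns(concat: List[str], split: int, min_freq: float) -> List[Tuple[int, int, str, str, float]]:
    """Phase 2 (simplified): pair conserved columns of protein A (col < split) with
    conserved columns of protein B (col >= split) and keep pairs whose residues
    co-occur in at least `min_freq` of the concatenated homologues.
    Returns (column in A, column in B counted from B's start, residue A, residue B, support)."""
    conserved = conserved_columns(concat, min_freq)
    cols_a = [c for c in conserved if c < split]
    cols_b = [c for c in conserved if c >= split]
    n = len(concat)
    patterns = []
    for ca, cb in product(cols_a, cols_b):
        ra, rb = conserved[ca], conserved[cb]
        support = sum(1 for s in concat if s[ca] == ra and s[cb] == rb) / n
        if support >= min_freq:
            patterns.append((ca, cb - split, ra, rb, support))
    return patterns

# Toy aligned homologues: 3 columns of protein A followed by 3 of protein B.
homologues = [
    "MKL" + "GHW",
    "MRI" + "GHF",
    "MKA" + "DHW",
    "MKV" + "GHW",
]
for pattern in cross_patterns(homologues, split=3, min_freq=0.75):
    print(pattern)
```

In this toy alignment the fully conserved methionine in column 0 pairs with every conserved column of protein B, which hints at why conservation-based candidate lists can grow large and why narrowing them down is worth studying.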
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40340
Fulltext Rights:	Licensed for a fee
Appears in Collections:	Department of Biomechatronics Engineering

Files in This Item:

File	Size	Format
ntu-97-1.pdf Restricted Access	1.58 MB	Adobe PDF