Predicting target sequences of DNA-binding proteins based on primary structure

Chih-Wei Lin; 林志瑋

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9380

Title:	Predicting target sequences of DNA-binding proteins based on primary structure (從一級結構預測DNA結合蛋白之標的序列)
Author:	Chih-Wei Lin 林志瑋
Advisor:	歐陽彥正
Co-advisor:	陳倩瑜
Keywords:	DNA-binding protein
Year of publication:	2011
Degree:	Master's
Abstract:	Proteins that bind specific DNA sequences play important roles in gene regulation. Identifying the target sequences of a DNA-binding protein, whether computationally or experimentally, helps explain how genes are regulated in cells and how sequence variation in the genome disrupts normal gene expression. The position frequency matrix (PFM) is one of the most widely used models for representing such target sequences. However, for most species, only a small fraction of transcription factors (TFs) have experimentally determined PFMs so far. Because biological experiments are costly in both funding and labor, computational methods that predict PFMs with satisfactory accuracy are strongly desired to speed up progress in this field. This thesis proposes a new method that predicts the target sequences of a DNA-binding protein from existing protein-DNA complex structures and a knowledge base recording the contact preferences between amino acids and nucleotides. Given a query protein sequence, the method first selects a suitable template, a protein-DNA complex structure of a homologous protein, and then predicts the PFM from that template together with the knowledge base. The method is evaluated on two datasets and compared with existing computational methods. It predicts as well as the structure-based methods it is compared with, but performs slightly worse than a sequence-based method trained on experimentally determined PFMs. Even so, the proposed method retains its value, because existing sequence-based predictors each have their own limitations. In summary, a DNA-binding protein's binding preference can be predicted from its primary structure using the complex structures of its homologs. This should facilitate related studies, because target sequences can now be predicted for proteins that lack a solved structure of their own but have homologs with known protein-DNA complex structures.
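For readers unfamiliar with the position frequency matrix named in the abstract, the following is a minimal sketch of how a PFM is conventionally tallied from a set of aligned binding sites. It is not the thesis's prediction method, which infers the matrix from a template complex and a contact-preference knowledge base. The function name `position_frequency_matrix` and the example sites are hypothetical and chosen only for illustration.

```python
NUCLEOTIDES = "ACGT"


def position_frequency_matrix(sites):
    """Count how often each nucleotide occurs at each position of aligned sites.

    `sites` is a list of equal-length DNA strings (hypothetical input format).
    Returns a dict mapping each nucleotide to a list of per-position counts.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    pfm = {base: [0] * length for base in NUCLEOTIDES}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            if base not in pfm:
                raise ValueError(f"unexpected character {base!r} in site {site!r}")
            pfm[base][pos] += 1
    return pfm


# Made-up aligned sites, for illustration only (not data from the thesis).
example_sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
pfm = position_frequency_matrix(example_sites)
for base in NUCLEOTIDES:
    print(base, pfm[base])
```

Each column of the resulting matrix sums to the number of input sites, so dividing a column by that total gives the per-position nucleotide frequencies.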
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9380
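Full-text authorization:	Authorized (publicly available worldwide)
Appears in department:	Department of Computer Science and Information Engineering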

Files in this item:

File	Size	Format
ntu-100-1.pdf	1.16 MB	Adobe PDF
