Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles

Ting-Ying Chien; 簡廷因

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338

Title:	Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles (利用機器學習演算法篩選適當模板結構提升預測轉錄因子結合序列特徵之準確度)
Author:	Ting-Ying Chien (簡廷因)
Advisor:	歐陽彥正
Co-advisor:	陳倩瑜
Keywords:	DNA-binding proteins, transcription factor, protein-DNA binding profiles, knowledge-based potential function, support vector machines
Publication year:	2013
Degree:	Doctoral
Abstract:	DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind specific sequences in the genome and thereby initiate many important biological functions. Accurately predicting these target sequences, which are often represented by position weight matrices (PWMs), is an important step toward understanding many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to protein-DNA co-crystallized structures to generate PWMs that are largely consistent with experimental data. However, this success has not been extended to DNA-binding proteins that lack co-crystallized structures. This study investigates whether the DNA sequences bound by a DNA-binding protein can be predicted from the protein's unbound structure (the structure of the unbound state). Given an unbound structure of the query protein, the proposed method first aligns this structure to all template structures to generate synthetic protein-DNA complexes. It then builds a classifier using support vector machines (SVMs) to select the most appropriate complex for PWM prediction. The feature set used by the prediction model includes the similarities between the query and template proteins, structural composition such as the percentage of alpha-helix, and the number of residues falling within specific distances between the protein and the DNA in the synthetic protein-DNA complex. Once the appropriate complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The proposed method is evaluated on 19 DNA-binding proteins that have structures in both DNA-bound and unbound forms for prediction, as well as annotated PWMs for validation. The analyses conducted in this study show that the conformational change of a protein upon binding DNA is the factor that most strongly influences prediction accuracy.
Moreover, to facilitate PWM prediction from protein-DNA complexes, or even from structures of the unbound state, a web server, DBD2BS, is presented. The DBD2BS server provides an easy-to-use interface for visualizing the PWMs predicted from different templates and the spatial relationships among the query protein, the DBDs, and the DNA. This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures, and encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.
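The abstract describes turning the output of a knowledge-based potential function into a PWM but does not say how the conversion is done. The following is a minimal sketch of one common way to do this, assuming a Boltzmann-style weighting of per-position base energies. The function names `energies_to_pwm` and `consensus`, the `temperature` parameter, and the energy values are all hypothetical illustrations and are not taken from the thesis.

```python
import math

BASES = "ACGT"


def energies_to_pwm(energies, temperature=1.0):
    """Convert per-position base energies into a column-normalized PWM.

    ASSUMPTION: Boltzmann-style weighting (lower energy means a more
    favorable base). The thesis abstract does not specify the conversion.

    energies: list of dicts mapping each base in "ACGT" to an energy.
    Returns a list of dicts mapping each base to a probability.
    """
    pwm = []
    for column in energies:
        # Shift by the column minimum so the exponentials stay well scaled.
        e_min = min(column[b] for b in BASES)
        weights = {
            b: math.exp(-(column[b] - e_min) / temperature) for b in BASES
        }
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


def consensus(pwm):
    """Return the most probable base at each PWM position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)


# Hypothetical energies for a three-position binding site (made-up values).
example_energies = [
    {"A": -1.2, "C": 0.3, "G": 0.1, "T": 0.4},
    {"A": 0.5, "C": -0.9, "G": 0.2, "T": 0.0},
    {"A": 0.2, "C": 0.1, "G": -1.5, "T": 0.3},
]

example_pwm = energies_to_pwm(example_energies)
example_consensus = consensus(example_pwm)
```

In this sketch each PWM column sums to 1, and a lower `temperature` sharpens the preference for the lowest-energy base at each position.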
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338
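Full-text authorization:	Authorized (open access worldwide)
Appears in Collections:	Department of Computer Science and Information Engineering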

Files in this item:

File	Size	Format
ntu-102-1.pdf	9.77 MB	Adobe PDF


Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
