Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles

Ting-Ying Chien; 簡廷因

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338

Title:	利用機器學習演算法篩選適當模板結構提升預測轉錄因子結合序列特徵之準確度 Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles
Authors:	Ting-Ying Chien 簡廷因
Advisor:	歐陽彥正
Co-Advisor:	陳倩瑜
Keyword:	DNA-binding proteins, transcription factor, protein-DNA binding profiles, knowledge-based potential function, support vector machines
Publication Year:	2013
Degree:	Doctoral
Abstract:	DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind specific sequences in the genome and thereby initiate many important biological functions. Accurately predicting these target sequences, often represented as position weight matrices (PWMs), is an important step toward understanding many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to protein-DNA co-crystallized structures to generate PWMs that agree well with experimental data. However, this success has not been extended to DNA-binding proteins that lack co-crystallized structures. This study investigates whether the DNA sequences bound by a DNA-binding protein can be predicted from the protein's unbound structure (the structure of its unbound state). Given an unbound structure of the query protein, the proposed method first aligns it to every template structure to generate synthetic protein-DNA complexes. It then builds a classifier using support vector machines (SVMs) to select the most appropriate complex for PWM prediction. The feature set of the prediction model includes the similarities between the query and template proteins, structural composition such as the percentage of alpha-helix, and the number of residues falling within specific distances between the protein and DNA in the synthetic protein-DNA complex. Once an appropriate complex has been selected, an atomic-level knowledge-based potential function is used to predict the PWMs characterizing the sequences to which the query protein can bind. The method is evaluated on 19 DNA-binding proteins that have structures in both DNA-bound and unbound forms for prediction, as well as annotated PWMs for validation. The analyses in this study show that the conformational change a protein undergoes upon binding DNA is the factor that most strongly influences prediction accuracy.
Moreover, to make it easier to predict PWMs from protein-DNA complexes, or even from structures of the unbound state, this study presents the DBD2BS web server. DBD2BS provides an easy-to-use interface for visualizing the PWMs predicted from different templates and the spatial relationships among the query protein, the DBDs, and the DNA. This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures and encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.
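The last two steps of the pipeline described in the abstract (choosing one synthetic complex, then turning energy scores into a PWM) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the template IDs, the classifier scores, and the energy values are hypothetical placeholders, and the Boltzmann-style conversion from energies to probabilities is one common way to derive a PWM from a potential function, not necessarily the procedure used in the thesis.

```python
import math

BASES = "ACGT"


def select_template(scores):
    """Return the template ID with the highest classifier score.

    `scores` maps a (hypothetical) template ID to the score a trained
    classifier, such as an SVM, assigned to its synthetic complex.
    """
    if not scores:
        raise ValueError("no candidate templates")
    return max(scores, key=scores.get)


def energies_to_pwm(energies, kt=1.0):
    """Convert per-position base energies into PWM columns.

    Assumption: lower energy means more favorable binding, and each
    base is weighted by exp(-E / kT) before normalizing the column.
    """
    pwm = []
    for column in energies:
        # Shift by the column minimum for numerical stability; the
        # normalized probabilities are unchanged by this shift.
        low = min(column[b] for b in BASES)
        weights = {b: math.exp(-(column[b] - low) / kt) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


# Hypothetical classifier scores for three synthetic complexes.
scores = {"template_A": 0.31, "template_B": 0.87, "template_C": 0.55}

# Hypothetical energies for a two-position binding site.
energies = [
    {"A": -1.2, "C": 0.4, "G": 0.1, "T": 0.3},
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},
]

best = select_template(scores)
pwm = energies_to_pwm(energies)
```

In this sketch the first column favors A because it has the lowest energy, while the second column is uniform because all four bases score equally. The `kt` parameter controls how sharply energy differences translate into base preferences.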
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338
Fulltext Rights:	Authorized (open access worldwide)
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-102-1.pdf	9.77 MB	Adobe PDF
