Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles

Ting-Ying Chien; 簡廷因

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	歐陽彥正
dc.contributor.author	Ting-Ying Chien	en
dc.contributor.author	簡廷因	zh_TW
dc.date.accessioned	2021-05-16T16:26:28Z	-
dc.date.available	2015-03-06
dc.date.available	2021-05-16T16:26:28Z	-
dc.date.copyright	2013-03-06
dc.date.issued	2013
dc.date.submitted	2013-02-07
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/6338	-
dc.description.abstract	DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind specific sequences in the genome and thereby initiate many important biological functions. Accurate prediction of these target sequences, often represented by position weight matrices (PWMs), is an important step toward understanding many biological processes. Recent studies have shown that knowledge-based potential functions can be applied to protein-DNA co-crystallized structures to generate PWMs that are largely consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study investigates whether the DNA sequences bound by a DNA-binding protein can be predicted from the protein's unbound structure (the structure of its unbound state). Given an unbound structure of the query protein, the proposed method first aligns this structure to every template structure to generate synthetic protein-DNA complexes. It then builds a classifier using support vector machines (SVMs) to select the most appropriate complex for PWM prediction. The feature set of the prediction model includes the similarities between the query and template proteins, structural composition such as the percentage of alpha-helix, and the number of residues falling within specific distances between the protein and the DNA in the synthetic protein-DNA complex. Once the appropriate complex is selected, an atomic-level knowledge-based potential function is used to predict PWMs characterizing the sequences to which the query protein can bind. The proposed method is evaluated on 19 DNA-binding proteins that have structures in both DNA-bound and unbound forms for prediction, as well as annotated PWMs for validation. 
Based on the analyses conducted in this study, the conformational change of a protein upon binding DNA was shown to be the factor that most strongly influences prediction accuracy. Moreover, to facilitate PWM prediction from protein-DNA complexes or even from unbound structures, a web server, DBD2BS, is presented. The DBD2BS server provides an easy-to-use interface for visualizing the PWMs predicted from different templates and the spatial relationships among the query protein, the DBDs, and the DNAs. This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures and encourages further work on structure alignment-based approaches, in addition to docking- and homology modeling-based approaches, for generating synthetic complexes.	en
dc.description.provenance	Made available in DSpace on 2021-05-16T16:26:28Z (GMT). No. of bitstreams: 1 ntu-102-D97922019-1.pdf: 10002226 bytes, checksum: 87bf8cfbef528ef945da4a1ac204b327 (MD5) Previous issue date: 2013	en
dc.description.tableofcontents	Chinese Abstract i; Abstract iii; Table of Contents v; List of Tables vi; List of Figures vii; Chapter 1. Introduction 1; 1.1 Motivation 1; 1.2 Framework of the study 3; 1.3 Web server - DBD2BS 5; Chapter 2. Literature review 7; 2.1 Protein structure 7; 2.2 Binding specificity prediction 10; Chapter 3. Methods 14; 3.1 Constructing templates 15; 3.2 Constructing superimposed complexes 16; 3.3 Building SVM model 17; 3.4 The potential function for PWM prediction 19; 3.5 Validation Set 22; 3.6 Evaluating PWM prediction 24; Chapter 4. Results 25; 4.1 Evaluating PWM prediction 25; 4.2 Evaluating robustness of the proposed method 37; 4.3 Using a SVM model and DBD2BS to improve PWM prediction 44; 4.4 Comparison with predictions based on complexes generated by docking 48; 4.5 Discussion 54; Chapter 5. Web server 62; 5.1 Web interface 62; 5.2 Case study - CRP_ECOLI 68; 5.3 Case study - Foxk1 72; Chapter 6. Conclusion and suggestion for future direction 79; 6.1 Conclusion 79; 6.2 Suggestion for future direction 80; References 82
dc.language.iso	en
dc.title	利用機器學習演算法篩選適當模板結構提升預測轉錄因子結合序列特徵之準確度	zh_TW
dc.title	Selecting appropriate template structures to improve precision in predicting protein-DNA binding profiles	en
dc.type	Thesis
dc.date.schoolyear	101-1
dc.description.degree	Doctoral
dc.contributor.coadvisor	陳倩瑜
dc.contributor.oralexamcommittee	趙坤茂,黃乾綱,徐駿森
dc.subject.keyword	DNA結合蛋白質,轉錄因子,蛋白質-DNA結合特徵,以知識為基礎的能量函數,支持向量機	zh_TW
dc.subject.keyword	DNA-binding proteins,transcription factor,protein-DNA binding profiles,knowledge-based potential function,support vector machines	en
dc.relation.page	85
dc.rights.note	Authorization granted (open access worldwide)
dc.date.accepted	2013-02-07
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering
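
The abstract describes target sequences as being represented by position weight matrices (PWMs). As a reminder of what that representation is, the following is a minimal sketch that builds a PWM from a handful of aligned binding sites by counting bases per position. It is not the thesis's method, which derives PWMs from a knowledge-based potential function applied to a protein-DNA complex; the site list, the pseudocount, and the function names `build_pwm` and `consensus` are all assumptions made up for illustration.

```python
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Return one dict per position mapping each base to its probability.

    `sites` is a list of equal-length DNA strings (aligned binding sites).
    The pseudocount is an illustrative smoothing choice, not a value
    taken from the thesis.
    """
    if not sites:
        raise ValueError("need at least one binding site")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        # Count how often each base occurs at this position.
        counts = Counter(site[pos] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def consensus(pwm):
    """Pick the most probable base at every position."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in pwm)


# Hypothetical aligned sites, invented for this example only.
sites = ["TGTGA", "TGTGA", "TGCGA", "TGTAA"]
pwm = build_pwm(sites)

for pos, col in enumerate(pwm, start=1):
    print(pos, " ".join(f"{b}={col[b]:.3f}" for b in BASES))
print("consensus:", consensus(pwm))
```

Each column of the matrix sums to 1, so a predicted PWM can be compared with an annotated one position by position, whichever way the prediction was obtained.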

Files in This Item:

File	Size	Format
ntu-102-1.pdf	9.77 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.