Modeling protein-DNA interactions from sequences for small DNA-binding domains (以序列為基礎建構小型DNA結合區域之蛋白質與DNA交互作用模型)

Ai-Mi Chen; 陳艾彌

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47132

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳倩瑜(Chien-Yu Chen)
dc.contributor.author	Ai-Mi Chen	en
dc.contributor.author	陳艾彌	zh_TW
dc.date.accessioned	2021-06-15T05:48:29Z	-
dc.date.available	2012-08-20
dc.date.copyright	2010-08-20
dc.date.issued	2010
dc.date.submitted	2010-08-18
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/47132	-
dc.description.abstract	蛋白質與DNA交互作用發生於許多基本生化作用中，例如，基因表現之調控與DNA修復。我們可以藉由蛋白質與DNA之共同結晶結構，亦即複合體結構，理解它們如何相互作用，但是蛋白質結晶需要透過昂貴且費時的實驗才能得到，以致於這些知識非常有限。另一方面，由於基因及蛋白質定序技術的精進，大量的一級結構資訊被解出，在已知會與DNA鍵結的蛋白質當中，序列資訊為複合體結構資訊的數十倍之多。因此，本研究旨在藉由序列及結構分析工具建構蛋白質與DNA交互作用模型，也就是說，藉由蛋白質與DNA的序列資訊模擬其互動模式。我們將這個問題切割為兩個小主題：一為有系統地利用蛋白質序列預測其三級結構，二為藉由預測之蛋白質結構模型，建構蛋白質與DNA之互動模式。實驗結果顯示從頭開始結構預測(de novo structure prediction)軟體，Rosetta，可以準確地預測出蛋白質三級結構；也就是說，在Rosetta產生大量的結構模型之後，我們可根據序列為基礎的預測 RSA (relative solvent accessibility) 和結構模型的RSA的相關係數，搭配以統計為基礎的能量計算公式，挑選出貼近原始的結構。除此之外，當現有的結構中僅有序列相似度較低的結構可作為模板時，以模板為基礎的結構預測法(template-based modeling)可能無法進行預測；然而，從頭開始結構預測法，對於所有的蛋白質序列都能產生預測結果，且其準確度並不遜於以模板為基礎的結構預測法；因此，在模板與欲預測蛋白質序列相似度較低的情況下，從頭開始結構預測法會是更好的選擇。最後，當利用預測的蛋白質結構透過結構比對，或嵌合演算法模擬其與DNA之互動模式時，若蛋白質模型和真實結構的相似度越高，則其模擬的結果將更準確；而結構比對所建立的互動模式，準確度優於嵌合演算法。總而言之，在缺乏複合體結構資訊的情形下，本研究提出之流程可依據序列資訊，建構出蛋白質與DNA交互作用之模型，對於預測蛋白質-DNA之結合將有莫大幫助。	zh_TW
dc.description.abstract	Protein-DNA interactions play an important role in many fundamental biochemical processes, for example gene regulation and DNA repair. Researchers can learn how proteins and DNA interact by examining the available co-crystallized structures. However, such knowledge is scarce because experimentally determining atom-level structures of protein-DNA complexes is expensive and time-consuming. In contrast, owing to recent advances in whole-genome sequencing technology, far more sequences of known DNA-binding proteins are available than structures of protein-DNA complexes. This study therefore aims to construct protein-DNA interaction models by integrating a number of in silico analyses based on sequences and predicted structures, i.e., to build the interaction models from the sequences of the protein and the DNA. The problem is divided into two sub-problems, both concerning tertiary structure: predicting the protein tertiary structure in a systematic way, and constructing the predicted protein-DNA complex. We use Rosetta to generate ten thousand decoys and select close-to-native protein structures from them. The protein-DNA complexes are then predicted either by the docking method HADDOCK or by the template-based method DBD2BS. Our results demonstrate that, for small DNA-binding domains, the protein structure can be predicted by de novo structure prediction. Specifically, after Rosetta has generated a large number of decoys, close-to-native structures can be selected by combining the correlation coefficient between the sequence-based predicted relative solvent accessibility (RSA) and the decoy's RSA with a knowledge-based energy score. In addition, structural models created by de novo structure prediction outperform those from template-based modeling when only distant templates are available. De novo structure prediction yields a result for every query protein, whereas template-based methods can only predict proteins for which a template similar to the query exists. When both approaches deliver predictions, the quality of the models is similar. Furthermore, when close-to-native protein structures are available, protein-DNA interaction models constructed by structure alignment are more accurate than those predicted by docking tools. In summary, this study concludes that it is possible to construct an interaction model of a protein and DNA even in the absence of a co-crystallized structure.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T05:48:29Z (GMT). No. of bitstreams: 1 ntu-99-R97631015-1.pdf: 945668 bytes, checksum: 681c904854a3154b46f951ddab413844 (MD5) Previous issue date: 2010	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Table of Contents; Table of Figures; Table of Tables; Chapter 1. Introduction; Chapter 2. Literature Review; 2.1. Transcriptional Regulation; 2.2. Position-Specific Scoring Matrix (PSSM); 2.3. Physical and Chemical Properties of Proteins; 2.3.1. Amino Acids in Proteins; 2.3.2. Solvent Accessibility; 2.3.3. Four Levels of Protein Structure; 2.4. Related Works of Identifying Interface Residues of Proteins; 2.4.1. Prediction of DNA-binding sites on protein sequences; 2.4.2. Prediction of Protein-Protein Interaction Sites; 2.4.3. Prediction of Tertiary Structure Using Rosetta; Chapter 3. Materials and Methods; 3.1. Dataset; 3.2. Protein Structures Prediction from Sequences; 3.2.1. Rosetta Setup; 3.2.2. Correlation Coefficient of Decoy’s RSA and Sequence-Based Predicted RSA; 3.2.3. Degree of Clustering of Subset Measurement; 3.2.4. MAGIIC-PRO Patterns; 3.2.5. Knowledge-based Potential for Proteins; 3.2.6. Assessment on Decoy Selection; 3.3. Predicting DNA-binding Residues from Structural Models; 3.4. Constructing Protein-DNA Complexes; Chapter 4. Results and Discussion; 4.1. Decoy Selection According to Four Different Indexes; 4.2. Comparison between De Novo Structure Prediction and Template-based Modeling; 4.3. Docking of DNA and Protein Structure Models; 4.4. Protein-DNA Interacting Model Construction by DBD2BS; Chapter 5. Conclusion; 5.1. Close-to-native Structure Models Can Be Discovered by Two Stage Decoy Discrimination; 5.2. De Novo Structure Prediction Performs as Good as Template-based Modeling When Only Distant Templates are Available; 5.3. Protein-DNA Interaction Models Constructed by HADDOCK and DBD2BS; Reference; Appendix
dc.language.iso	en
dc.subject	蛋白質與DNA交互作用	zh_TW
dc.subject	蛋白質結構預測	zh_TW
dc.subject	從頭開始結構預測法	zh_TW
dc.subject	以模板為基礎的結構預測法	zh_TW
dc.subject	可接觸溶劑之面積	zh_TW
dc.subject	結構模型篩選	zh_TW
dc.subject	decoy discrimination	en
dc.subject	template-based modeling	en
dc.subject	de novo structure prediction	en
dc.subject	protein structure prediction	en
dc.subject	Protein-DNA interaction	en
dc.subject	solvent accessibility	en
dc.title	以序列為基礎建構小型DNA結合區域之蛋白質與DNA交互作用模型	zh_TW
dc.title	Modeling protein-DNA interactions from sequences for small DNA-binding domains	en
dc.type	Thesis
dc.date.schoolyear	98-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	楊?伸(Chii-Shen Yang),林守德(Shou-De Lin),蔡懷寬(Huai-Kuang Tsai)
dc.subject.keyword	蛋白質與DNA交互作用,蛋白質結構預測,從頭開始結構預測法,以模板為基礎的結構預測法,可接觸溶劑之面積,結構模型篩選,	zh_TW
dc.subject.keyword	Protein-DNA interaction,protein structure prediction,de novo structure prediction,template-based modeling,solvent accessibility,decoy discrimination,	en
dc.relation.page	59
dc.rights.note	Paid authorization
dc.date.accepted	2010-08-19
dc.contributor.author-college	生物資源暨農學院	zh_TW
dc.contributor.author-dept	生物產業機電工程學研究所	zh_TW
Appears in Collections:	生物機電工程學系
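
The decoy selection summarized in the abstract (rank Rosetta decoys by how well their RSA profile correlates with the sequence-based predicted RSA, then apply a knowledge-based energy score) can be illustrated with a minimal sketch. The record does not state how the two criteria are combined, so the two-stage rule below, the function names `pearson` and `select_decoy`, the `keep` parameter, and the toy numbers are all assumptions made for illustration, not the thesis's actual procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no ranking information
    return cov / sqrt(vx * vy)


def select_decoy(predicted_rsa, decoys, keep=2):
    """Hypothetical two-stage pick.

    Stage 1: keep the `keep` decoys whose per-residue RSA correlates best
             with the sequence-based predicted RSA.
    Stage 2: among those, return the name of the decoy with the lowest energy.

    `decoys` is a list of (name, rsa_profile, energy) tuples.
    """
    ranked = sorted(
        decoys,
        key=lambda d: pearson(predicted_rsa, d[1]),
        reverse=True,
    )
    shortlist = ranked[:keep]
    return min(shortlist, key=lambda d: d[2])[0]


# Toy data (invented): four residues, three decoys.
predicted = [0.1, 0.8, 0.5, 0.2]
decoys = [
    ("decoy_a", [0.2, 0.7, 0.6, 0.1], -10.0),
    ("decoy_b", [0.1, 0.9, 0.4, 0.3], -12.5),
    ("decoy_c", [0.9, 0.1, 0.2, 0.8], -20.0),  # low energy, wrong burial pattern
]
print(select_decoy(predicted, decoys))
```

In this toy case `decoy_c` has the lowest energy but an RSA profile that is anti-correlated with the prediction, so the correlation filter removes it before the energy score is consulted. That is the kind of case in which a combined criterion can differ from ranking on energy alone.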

Files in This Item:

File	Size	Format
ntu-99-1.pdf (not authorized for public access)	923.5 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.