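Predicting Binding Affinity of Protein-DNA Interactions Using Machine Learning-based Scoring Functions (建立以機器學習演算法為基礎之評分函數預測蛋白質與DNA結合之親和力)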

Chien-Ho Chao; 趙健合

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/39498

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳倩瑜(Chien-Yu Chen)
dc.contributor.author	Chien-Ho Chao	en
dc.contributor.author	趙健合	zh_TW
dc.date.accessioned	2021-06-13T17:30:03Z	-
dc.date.available	2016-08-22
dc.date.copyright	2011-08-22
dc.date.issued	2011
dc.date.submitted	2011-08-20
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/39498	-
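dc.description.abstract	Proteins are essential to sustaining life. In living organisms, the binding of proteins to DNA drives many biochemical reactions and activities; for example, the binding of a transcription factor to a specific DNA sequence can switch on transcription of a particular gene. Protein-DNA interactions have therefore long been a subject of intense study among biologists. In recent years, with advances in computer technology and computing power, biologists and statisticians have used the computational and data-integration capabilities of computer programs to increasingly support traditional biological experiments. Within this work, predicting the binding affinity between proteins and other biological units, such as proteins, small molecules, and even DNA, has received sustained attention, and many studies have developed different kinds of scoring functions for affinity prediction. Among these, scoring functions based on machine learning algorithms have performed well in recent years at predicting the binding affinity between proteins and small molecules. This thesis attempts to design a machine learning-based scoring function that predicts protein-DNA binding affinity. High-quality protein-DNA complex structures and experimentally measured affinity data were selected as the source material, and a scoring function was built on a knowledge base combined with machine learning algorithms. The experimental results show that a random forest-based classification method can also give good predictions of protein-DNA binding affinity. The thesis also introduces several feature extraction approaches and discusses their effect on the prediction results, in the hope of contributing to research on scoring functions for binding affinity between biological macromolecules.	zh_TW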
dc.description.abstract	Proteins and DNA play important roles in maintaining life in living cells. The binding of proteins to specific DNA sequences initiates many biological activities. For instance, transcription factors, which are proteins that trigger transcription of a particular gene, initiate the transcription process by binding to regulatory sites on DNA. Research on this issue could facilitate the study of gene regulation and regulatory networks. For these reasons, interactions between proteins and DNA have long attracted much attention. With recent advances in computer technology and algorithm development, computational methods for predicting the binding affinity of protein-protein, protein-ligand, and even protein-DNA interactions have been widely studied. Some scoring functions for predicting protein-ligand binding affinity have been shown to perform well. In this thesis, a machine learning-based scoring function was developed to predict the binding affinity of protein-DNA interactions. For this purpose, a high-quality dataset of protein-DNA complexes with associated binding affinity information was collected from PDBbind. The performance of the proposed method was compared with existing scoring functions, and it is concluded that the proposed machine learning-based scoring function performs well in predicting the binding affinities of protein-DNA complexes and can benefit future studies on this problem.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T17:30:03Z (GMT). No. of bitstreams: 1 ntu-100-R98631042-1.pdf: 4812277 bytes, checksum: 8365fb1e8a65d67f594a602377e2adc4 (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	Chinese Abstract i Abstract ii Table of Contents iv Table of Figures vii Table of Tables ix Chapter 1. Introduction 1 Chapter 2. Literature Review 3 2-1. Proteins and Nucleic Acids 4 2-2. Transcription Factors and TFBSs 5 2-3. Protein Data Bank (PDB) 6 2-4. PDBbind Database 7 2-5. Binding Affinity Prediction and Scoring Functions 8 2-5-1. Empirical Scoring Functions 9 2-5-2. Force Field Scoring Functions 10 2-5-3. Knowledge-based Scoring Functions 11 2-6. Random Forest 12 Chapter 3. Materials and Methods 14 3-1. Dataset preparation 14 3-2. Features preparation 17 3-3. Random Forest Score (RF-Score) 21 3-4. Comparison of Other Scoring Functions 21 Chapter 4. Results and Discussion 24 4-1. Binding Affinity Prediction of protein-ligand Interactions 24 4-2. Binding affinity prediction of protein-DNA interactions 26 4-2-1. Training by Protein-ligand Data 27 4-2-2. Training by Protein-DNA Data 29 4-3. Comparison with Other Scoring Functions 35 4-3-1. X-Score 35 4-3-2. Support Vector Machine (SVM) 37 4-4. Importance of Features 39 4-5. Position Weight Matrix Prediction 46 Chapter 5. Conclusion 51 References 54
dc.language.iso	en
dc.title	建立以機器學習演算法為基礎之評分函數預測蛋白質與DNA結合之親和力	zh_TW
dc.title	Predicting Binding Affinity of Protein-DNA Interactions Using Machine Learning-based Scoring Functions	en
dc.type	Thesis
dc.date.schoolyear	99-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	楊健志(Chien-Chih Yang),徐駿森(Chun-Hua Hsu),張天豪(Tien-Hao Chang)
dc.subject.keyword	protein-DNA interaction,scoring function,random forest,affinity prediction	zh_TW
dc.subject.keyword	protein-DNA interaction,scoring function,random forest,binding affinity prediction	en
dc.relation.page	56
dc.rights.note	Paid authorization
dc.date.accepted	2011-08-20
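dc.contributor.author-college	College of Bioresources and Agriculture	zh_TW
dc.contributor.author-dept	Graduate Institute of Bio-Industrial Mechatronics Engineering	zh_TW
Appears in Collections:	Department of Biomechatronics Engineering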

Files in This Item:

File	Size	Format
ntu-100-1.pdf (not currently authorized for public access)	4.7 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.