Developing Class-Optimized, High-Resolution Scoring Functions for Ligand-Receptor Interactions

Zhong-Wei Zhang; 章仲偉

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32188

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林榮信(Jung-Hsin Lin)
dc.contributor.author	Zhong-Wei Zhang	en
dc.contributor.author	章仲偉	zh_TW
dc.date.accessioned	2021-06-13T03:35:48Z	-
dc.date.available	2008-08-03
dc.date.copyright	2006-08-03
dc.date.issued	2006
dc.date.submitted	2006-07-26
dc.identifier.citation	References
1. Morris, G.M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry 19, 1639-1662 (1998).
2. Warren, G.L. et al. Critical assessment of docking programs and scoring functions. Abstracts of Papers of the American Chemical Society 228, U513-U514 (2004).
3. Böhm, H.J. The Development of a Simple Empirical Scoring Function to Estimate the Binding Constant for a Protein Ligand Complex of Known 3-Dimensional Structure. Journal of Computer-Aided Molecular Design 8, 243-256 (1994).
4. Jones, G., Willett, P., Glen, R.C., Leach, A.R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. Journal of Molecular Biology 267, 727-748 (1997).
5. Alvarez, J.C. High-throughput docking as a source of novel drug leads. Current Opinion in Chemical Biology 8, 365-370 (2004).
6. Jorgensen, W.L. The many roles of computation in drug discovery. Science 303, 1813-1818 (2004).
7. Kitchen, D.B., Decornez, H., Furr, J.R. & Bajorath, J. Docking and scoring in virtual screening for drug discovery: Methods and applications. Nature Reviews Drug Discovery 3, 935-949 (2004).
8. Shoichet, B.K. Virtual screening of chemical libraries. Nature 432, 862-865 (2004).
9. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Research 28, 235-242 (2000).
10. Böhm, H.J. The Development of a Simple Empirical Scoring Function to Estimate the Binding Constant for a Protein Ligand Complex of Known 3-Dimensional Structure. Journal of Computer-Aided Molecular Design 8, 243-256 (1994).
11. Head, R.D. et al. VALIDATE: A new method for the receptor-based prediction of binding affinities of novel ligands. Journal of the American Chemical Society 118, 3959-3969 (1996).
12. Eldridge, M.D., Murray, C.W., Auton, T.R., Paolini, G.V. & Mee, R.P. Empirical scoring functions. 1. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. Journal of Computer-Aided Molecular Design 11, 425-445 (1997).
13. Böhm, H.J. Prediction of binding constants of protein ligands: A fast method for the prioritization of hits obtained from de novo design or 3D database search programs. Journal of Computer-Aided Molecular Design 12, 309-323 (1998).
14. Wang, R.X., Liu, L., Lai, L.H. & Tang, Y.Q. SCORE: A new empirical method for estimating the binding affinity of a protein-ligand complex. Journal of Molecular Modeling 4, 379-394 (1998).
15. Muegge, I. & Martin, Y.C. A general and fast scoring function for protein-ligand interactions: A simplified potential approach. Journal of Medicinal Chemistry 42, 791-804 (1999).
16. Mitchell, J.B.O., Laskowski, R.A., Alex, A., Forster, M.J. & Thornton, J.M. BLEEP - Potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data. Journal of Computational Chemistry 20, 1177-1185 (1999).
17. Mitchell, J.B.O., Laskowski, R.A., Alex, A. & Thornton, J.M. BLEEP - Potential of mean force describing protein-ligand interactions: I. Generating potential. Journal of Computational Chemistry 20, 1165-1176 (1999).
18. Gohlke, H., Hendlich, M. & Klebe, G. Knowledge-based scoring function to predict protein-ligand interactions. Journal of Molecular Biology 295, 337-356 (2000).
19. Kellogg, G.E. et al. Getting it right: modeling of pH, solvent and 'nearly' everything else in virtual screening of biological targets. Journal of Molecular Graphics & Modelling 22, 479-486 (2004).
20. Ishchenko, A.V. & Shakhnovich, E.I. SMall molecule growth 2001 (SMoG2001): An improved knowledge-based scoring function for protein-ligand interactions. Journal of Medicinal Chemistry 45, 2770-2780 (2002).
21. Rarey, M., Kramer, B., Lengauer, T. & Klebe, G. A fast flexible docking method using an incremental construction algorithm. Journal of Molecular Biology 261, 470-489 (1996).
22. Jones, G., Willett, P., Glen, R.C., Leach, A.R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. Journal of Molecular Biology 267, 727-748 (1997).
23. Ewing, T.J.A., Makino, S., Skillman, A.G. & Kuntz, I.D. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. Journal of Computer-Aided Molecular Design 15, 411-428 (2001).
24. Krammer, A., Kirchhoff, P.D., Jiang, X., Venkatachalam, C.M. & Waldman, M. LigScore: a novel scoring function for predicting binding affinities. Journal of Molecular Graphics & Modelling 23, 395-407 (2005).
25. Hendlich, M. Databases for protein-ligand complexes. Acta Crystallographica Section D-Biological Crystallography 54, 1178-1182 (1998).
26. Roche, O., Kiyama, R. & Brooks, C.L. Ligand-Protein DataBase: Linking protein-ligand complex structures to binding data. Journal of Medicinal Chemistry 44, 3592-3598 (2001).
27. Chen, X., Lin, Y.M., Liu, M. & Gilson, M.K. The Binding Database: data management and interface design. Bioinformatics 18, 130-139 (2002).
28. Puvanendrampillai, D. & Mitchell, J.B.O. L/D Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein-ligand complexes. Bioinformatics 19, 1856-1857 (2003).
29. Wang, R.X., Fang, X.L., Lu, Y.P. & Wang, S.M. The PDBbind database: Collection of binding affinities for protein-ligand complexes with known three-dimensional structures. Journal of Medicinal Chemistry 47, 2977-2980 (2004).
30. Hu, L.G., Benson, M.L., Smith, R.D., Lerner, M.G. & Carlson, H.A. Binding MOAD (Mother of All Databases). Proteins-Structure Function and Bioinformatics 60, 333-340 (2005).
31. Gehlhaar, D.K. et al. Molecular Recognition of the Inhibitor AG-1343 by HIV-1 Protease - Conformationally Flexible Docking by Evolutionary Programming. Chemistry & Biology 2, 317-324 (1995).
32. Kuntz, I.D. Structure-Based Strategies for Drug Design and Discovery. Science 257, 1078-1082 (1992).
33. Kuntz, I.D., Meng, E.C. & Shoichet, B.K. Structure-Based Molecular Design. Accounts of Chemical Research 27, 117-123 (1994).
34. Ewing, T.J.A. & Kuntz, I.D. Critical evaluation of search algorithms for automated molecular docking and database screening. Journal of Computational Chemistry 18, 1175-1189 (1997).
35. Kuntz, I.D., Blaney, J.M., Oatley, S.J., Langridge, R. & Ferrin, T.E. A Geometric Approach to Macromolecule-Ligand Interactions. Journal of Molecular Biology 161, 269-288 (1982).
36. Blaney, J.M. & Dixon, J.S. DockIt, version 1.0; Metaphorics, LLC: Mission Viejo, CA; www.metaphorics.com/products/dockit.html.
37. Friesner, R.A. et al. Glide: A new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. Journal of Medicinal Chemistry 47, 1739-1749 (2004).
38. Halgren, T.A. et al. Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. Journal of Medicinal Chemistry 47, 1750-1759 (2004).
39. Lambert, M.H. Docking Conformationally Flexible Molecules into Protein Binding Sites. In Practical Application of Computer-Aided Drug Design; Charifson, P.S., Ed.; Dekker: New York, 1997.
40. Gasteiger, J. & Engel, T. Chemoinformatics; WILEY-VCH Verlag GmbH & Co, Chapter 8; 2003.
41. Mehler, E.L. & Solmajer, T. Electrostatic Effects in Proteins: Comparison of Dielectric and Charge Models. Protein Engineering 4, 903-910 (1991).
42. http://www.bio.mtu.edu/campbell/bl4820/lectures/lec3/482ek5.htm.
43. http://www.graphpad.com/curvefit/kinetics_vs__binding.htm.
44. Cheng, Y. & Prusoff, W.H. Relationship between Inhibition Constant (K1) and Concentration of Inhibitor Which Causes 50 Per Cent Inhibition (I50) of an Enzymatic-Reaction. Biochemical Pharmacology 22, 3099-3108 (1973).
45. SYBYL, Tripos Associates, Inc., St. Louis, MO.
46. Schüttelkopf, A.W. & van Aalten, D.M.F. PRODRG: a tool for high-throughput crystallography of protein-ligand complexes. Acta Crystallographica Section D-Biological Crystallography 60, 1355-1363 (2004).
47. The advances presented for the first time in Gaussian 03 are the work of M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, A. D. Daniels, O. Farkas, A. D. Rabuck, K. Raghavachari and J. V. Ortiz.
48. D.A. Case, T.A. Darden, T.E. Cheatham, III, C.L. Simmerling, J. Wang, R.E. Duke, R. Luo, K.M. Merz, B. Wang, D.A. Pearlman, M. Crowley, S. Brozell, V. Tsui, H. Gohlke, J. Mongan, V. Hornak, G. Cui, P. Beroza, C. Schafmeister, J.W. Caldwell, W.S. Ross, and P.A. Kollman (2004), AMBER 8, University of California, San Francisco.
49. Baker, N.A., Sept, D., Joseph, S., Holst, M.J. & McCammon, J.A. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proceedings of the National Academy of Sciences of the United States of America 98, 10037-10041 (2001).
50. Dolinsky, T.J., Nielsen, J.E., McCammon, J.A. & Baker, N.A. PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Research 32, W665-W667 (2004).
51. http://www.r-project.org/index.html
52. Mevik, B.H. & Cederkvist, H.R. Mean squared error of prediction (MSEP) estimates for principal component regression (PCR) and partial least squares regression (PLSR). Journal of Chemometrics 18, 422-429 (2004).
53. Bozdogan, H. Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology 44, 62-91 (2000).
54. Hawkins, D.M. The problem of overfitting. Journal of Chemical Information and Computer Sciences 44, 1-12 (2004).
55. Peduzzi, P., Concato, J., Feinstein, A.R. & Holford, T.R. Importance of events per independent variable in proportional hazards regression analysis. 2. Accuracy and precision of regression estimates. Journal of Clinical Epidemiology 48, 1503-1510 (1995).
56. Babyak, M.A. What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine 66, 411-421 (2004).
57. Kua, J., Zhang, Y.K. & McCammon, J.A. Studying enzyme binding specificity in acetylcholinesterase using a combined molecular dynamics and multiple docking approach. Journal of the American Chemical Society 124, 8260-8267 (2002).
58. Houle, D., Mezey, J. & Galpern, P. Interpretation of the results of common principal components analyses. Evolution 56, 433-440 (2002).
59. Lima, C.T. et al. Concurrent and construct validity of the AUDIT in an urban Brazilian sample. Alcohol and Alcoholism 40, 584-589 (2005).
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32188	-
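dc.description.abstract	Developing Class-Optimized, High-Resolution Scoring Functions for Ligand-Receptor Interactions. Chinese abstract: Computer simulation has become an indispensable part of modern drug design and development, and it also plays an important role in lead optimization. The most widespread applications of computation in new drug design and development are virtual screening and molecular docking. A successful docking experiment requires three elements: a chemical space containing a sufficient number of molecules (usually called a database), an effective search algorithm, and an accurate ligand-receptor scoring function. With the help of computation, the time traditionally needed to develop drugs against promising target proteins can be shortened, and substantial research and development costs can be saved. However, although a considerable number of virtual screening and docking programs and chemical databases already exist, the predictive accuracy and applicability of existing software and simulation tools still seem unable to meet our needs. Too few samples used when designing a scoring function, insufficient performance of the search algorithm, or limited accuracy and applicability of the scoring function itself may each contribute to poor predictive accuracy. Many published comparisons of virtual screening and docking programs indicate that no program currently yields predictions with a coefficient of determination (R-squared) of 0.9 and a root-mean-square error (RMSE) of 2 kcal/mol. We have now built a chemical database (PLID) containing 869 ligand-receptor complexes, and we used the experimental inhibition constants it provides to study and design a new generation of ligand-receptor scoring functions. Instead of the traditional practice of designing a single, universally applicable scoring function, we adopted a new strategy: developing "class-optimized" high-resolution ligand-receptor scoring functions. Taking the docking program AutoDock_3.05 as the starting point, we developed class-optimized scoring functions using 607 ligand-receptor complexes in the database as the analysis population. After the complexes in the database were classified, the R-squared of each class reached the level of 0.9 and the RMSE reached the level of 2 kcal/mol. Compared with the R-squared of about 0.8 and the RMSE of 3 kcal/mol obtained with AutoDock_3.05, the predictive performance of our scoring functions is clearly better. We believe that as our database of ligand-receptor complexes expands, the predictive power and applicability of scoring functions developed with this class-optimized strategy will improve considerably.	zh_TW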
dc.description.abstract	Abstract: In silico experiments have become a standard preliminary step in modern drug discovery, design, and development, and an essential tool for lead compound refinement. Virtual screening and docking of drug candidates to potential target proteins are the most popular applications of computer-aided drug design. Successful docking requires three elements: a sufficiently large chemical space of drug candidates (usually called a database), an efficient search algorithm, and an accurate scoring function. Computationally derived knowledge about drug candidates and potential target proteins can both shorten the time that drug discovery traditionally requires and reduce the cost of new drug development. Although many published or commercial databases and docking or virtual screening programs exist, no single program seems to achieve satisfactory predictive power and universal applicability. Possible reasons are that the training sets used to develop scoring functions are too small, that the search algorithms are inefficient, or that the scoring functions themselves are not accurate or general enough. Evaluations of the scoring functions implemented in freely available academic and commercial docking programs have been widely reported, and the results show that no single program reaches the level of R-squared = 0.9 and RMSE = 2 kcal/mol [2-4]. We have constructed a database named PLID (Protein-Ligand Information Database), which contains 869 protein-ligand complexes with experimentally determined binding data (inhibition or dissociation constants). Instead of searching for a “universally” applicable scoring function, we developed a new strategy for designing “class-optimized” scoring functions. Using the AutoDock_3.05 scoring function as the starting point, we constructed a suite of scoring functions with 607 protein-ligand complexes in PLID as the training set. After clustering all complexes into three classes, the adjusted R-squared of each class reached 0.9, compared with 0.8 or 0.7 for AutoDock_3.05, while the RMSE was held to about 2 kcal/mol, compared with 3 kcal/mol for AutoDock_3.05. We believe that as the binding data in PLID grow over time, the predictive power of our “class-optimized” strategy will continue to improve.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T03:35:48Z (GMT). No. of bitstreams: 1 ntu-95-R93423017-1.pdf: 2616575 bytes, checksum: e945e39059fb7b051803b45f2422c1c9 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Table of Contents
Figure List
Table List
Chinese Abstract
Abstract
Chapter 1: Introductions
1.1 Drug Design, Virtual Screening, and Docking
1.2 Prediction Free Energy upon Binding by Protein-Ligand Complex
1.3 In silico Docking Importance
1.4 Databases with Known Protein-Ligand Binding Data
1.5 Docking Programs and Scoring Functions
1.6 From Structure to Activity: Molecular Descriptors
1.7 The Scoring Function of Autodock_3.05
1.8 Calculation of Ki Values
Chapter 2: Materials and Methods
2.1 PLID (Protein-Ligand Information Database)
2.2 Protein-Ligand File Preparation/Processing
2.2.1 Ligand Files Preparation/Processing
2.2.2 Protein Files Preparation/Processing
2.2.3 Complex Files Preparation for Minimization
2.2.4 Workflows
2.3 AutoDock_3.05
2.4 ProDrg
2.5 Other Programs Used in This Work
2.6 Molecular Descriptors Collection/Calculation
2.7 Statistical Analysis
2.7.1 Linear Regression
2.7.2 Partial Least Squares (PLS) [52]
2.7.3 Akaike’s Information Criterion (AIC)
2.7.4 Leave-One-Out Cross-Validation
2.7.5 Model Refinement and Overfitting Problem
2.7.6 5-Fold Cross-Validation
2.8 Decision Function for Clustering
Chapter 3: Results
3.1 Data Mining and Protein-Ligand Complexes Collection
3.2 Grid Parameter Test
3.3 Evaluation of the AutoDock_3.05 Scoring Function
3.4 Protein-Ligand Preparation/Processing Procedures Comparison
3.4.1 Processing Procedure 1
3.4.2 Processing Procedure 2
3.4.3 Processing Procedure 3
3.4.4 Processing Procedure 4
3.5 Linear Regression Models
3.6 Comparison of the Predictive Free Energies upon Binding
3.7 Metal Ions Involved Protein-Ligand Complexes Test
3.7.1 Non-Transition Metal Involved Protein-Ligand Complexes
3.7.2 Transition Metal Involved Protein-Ligand Complexes
3.7.3 Protein-Ligand Complexes without Metal Ion Involved
3.8 Clustering of the 607 Protein-Ligand Complexes
3.8.1 Hydrophobic Class
3.8.2 Mixed Class
3.8.3 Hydrophilic Class
3.9 5-Fold Cross-Validation
3.9.1 5-Fold Cross-Validation Result of the “Hydrophobic” Class
3.9.2 5-Fold Cross-Validation Result of the “Mixed” Class
3.9.3 5-Fold Cross-Validation Result of the “Hydrophilic” Class
3.10 Minimization Hydrogen Orientation of Protein-Ligand Complexes
Chapter 4: Discussions
4.1 Hydrogen Adding: All Hydrogen or Only Polar Hydrogen?
4.2 Water Molecules Consideration
4.3 Metal Ions Consideration
4.4 Uncertainties in Experiments of Inhibitory Constants Estimation
4.5 pH Effects
4.6 Uncertainties of X-ray or NMR Determined Structures
4.6.1 Conformational Changes (Induced-Fit Effects) upon Ligand Binding
4.6.2 X-Ray Structures or NMR Structures?
4.6.3 Structure-Dependent Scoring Functions
4.7 Minimization Refinement
4.8 Meaning of the Clustering Accordance
4.9 AIC versus PCA
4.10 Physical Meanings Encoded in the Class-Optimized Scoring Functions
Chapter 5: Conclusions
Chapter 6: Future Directions
References
Appendix: Protein-Ligand Complexes in PLID
dc.language.iso	en
dc.subject	class	zh_TW
dc.subject	interaction	zh_TW
dc.subject	optimization	zh_TW
dc.subject	ligand-receptor	zh_TW
dc.subject	scoring function	zh_TW
dc.subject	class	en
dc.subject	binding	en
dc.subject	protein-ligand complex	en
dc.subject	scoring function	en
dc.subject	optimized	en
dc.title	Developing Class-Optimized, High-Resolution Scoring Functions for Ligand-Receptor Interactions	zh_TW
dc.title	Developing 'Class-Optimized' Scoring Functions for Drug Design	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	黃明經(Ming-Jing Hwang),孔繁璐(Fan-Lu Kung),孫英傑(Ying-Chieh Sun)
dc.subject.keyword	ligand-receptor, interaction, scoring function, class, optimization	zh_TW
dc.subject.keyword	protein-ligand complex, binding, scoring function, class, optimized	en
dc.relation.page	149
dc.rights.note	Paid authorization
dc.date.accepted	2006-07-27
dc.contributor.author-college	College of Medicine	zh_TW
dc.contributor.author-dept	Graduate Institute of Pharmaceutical Sciences	zh_TW
Appears in Collections:	School of Pharmacy
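
The abstract in this record reports fit quality as adjusted R-squared and RMSE (in kcal/mol) for binding free energies derived from inhibition or dissociation constants. The following is a minimal Python sketch of those quantities, not the thesis's own code. It assumes the standard relation dG = RT ln(Ki) with Ki in mol/L at 298.15 K; the function names and the default temperature are illustrative assumptions.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ki_to_delta_g(ki_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a constant in mol/L.

    Assumes dG = RT ln(Ki); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(ki_molar)


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / n)


def adjusted_r_squared(observed, predicted, n_predictors):
    """Adjusted R-squared for a model with n_predictors regressors."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R-squared for the number of fitted predictors.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

Under these assumptions a 1 nM inhibitor corresponds to roughly -12.3 kcal/mol, so an RMSE of 2 kcal/mol is about a 30-fold uncertainty in Ki at room temperature.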

Files in this item:

File	Size	Format
ntu-95-1.pdf (not authorized for public access)	2.56 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.