Applying G2DE Classifier on the Energy Scoring Function Model for Molecular Docking (應用G2DE分群法於分子嵌合能量回歸模型之研究)

Chiun-Yao Chiang; 蔣鈞堯

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46445

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	黃乾綱
dc.contributor.author	Chiun-Yao Chiang	en
dc.contributor.author	蔣鈞堯	zh_TW
dc.date.accessioned	2021-06-15T05:09:24Z	-
dc.date.available	2012-07-26
dc.date.copyright	2010-07-26
dc.date.issued	2010
dc.date.submitted	2010-07-23
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/46445	-
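dc.description.abstract	Studying the interactions between proteins and ligands is of great importance in basic biochemistry. A major problem in this area is the scoring function used to estimate the binding affinity between a receptor and a ligand. Scoring functions for molecular docking simulation still leave much room for improvement. In virtual drug screening, a good scoring function plays the role of a gatekeeper. This thesis applies the G2DE clustering method, using six features, to divide a dataset of 851 complexes into several groups. The six features are the Van der Waals force, electrostatic interaction, hydrogen bond, desolvation, the number of torsion bonds of a ligand, and Ehbond, a feature specific to AutoDock 3 that gives the average estimated energy of the hydrogen bond formed when a water molecule binds to a polar atom. After separating the main group from the outlier group, we analyzed the outlier group and found that complexes containing the MHC_I domain show larger deviations in predicted docking energy. Simply removing the 12 complexes that contain MHC_I lowers the RMSE (root-mean-squared error) from 2.12 to 2.046. We also built a regression model for the main group, which lowers the RMSE of the dataset to 2.006, a target that many scoring functions strive for. The R² exceeds 0.49, which corresponds to a correlation coefficient above 0.7, also a fairly good result. As the results show, the new scoring function, combined with the analysis of the outlier group, can provide more clues for future biochemical analysis.	zh_TW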
dc.description.abstract	Research on protein-ligand interactions is a crucial part of basic biochemistry. One of the important issues in this field is estimating the binding affinity between receptors and ligands. However, there is still much room for improvement in the design of scoring functions. In virtual screening, a good scoring function acts like a strict goalkeeper. In this study, we apply G2DE with six features (Van der Waals force, electrostatic interaction, hydrogen bond, desolvation, number of torsion bonds of a ligand, and Ehbond) to divide a dataset of 851 protein-ligand complexes into several groups. After separating the dataset into an outlier group and a main group, we found that 12 complexes containing the MHC_I domain had predicted binding energies far from the actual values. Eliminating these 12 complexes reduces the RMSE of the predicted binding energy of the dataset from 2.12 to 2.046. We also construct an empirical scoring function based on the main group. The RMSE of the predicted binding energy of the main group is 2.006, and the R² is 0.49. The new scoring function and the G2DE-based outlier detection method can provide further clues for biochemical analysis.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T05:09:24Z (GMT). No. of bitstreams: 1 ntu-99-R97525023-1.pdf: 2940417 bytes, checksum: dc71d282c6a9cf5662b9957c2c4be667 (MD5) Previous issue date: 2010	en
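dc.description.tableofcontents	Acknowledgements; Abstract (Chinese); Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Motivation; 1.2 Research Objectives; 1.3 Research Workflow and Thesis Structure; Chapter 2 Tools and Related Work; 2.1 Tools Used; 2.1.1 Linear Regression; 2.1.2 Nonlinear Regression; 2.1.3 G2DE Clustering Method; 2.1.4 Principal Component Analysis; 2.1.5 Pfam Web Service; 2.2 Outliers; 2.3 Related Work; Chapter 3 Methods; 3.1 Dataset Selection and Analysis; 3.2 Feature Selection; 3.2.1 Van der Waals Force; 3.2.2 Electrostatic Interaction; 3.2.3 Hydrogen Bond; 3.2.4 Desolvation and Number of Ligand Torsions (computed by AutoDock); 3.3 Experimental Procedure; 3.3.1 Clustering with the G2DE Method; 3.3.2 First-Order Regression Model; 3.3.3 Group Error Validation and Best Regression Model; Chapter 4 Experimental Results; 4.1 G2DE Clustering Results; 4.2 First-Order Regression Model Results for the Large Group; 4.3 Group RMSE Validation Results; Chapter 5 Discussion; 5.1 Literature Discussion and Comparison; 5.1.1 N/M Ratio; 5.1.2 Comparison of Coefficient Weights; 5.1.3 Repeated Nonlinear Regression Experiments; 5.2 Analysis of the Outlier Group Using Pfam and AAindex; Chapter 6 Conclusion and Future Work; References; Appendix A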
dc.language.iso	zh-TW
dc.title	應用G2DE分群法於分子嵌合能量回歸模型之研究	zh_TW
dc.title	Applying G2DE Classifier on the Energy Scoring Function Model for Molecular Docking	en
dc.type	Thesis
dc.date.schoolyear	98-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	歐陽彥正,張瑞益,林榮信
dc.subject.keyword	能量評分函數,分子嵌合,例外者探測	zh_TW
dc.subject.keyword	energy scoring function,molecular docking,outlier detection	en
dc.relation.page	64
dc.rights.note	Paid license
dc.date.accepted	2010-07-26
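dc.contributor.author-college	College of Engineering	zh_TW
dc.contributor.author-dept	Graduate Institute of Engineering Science and Ocean Engineering	zh_TW
Appears in Collections:	Department of Engineering Science and Ocean Engineering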
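
The abstract in the metadata record above reports the model quality as an RMSE and an R², and relates an R² of 0.49 to a correlation coefficient of 0.7. The following is a minimal sketch of how these quantities relate, not the thesis's implementation. It assumes a least-squares fit with an intercept, reduced to a single feature for brevity (the thesis uses six features). The function names and the sample values are hypothetical and are not taken from the thesis dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def fit_line(x, y):
    """Least-squares slope and intercept for one feature (closed form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx


# Made-up feature values and binding energies, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-4.1, -5.3, -5.0, -7.2, -6.8, -8.9]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]

print("RMSE:", rmse(y, y_hat))
print("R^2:", r_squared(y, y_hat))
# For a least-squares fit with an intercept, the squared correlation between
# measured and fitted values equals R^2, so R^2 = 0.49 corresponds to r = 0.7.
print("r^2:", pearson_r(y, y_hat) ** 2)
print("sqrt(0.49):", math.sqrt(0.49))
```

The identity between R² and the squared correlation holds only for values fitted by least squares with an intercept on the same data. For predictions on held-out complexes the two numbers can differ.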

Files in This Item:

File	Size	Format
ntu-99-1.pdf (not currently authorized for public access)	2.87 MB	Adobe PDF

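Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.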