Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31352
Full metadata record
DC field: value (language)
dc.contributor.advisor: 高成炎
dc.contributor.author: Chi-Hung Tsai (en)
dc.contributor.author: 蔡其杭 (zh_TW)
dc.date.accessioned: 2021-06-13T02:44:52Z
dc.date.available: 2009-01-24
dc.date.copyright: 2007-01-24
dc.date.issued: 2006
dc.date.submitted: 2006-10-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/31352
dc.description.abstract: Support vector machines (SVMs) are widely adopted in machine learning and pattern recognition, and their application to bioinformatics has recently shown promise. In this dissertation, we apply SVMs to two important problems in bioinformatics: protein disulfide connectivity prediction and quantitative structure-activity relationship (QSAR) model construction.
For disulfide connectivity prediction, we implemented an algorithm that infers pair-wise bonding probabilities with an SVM, and introduced a descriptor derived from the sequential distance between oxidized cysteines (DOC). Analysis of the predictions shows that adding the DOC descriptor improves prediction accuracy. Furthermore, we developed a two-level prediction model that integrates local and global protein information. The experimental results show that prediction accuracy is greatly enhanced. These results are compared with those of previous studies, and a prediction web service is provided on the internet.
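The pairing step implied by this paragraph can be illustrated with a minimal sketch. It assumes an SVM has already produced a bonding probability for every pair of oxidized cysteines, and it picks the connectivity pattern with the highest total probability. The cysteine positions and probabilities below are hypothetical, the names `enumerate_pairings` and `best_connectivity` are illustrative, and exhaustive enumeration stands in here for the maximum weight matching algorithm listed in the table of contents (Section 2.2.3); it is practical only for a handful of cysteines.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def enumerate_pairings(items: List[int]) -> List[List[Pair]]:
    """Return every way to split an even-sized list into unordered pairs."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    pairings = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            pairings.append([(first, partner)] + tail)
    return pairings


def best_connectivity(cysteines: List[int],
                      prob: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Pick the pairing whose summed pair-wise bonding probability is largest.

    `prob` is keyed by (i, j) with i < j; in practice the values would come
    from a trained classifier, here they are supplied by the caller.
    """
    candidates = enumerate_pairings(sorted(cysteines))
    best = max(candidates, key=lambda pairs: sum(prob[p] for p in pairs))
    return best, sum(prob[p] for p in best)


# Hypothetical pair-wise probabilities for four oxidized cysteines.
PROB = {
    (5, 12): 0.2, (5, 30): 0.7, (5, 41): 0.1,
    (12, 30): 0.3, (12, 41): 0.8, (30, 41): 0.4,
}
PATTERN, SCORE = best_connectivity([5, 12, 30, 41], PROB)
```

With these illustrative values, `PATTERN` pairs position 5 with 30 and position 12 with 41. The number of candidate pairings grows as (2n-1)!! for n bonds, which is why a polynomial-time matching algorithm is preferable for larger proteins.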
For QSAR model construction, we developed an approach that builds QSAR models by selecting the hypothetical descriptor pharmacophore (HDP) with the generic evolutionary method (GEM) and correlating the descriptors with activities using an SVM. Experimental results on five public datasets indicate that our approach is comparable to those of previous studies. Additionally, we incorporated k-means and hierarchical clustering to group compounds into subsets and construct a specific QSAR model for each cluster. The experimental results show that compounds with particular structural features are successfully clustered into the same subset, and that prediction accuracy is enhanced by the specific models built from these clusters. (en)
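The cluster-then-model idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the descriptor vectors and activities are hypothetical, k-means uses a simple deterministic initialization (the first k points), and an ordinary least-squares line on the first descriptor stands in for the SVM regression model so that the example needs only the standard library. All function names are illustrative.

```python
from statistics import mean


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, initialized with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


def fit_line(xs, ys):
    """Least-squares slope and intercept; assumes at least two distinct xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def build_cluster_models(points, activities, k):
    """Cluster the compounds, then fit one model per cluster."""
    labels, centroids = kmeans(points, k)
    models = {}
    for c in range(k):
        xs = [p[0] for p, lab in zip(points, labels) if lab == c]
        ys = [y for y, lab in zip(activities, labels) if lab == c]
        models[c] = fit_line(xs, ys)
    return centroids, models


def predict(point, centroids, models):
    """Route a new compound to its nearest cluster and apply that cluster's model."""
    slope, intercept = models[nearest(point, centroids)]
    return slope * point[0] + intercept


# Hypothetical data: two structural groups with different activity trends.
DESCRIPTORS = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
ACTIVITIES = [3.0, 5.0, 3.4, 4.8, 2.6, 5.2]
CENTROIDS, MODELS = build_cluster_models(DESCRIPTORS, ACTIVITIES, k=2)
```

In this toy data the two groups follow opposite activity trends, so a single global line would fit both poorly, while each cluster-specific line fits its own group closely. That is the motivation for building a separate model per cluster.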
dc.description.provenance: Made available in DSpace on 2021-06-13T02:44:52Z (GMT). No. of bitstreams: 1
ntu-95-D90922008-1.pdf: 1979287 bytes, checksum: bc51cb9e4fcbb99329c79ff530684485 (MD5)
Previous issue date: 2006 (en)
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1 Background Knowledge 2
1.1.1 Support Vector Machine in Bioinformatics 2
1.1.2 Disulfide Connectivity Prediction 3
1.1.3 Quantitative Structure-Activity Relationships (QSAR) 4
1.2 Thesis Overview 5
Chapter 2 Improving Disulfide Connectivity Prediction 7
2.1 Introduction 7
2.2 Prediction of the Disulfide Connectivity Pattern 8
2.2.1 Support Vector Machine 9
2.2.2 Data Encoding 9
2.2.3 Maximum Weight Matching 10
2.2.4 Evaluation Criteria 10
2.3 Dataset and Results 10
2.3.1 Cross-Validation of SP39 11
2.4 PreCys Web Server 12
2.5 Discussion and Conclusion 13
Chapter 3 Two-level Models for Disulfide Connectivity Prediction 16
3.1 Pair-wise and Pattern-wise Methods 16
3.2 Two-level Framework 17
3.2.1 Level-1: Pair-wise 18
3.2.2 Level-2: Pattern-wise 20
3.2.3 Reduction for Imbalance 21
3.3 Results and Discussion 22
3.3.1 Dataset Preparation 22
3.3.2 Validation with SP39 and SP43 23
3.4 Effects of Descriptors 24
3.4.1 Pair-wise Relation from Level-1 25
3.4.2 CSP implication 26
3.4.3 Global Information 27
3.4.4 Effect of Candidate Selection 28
3.5 Conclusion 28
Chapter 4 GEMSVM for QSAR Model Construction 30
4.1 Introduction 30
4.2 Material and Methods 33
4.2.1 Screen Features by Mahalanobis Distance 33
4.2.2 Feature Selection by Generic Evolutionary Method 34
4.2.3 GEMSVM 35
4.2.4 GEMPLS 36
4.2.5 GEMkNN 37
4.2.6 Performance Evaluation 38
4.2.7 Dataset Preparation 38
4.3 Results and Discussion 41
4.3.1 Validation with Artificial Data Set 41
4.3.2 Validation with Public Data Sets 43
4.4 Conclusions 45
Chapter 5 Ligand Clustering and Specific QSAR Model 48
5.1 Introduction 48
5.2 Material and Methods 49
5.2.1 Identify Activity-Correlated Features 49
5.2.2 Ligand Clustering 49
5.2.3 Specific Model Construction and Prediction 51
5.2.4 Dataset Preparation 51
5.3 Results and Discussion 52
5.3.1 PDGFR dataset 52
5.4 Conclusions 56
Chapter 6 Conclusions 58
6.1 Summary 58
6.2 Future works 59
Bibliography 61
Appendix A. List of Publications 68
dc.language.iso: en
dc.title: 應用支援向量機解蛋白質雙硫鍵預測及藥物結構活性量化回歸模型建構 (zh_TW)
dc.title: Applying Support Vector Machines to Protein Disulfide Connectivity Prediction and QSAR Model Construction (en)
dc.type: Thesis
dc.date.schoolyear: 95-1
dc.description.degree: Doctoral
dc.contributor.oralexamcommittee: 歐陽明, 歐陽彥正, 趙坤茂, 黃奇英, 楊進木, 劉宣良
dc.subject.keyword: 支援向量機, 雙硫鍵, 雙硫鍵預測, 藥物結構活性迴歸模型 (zh_TW)
dc.subject.keyword: SVM, disulfide bond, disulfide connectivity prediction, QSAR (en)
dc.relation.page: 70
dc.rights.note: Paid license
dc.date.accepted: 2006-10-20
dc.contributor.author-college: 電機資訊學院 (zh_TW)
dc.contributor.author-dept: 資訊工程學研究所 (zh_TW)
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-95-1.pdf (not currently authorized for public access)
Size: 1.93 MB
Format: Adobe PDF
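Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.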
