Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85760
Full metadata record
dc.contributor.advisor: 曾宇鳳 (Yufeng Jane Tseng)
dc.contributor.author: Jen-Hao Chen [en]
dc.contributor.author: 陳人豪 [zh_TW]
dc.date.accessioned: 2023-03-19T23:23:32Z
dc.date.copyright: 2022-07-05
dc.date.issued: 2022
dc.date.submitted: 2022-05-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85760
dc.description.abstract: 基於指紋、基於特徵和基於分子圖的表示都已在其他的研究中與不同的深度學習方法一起用於預測分子特性。不同的分子表示已經被清楚地證明會影響模型預測和可解釋性。我們回顧了不同的分子表示方法,並專注於使用圖形和線性表示方式進行深度學習模型的建立。通常,在計算其特性時,人們會使用一種固定的規範化學結構流程去表示一種分子。我們仔細檢查了表示單個分子的簡化分子線性輸入規範 (SMILES) 符號,並建議使用 SMILES 中的完整列舉以達到更高的模型預測準確性。我們使用了卷積神經網絡 (CNN) 的技術來建立模型。SMILES 的完整列舉可以改進分子在模型上的呈現並以所有可能的角度描述分子。用這種方法訓練出的 CNN 模型在處理大型數據集時非常穩健,因為無需加入額外的化學知識來預測溶解度。此外,傳統上很難使用神經網絡來解釋化學結構對單個屬性的貢獻。我們展示了在解碼網絡中使用注意力機制來檢測與溶解度相關的分子部分,從 CNN 模型中解釋了化學結構對於預測屬性的影響。

生成用於預測分子性質的最佳深度學習模型的關鍵是測試和應用各種優化方法。雖然過去在製藥領域之外的不同研究中的各個優化方法都成功地提高了模型性能,但當小心地應用這些方法和實踐特定的優化方法組合時,模型效能可能可以得到更好的提升。我們使用和討論了文獻中出現的三種高性能優化方法。這些方法已被證明可以顯著提高其他領域的模型性能。我們最終找到一種通用程序,能夠針對不同分子特性去訓練出效果更優化的 CNN 模型。這三種技術分別是針對化合物 SMILES 表示的不同列舉比率去動態調整批量大小策略、用於選擇模型超參數的貝葉斯優化方法以及整合以化學特徵作為輸入資料的前饋神經網絡獲得的特徵與 CNN 網路學習的分子特徵向量進行結果預測。我們總共使用了七種不同的分子特性(水溶性、親脂性、水合能、電子特性、血腦屏障通透性和抑制)。我們演示了這三種模型優化技術中的每一種如何影響模型,以及最佳模型結合使用貝葉斯優化和動態批量大小調整中受益。 [zh_TW]
dc.description.abstract: Fingerprint-based, feature-based, and molecular graph-based representations have all been used with different deep learning methods to predict molecular properties. It has been clearly demonstrated that the choice of molecular representation affects both model prediction and explainability. We reviewed different molecular representations, focusing on graph and line notations for modelling. In general, one canonical chemical structure is used to represent a molecule when computing its properties. We carefully examined the commonly used simplified molecular input line entry specification (SMILES) notation, which represents a single molecule, and proposed using the full enumeration of SMILES strings to achieve better accuracy. A convolutional neural network (CNN) was used for modelling. Full SMILES enumeration improves the presentation of a molecule, describing it from all possible angles. The resulting CNN model is robust when dealing with large datasets, since no additional explicit chemistry knowledge is needed to predict solubility. Traditionally, it is also hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule relevant to solubility, which explains the contribution of chemical structures to the predicted property.

The key to generating the best deep learning model for predicting a molecular property is to test and apply various optimization methods. While individual optimization methods from past work outside the pharmaceutical domain have each succeeded in improving model performance, better improvement may be achieved when specific combinations of these methods and practices are applied carefully. Three high-performance optimization methods from the literature, shown to dramatically improve model performance in other fields, are used and discussed, eventually yielding a general procedure for generating optimized CNN models for different molecular properties. The three techniques are a dynamic batch size strategy for different enumeration ratios of the SMILES representation of compounds, Bayesian optimization for selecting the hyperparameters of a model, and a hybrid representation in which chemical features learned by a feedforward neural network are concatenated with the molecular feature vector learned by the CNN. A total of seven molecular properties (water solubility, lipophilicity, hydration energy, electronic properties, blood–brain barrier permeability, and inhibition) are used. We demonstrate how each of the three techniques affects the model and how the best model generally benefits from combining Bayesian optimization with dynamic batch size tuning. [en]
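The abstract above describes a dynamic batch size strategy tied to the SMILES enumeration ratio. Below is a minimal sketch of that idea, assuming a simple proportional scaling rule; the function names (`enumeration_ratio_schedule`, `make_batches`) and the scaling rule itself are illustrative assumptions, not the thesis's exact procedure, and the placeholder "augmentation" merely repeats each string, whereas real enumeration would generate randomized SMILES (e.g. with RDKit):

```python
import random

def enumeration_ratio_schedule(base_batch, ratio):
    # Assumed rule: when each molecule is expanded into `ratio` enumerated
    # SMILES strings, grow the batch proportionally so that roughly the same
    # number of distinct molecules appears in every batch.
    return base_batch * ratio

def make_batches(smiles, base_batch, ratio, seed=0):
    """Expand each SMILES `ratio` times (placeholder: the string itself,
    repeated), shuffle, and split into dynamically sized batches."""
    rng = random.Random(seed)
    augmented = [s for s in smiles for _ in range(ratio)]
    rng.shuffle(augmented)
    size = enumeration_ratio_schedule(base_batch, ratio)
    return [augmented[i:i + size] for i in range(0, len(augmented), size)]

molecules = ["CCO", "c1ccccc1", "CC(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]
batches = make_batches(molecules, base_batch=2, ratio=3)
print(len(batches), [len(b) for b in batches])  # → 2 [6, 6]
```

With a base batch of 2 and an enumeration ratio of 3, the 4 molecules become 12 training strings split into 2 batches of 6, so each batch still covers about 2 distinct molecules on average.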
dc.description.provenance: Made available in DSpace on 2023-03-19T23:23:32Z (GMT). No. of bitstreams: 1. U0001-1905202217054200.pdf: 2676074 bytes, checksum: 4ee870b9027e897786b2dd83fd118366 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents:
口試委員會審定書 i
Acknowledgments ii
中文摘要 iii
Abstract iv
Contents vi
List of Figures viii
List of Tables ix
1. Introduction 1
2. Molecular property prediction and pharmaceutical patent search 4
2.1 Molecular representations 4
2.2 Optimization methods in deep learning 7
3. Learn-it-all model 13
3.1 Framework 13
3.2 Embedding construction 16
3.3 Convolutional block structure 17
3.4 Linear mapping and Attention mechanism 18
3.5 Data preprocessing 19
3.6 Optimization methods 21
4. Experiment 26
4.1 Dataset 26
4.2 Configuration 29
4.3 Enumerating solubility dataset 31
4.4 Optimization protocol analysis 36
5. Discussion and Conclusion 46
5.1 The case study in solubility dataset 46
5.2 A general optimization protocol for molecular property prediction 53
Bibliography 64
Figure 1. Overview of the procedure. 14
Figure 2. The detailed neural network structure of the Learn-it-all model. 15
Figure 3. Detailed framework of the deep learning model. 25
Figure 4. Scatter plots of solubility of the predicted and measured results. 32
Figure 5. Interpretability of the model. 35
Figure 6. The optimization framework. 37
Figure 7. Box plot depicting the performance of the original model and the models employing the optimization methods. 55
Figure 8. Box plot depicting the performance of the original model and the models employing the optimization methods with 10 random seeds on the BBBP dataset. 56
Figure 9. Boxplot of model performances on different configurations and baseline model. 61
Figure 10. Boxplot of model performances on Enu, Enu_DB and Enu_DB_BO configurations and baseline model GraphConv on the HIV dataset. 62
Table 1. Bayesian optimization ranges for hyperparameter tuning. 23
Table 2. Descriptions of the public dataset from MoleculeNet. 28
Table 3. The tuning hyperparameters and the configuration ranges. 31
Table 4. Mean square error (MSE) for different enumeration ratios of the training dataset. 32
Table 5. Mean square error (MSE) for the model trained by different enumeration dataset combinations. 33
Table 6. The model performance of GraphConv on two DeepChem versions 2.3 and 2.5 with the same split methods and dataset. 36
Table 7. Comparisons of the model performance with enumeration ratios. 38
Table 8. The evaluated combinations of the optimization methods. 40
Table 9. Comparison of the model performances with two dynamic batch size strategies. 41
Table 10. Performance comparison of the different batch size. 42
Table 11. Comparison of the model performance with four Bayesian optimization strategies. 43
Table 12. Comparison of the model performances with four hybrid representation strategies. 44
Table 13. Comparison of the different model performance on the Delaney solubility dataset. 48
Table 14. Comparison of the model performances with four optimization techniques. 53
Table 15. Comparison of the molecular data statuses in the ESOL, QM7, BBBP and HIV datasets. 59
Table 16. Performance comparison of the related works. 62
dc.language.iso: en
dc.subject: 化學資訊學 [zh_TW]
dc.subject: 藥物發現 [zh_TW]
dc.subject: 生物科學 [zh_TW]
dc.subject: Drug discovery [en]
dc.subject: Cheminformatics [en]
dc.subject: Biological sciences [en]
dc.title: 用於分子特性預測的電腦輔助藥物設計 [zh_TW]
dc.title: Computer-Aided Drug Design for molecular property prediction [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: 博士
dc.contributor.oralexamcommittee: 張瑞峰 (Ruey-Feng Chang), 陳文進 (Wen-Chin Chen), 楊佳玲 (Chia-Lin Yang), 蘇柏翰 (Bo-Han Su)
dc.subject.keyword: 生物科學, 藥物發現, 化學資訊學 [zh_TW]
dc.subject.keyword: Biological sciences, Drug discovery, Cheminformatics [en]
dc.relation.page: 70
dc.identifier.doi: 10.6342/NTU202200783
dc.rights.note: 同意授權(全球公開)
dc.date.accepted: 2022-05-20
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 [zh_TW]
dc.date.embargo-lift: 2022-07-05
Appears in Collections: 資訊工程學系

Files in This Item:
File | Size | Format
U0001-1905202217054200.pdf | 2.61 MB | Adobe PDF