Interpretable Graph Neural Networks for Predicting P-Glycoprotein Substrates

許洸誠; Kuang-Cheng Hsu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96777

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	曾宇鳳	zh_TW
dc.contributor.advisor	Yufeng Jane Tseng	en
dc.contributor.author	許洸誠	zh_TW
dc.contributor.author	Kuang-Cheng Hsu	en
dc.date.accessioned	2025-02-21T16:30:23Z	-
dc.date.available	2025-02-22	-
dc.date.copyright	2025-02-21	-
dc.date.issued	2024	-
dc.date.submitted	2025-01-06	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96777	-
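dc.description.abstract	P-glycoprotein (P-gp), a member of the ABC transporter family, is found in cell membranes. It binds a wide range of foreign substances and actively pumps them out of cells, reducing their cellular uptake. Substances that interact with P-gp and are then expelled are called P-gp substrates. Because P-gp is widely distributed throughout the body, including at the blood-brain barrier, knowing whether a drug is a P-gp substrate and how well it can cross the blood-brain barrier is essential in drug development. In our study, we developed a robust method that uses several graph neural network (GNN) models to predict P-gp substrates accurately while remaining interpretable. Specifically, we explored three GNN architectures: the graph convolutional neural network (GCN), AttentiveFP, and an ensemble model based on AttentiveFP. Our focus is predicting whether a given drug molecule is a P-gp substrate. We curated a dataset of 1995 drug molecules, comprising 1202 P-gp substrates and 793 P-gp non-substrates, and split it into training and test sets at a 9:1 ratio. Our method outperformed traditional machine learning models, reaching a ROC-AUC of 0.848 and an accuracy of 0.815. Using the integrated gradients method, we extracted 6 key substructures associated with P-gp substrates, in agreement with findings in the existing literature. Beyond strong predictive performance, our method gives drug developers transparent insights, making it a valuable tool for drug development and optimization beyond P-gp substrate prediction.	zh_TW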
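The abstracts describe a 9:1 train/test split of 1995 molecules (1202 substrates, 793 non-substrates) evaluated with ROC-AUC and accuracy. The following is a minimal, stdlib-only sketch of those steps. It assumes a class-stratified random split and a 0.5 decision threshold for accuracy, neither of which this record states. The helper names are hypothetical, and the scores are toy values, not the thesis's predictions.

```python
import random


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Hypothetical helper: shuffle each class separately and hold out
    round(test_fraction * class_size) indices per class for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of molecules whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Class counts from the abstract: 1202 substrates (1) and 793 non-substrates (0).
labels = [1] * 1202 + [0] * 793
train_idx, test_idx = stratified_split(labels, test_fraction=0.1)
print(len(train_idx), len(test_idx))  # prints: 1796 199

# Toy scores (not thesis results) to show the two metrics.
y_toy = [1, 1, 0, 0]
s_toy = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(y_toy, s_toy), accuracy(y_toy, s_toy))  # prints: 0.75 0.5
```

The pairwise AUC is quadratic in the number of molecules, which is fine at test-set scale (about 200 molecules). In practice, scikit-learn's `roc_auc_score` and `train_test_split(..., stratify=labels)` cover the same ground.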
dc.description.abstract	P-glycoprotein (P-gp), a member of the ABC transporter family, is located in cell membranes, where it binds various foreign substances and actively transports them out of cells, thereby reducing their absorption. Substances that interact with P-gp and are subsequently expelled are termed P-gp substrates. Because P-gp is widely distributed throughout the body, including at the blood-brain barrier, determining whether a drug is a P-gp substrate and understanding its ability to cross the blood-brain barrier are crucial steps in drug development. In this study, we developed a robust protocol that leverages graph neural network (GNN) models to predict P-gp substrates accurately while maintaining interpretability. Specifically, we explored three GNN architectures, the graph convolutional neural network (GCN), AttentiveFP, and an ensemble model based on AttentiveFP, each used to predict whether a given drug molecule is a P-gp substrate. We curated a dataset of 1995 drug molecules (1202 P-gp substrates and 793 P-gp non-substrates) and split it into training and testing sets at a 9:1 ratio. Our approach outperformed traditional machine learning models, achieving an area under the receiver operating characteristic curve (ROC-AUC) of 0.848 and an accuracy of 0.815. Using the integrated gradients (IG) method, we identified 6 critical substructures associated with P-gp substrates, consistent with the findings of previous studies. Beyond its prediction performance, the protocol provides transparent insights for drug developers, making it a valuable tool for drug development and optimization beyond P-gp substrate prediction.	en
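The abstracts credit the integrated gradients (IG) method with identifying the six key substructures. For a model F, an input x and a baseline x', IG assigns feature i the attribution (x_i - x'_i) multiplied by the integral over α from 0 to 1 of ∂F/∂x_i evaluated at x' + α(x - x'). The sketch below illustrates only that formula and is not the thesis's GNN pipeline. The "model" is a hypothetical logistic-regression stand-in with made-up weights over four binary fingerprint-style bits, chosen so that the gradient can be written by hand. The all-zeros baseline and the 200 integration steps are also assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Hypothetical stand-in classifier: F(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the stand-in: dF/dx_i = F * (1 - F) * w_i."""
    f = model(x, w, b)
    return [f * (1.0 - f) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of IG along the straight-line path
    from `baseline` to `x`."""
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            grad_sum[i] += g
    # Scale the average path gradient by the input-minus-baseline difference.
    return [(xi - bi) * gs / steps for xi, bi, gs in zip(x, baseline, grad_sum)]


# Made-up example: four binary "substructure present" bits, all-zeros baseline.
x = [1.0, 0.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
w = [1.5, -2.0, 0.3, -0.8]
b = -0.2

attributions = integrated_gradients(x, baseline, w, b)
print([round(a, 4) for a in attributions])

# Completeness check: attributions should sum to F(x) - F(baseline).
print(round(sum(attributions), 4), round(model(x, w, b) - model(baseline, w, b), 4))  # prints: 0.2398 0.2398
```

A bit equal to its baseline value receives zero attribution. The attributions sum, up to integration error, to F(x) - F(x'). This completeness property makes IG a convenient sanity check. For a PyTorch GNN, the path gradients would come from autograd rather than a hand-written formula. Captum's `IntegratedGradients` class is one existing implementation.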
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-21T16:30:23Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2025-02-21T16:30:23Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
Glossary
Chapter 1 Introduction
    1.1 P-glycoprotein
    1.2 Graph Neural Networks
    1.3 Explainable AI
    1.4 Aims
Chapter 2 Materials and Methods
    2.1 Data preparation
        2.1.1 Data collection
        2.1.2 Featurizer
        2.1.3 Descriptor
    2.2 ML modeling
    2.3 GNN modeling
        2.3.1 Graph Convolutional Neural Network
        2.3.2 AttentiveFP
        2.3.3 Ensemble model
        2.3.4 Construction, training and hyperparameter tuning of GNN models
    2.4 t-Distributed stochastic neighbor embedding
    2.5 Integrated gradients
    2.6 p-value
Chapter 3 Results
    3.1 GNN models outperform traditional ML models
    3.2 GNNs exhibit exceptional molecular representation ability
    3.3 Attention weights indicate the importance of each node to P-gp substrate activity
    3.4 IG results
        3.4.1 The IG of node features quantifies the importance of each encoded property to P-gp substrate activity
        3.4.2 Aggregation of IGs underscores the importance of each node to P-gp substrate activity
        3.4.3 The IG of PubChem fingerprints reveals important substructures that contribute to P-gp substrate activity
Chapter 4 Discussion
    4.1 GNNs demonstrated better classification power and molecular representation than ML methods in previous studies
    4.2 The extracted key substructures, including hydrogen bond acceptors and amide substructures, are consistent with the patterns found in previous pharmaceutical studies
    4.3 IG and attention weights provide a flexible and efficient interpretation method for P-gp substrate prediction compared to past computational works
Chapter 5 Conclusions
References	-
dc.language.iso	en	-
dc.title	Interpretable Graph Neural Networks for Predicting P-Glycoprotein Substrates	zh_TW
dc.title	Interpretable graph neural networks for predicting P-glycoprotein substrates	en
dc.type	Thesis	-
dc.date.schoolyear	113-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	王珮驊;蘇柏翰	zh_TW
dc.contributor.oralexamcommittee	Pei Hua Wang;Bo Han Su	en
dc.subject.keyword	P-glycoprotein, computer-aided drug design, graph neural network, explainable AI, attention mechanism, integrated gradients, deep learning	zh_TW
dc.subject.keyword	P-glycoprotein, Computer-aided Drug Design, Graph Neural Network, Explainable AI, Attention Mechanism, Integrated Gradients, Deep Learning	en
dc.relation.page	49	-
dc.identifier.doi	10.6342/NTU202500022	-
dc.rights.note	Not authorized for public access	-
dc.date.accepted	2025-01-06	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Computer Science and Information Engineering	-
dc.date.embargo-lift	N/A	-
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-113-1.pdf (currently not authorized for public access)	2.03 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.