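Using Machine Learning and Feature Selection Technologies for Biomarker Combination Discovery in Coronary Atherosclerotic Plaque Identification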

蔡祐琳; Yu-Ling Tsai

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90046

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林永松	zh_TW
dc.contributor.advisor	Frank Yeong-Sung Lin	en
dc.contributor.author	蔡祐琳	zh_TW
dc.contributor.author	Yu-Ling Tsai	en
dc.date.accessioned	2023-09-22T17:11:31Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-09-22	-
dc.date.issued	2023	-
dc.date.submitted	2023-08-09	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90046	-
dc.description.abstract	冠狀動脈粥樣硬化是導致多種心血管疾病的主要因素，疾病早期通常不會出現症狀，然而後續可能導致冠狀動脈疾病、中風和心肌梗塞。冠狀動脈粥樣硬化斑塊的積累是導致上述狀況的主要原因，依據斑塊的種類，需要的醫療處置也不同。目前血管斑塊種類的判斷，需透過光學相干斷層掃描 (OCT) 及血管內超聲波 (IVUS) 等侵入性的影像檢查。為開發非侵入性且高專一性及靈敏度的血管斑塊檢測方法，在本研究中，我們採用機器學習和特徵選擇技術，篩選血漿蛋白體中，可用於診斷冠狀動脈粥樣硬化斑塊類型的生物標記 (Biomarker) 組合，此生物標記組合，未來可應用於冠狀動脈粥樣硬化患者的臨床診斷、管理和藥物規劃。　　我們利用兩組經臨床醫師以 OCT 檢測後，完成註記的血漿蛋白體胜肽 (Peptide) 資料集，進行資料分析。資料集進一步分為鈣化斑塊和脆弱性斑塊資料集。採用的特徵選擇方法包括t檢定、信息增益 (Information Gain)、最小冗餘最大相關性 (mRMR)、基尼指數 (Gini Index)、嵌入式 XGBoost 和 SHAP，以選出一組有效的特徵。這些特徵應用於斑塊分類模型，例如隨機森林和 XGBoost，並根據 Accuracy, F1 分數, AUC, Sensitivity 和 Specificity 來評估模型性能。　　我們的結果顯示，機器學習和特徵選擇技術結合應用顯著提高了斑塊的分類性能。此外，配合未來的生物晶片開發需求，我們的方法選出有限個特徵，依此訓練出的分類模型具有優異的性能。本研究可為未來冠狀動脈粥樣硬化斑塊領域識別研究奠定基礎，協助生物晶片的開發和臨床應用，從而縮短診斷時間並改善患者預後。	zh_TW
dc.description.abstract	Coronary atherosclerosis is a principal cause of many cardiovascular diseases and is often asymptomatic in its early stages, yet it can later lead to coronary artery disease, stroke, and potentially fatal myocardial infarction. The accumulation of atherosclerotic plaques is the main cause of these conditions, and each type of plaque calls for a different treatment. Classifying plaque types currently requires invasive imaging techniques such as Optical Coherence Tomography (OCT) and Intravascular Ultrasound (IVUS). To develop a non-invasive plaque detection method with high sensitivity and specificity, in this study we employed machine learning and feature selection techniques to identify a combination of plasma proteome biomarkers that can be used to diagnose different types of coronary atherosclerotic plaques. This biomarker combination could be applied in the clinical diagnosis, management, and drug planning of patients with coronary atherosclerosis. We analyzed two plasma proteome peptide datasets that clinicians had labeled on the basis of OCT examinations. The datasets were further divided into a calcified-plaque dataset and a vulnerable-plaque dataset. The feature selection methods employed include the t-test, information gain, minimum Redundancy Maximum Relevance (mRMR), the Gini index, embedded XGBoost, and SHAP (SHapley Additive exPlanations), which together yield an effective set of features. These features are used in plaque classification models such as Random Forest and XGBoost, whose performance is evaluated in terms of accuracy, F1 score, AUC, sensitivity, and specificity. Our results show that combining machine learning with feature selection significantly improves plaque classification performance. Furthermore, our approach selects a limited number of features, balancing the feature count against the overall performance of the classification model to meet the needs of future biochip development. This study lays a foundation for future biomarker identification research on coronary atherosclerotic plaques and may support the development of biochips and clinical applications, thereby shortening diagnosis time and improving patient prognosis.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:11:31Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-09-22T17:11:31Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Acknowledgements; Abstract (in Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Background Overview; 1.2 Motivation; 1.3 Objectives; 1.4 Thesis Organization; Chapter 2 Literature Review; 2.1 Biomarker Discovery; 2.1.1 Overview of Biomarker; 2.1.2 Biomarker Discovery; 2.2 Machine Learning in Biomarker Discovery; 2.2.1 Overview of Feature Selection; 2.2.2 Filter Method; 2.2.3 Wrapper Method; 2.2.4 Embedded Method; 2.2.5 Ensemble Method and Neural Network; 2.3 Summary; Chapter 3 Method; 3.1 Workflow of Biochip Development; 3.2 Framework of Proposed Method; 3.2.1 Framework of Using Statistic Feature Selection Method; 3.2.2 Framework of Using Feature Importance Method; 3.3 Feature Selection Methods; 3.4 Feature Importance Methods; 3.4.1 Gini Index (Embedded Random Forest); 3.4.2 Embedded XGBoost; 3.4.3 SHAP; 3.5 Classification Model; 3.5.1 Neural Network; 3.5.2 Random Forest; 3.5.3 XGBoost; 3.6 Evaluation Metrics; Chapter 4 Experimental Results and Discussion; 4.1 Peptide Dataset; 4.2 Classification Baseline; 4.3 Feature Selection Using Neural Network and Feature Importance Method; 4.4 Feature Selection Using Statistic Feature Selection and Machine Learning Method; 4.5 Feature Selection Using Feature Importance and Machine Learning Method; 4.6 Discussion; 4.6.1 ROC Curves; 4.6.2 The Number of the Selected Features and AUC; 4.6.3 The Distribution of the Selected Features; 4.6.4 The Intensity of the Selected Features; 4.6.5 Limitations; Chapter 5 Conclusions and Future Work; 5.1 Conclusions; 5.2 Future Work; References	-
dc.language.iso	en	-
dc.subject	特徵選擇	zh_TW
dc.subject	蛋白質體學	zh_TW
dc.subject	冠狀動脈粥狀硬化斑塊	zh_TW
dc.subject	生物標記	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	Machine Learning	en
dc.subject	Coronary Atherosclerotic Plaque	en
dc.subject	Proteomics	en
dc.subject	Biomarker Discovery	en
dc.subject	Feature Selection	en
dc.title	運用機器學習與特徵選擇技術識別冠狀動脈粥狀硬化斑塊的生物標記組合發現	zh_TW
dc.title	Using Machine Learning and Feature Selection Technologies for Biomarker Combination Discovery in Coronary Atherosclerotic Plaque Identification	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.coadvisor	廖辰中	zh_TW
dc.contributor.coadvisor	Chen-Chung Liao	en
dc.contributor.oralexamcommittee	鍾順平;呂俊賢;呂東武	zh_TW
dc.contributor.oralexamcommittee	Shun-Ping Chung;Chun-Hsien Lu;Tung-Wu Lu	en
dc.subject.keyword	機器學習, 特徵選擇, 生物標記, 冠狀動脈粥狀硬化斑塊, 蛋白質體學	zh_TW
dc.subject.keyword	Machine Learning, Feature Selection, Biomarker Discovery, Coronary Atherosclerotic Plaque, Proteomics	en
dc.relation.page	72	-
dc.identifier.doi	10.6342/NTU202303195	-
dc.rights.note	Authorization granted (access restricted to campus)	-
dc.date.accepted	2023-08-12	-
dc.contributor.author-college	College of Management	-
dc.contributor.author-dept	Department of Information Management	-
dc.date.embargo-lift	2028-08-09	-
Appears in Collections:	Department of Information Management
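
The abstract lists the t-test among the filter-style feature selection methods applied to the peptide data. As a rough illustration of that idea only, the sketch below ranks feature columns by the absolute Welch t statistic between two classes and keeps the top k. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `welch_t` and `top_k_by_t`, the choice of Welch's variant of the t-test, and the synthetic data are all hypothetical and do not come from the record.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)


def top_k_by_t(rows, labels, k):
    """Rank feature columns by |t| between class 1 and class 0; keep the top k."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    scores.sort(reverse=True)  # largest |t| first
    return [j for _, j in scores[:k]]


# Synthetic stand-in for a peptide intensity matrix (assumption: NOT thesis data).
# Six random features; only feature 2 is shifted for the positive class.
rng = random.Random(0)
labels = [0] * 20 + [1] * 20
rows = []
for y in labels:
    row = [rng.gauss(0.0, 1.0) for _ in range(6)]
    row[2] += 3.0 * y
    rows.append(row)

selected = top_k_by_t(rows, labels, k=2)
print(selected)
```

A filter of this kind scores each feature independently of any classifier, which is why it is usually followed by a separate model such as the Random Forest or XGBoost classifiers named in the abstract.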

Files in This Item:

File	Size	Format
ntu-111-2.pdf (Restricted Access)	1.74 MB	Adobe PDF
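
The abstract states that the models are evaluated by accuracy, F1 score, AUC, sensitivity, and specificity. The sketch below shows how these five metrics are conventionally computed for a binary classifier. It is illustrative only: the function names `binary_metrics` and `auc` and the toy labels and scores are hypothetical and are not taken from the thesis.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1 from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (assumption: made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = binary_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity and specificity depend on the decision threshold (0.5 here, chosen arbitrarily), whereas AUC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are often reported together.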
