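Using Machine Learning and Feature Selection Technologies for Biomarker Combination Discovery in Coronary Atherosclerotic Plaque Identification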

蔡祐琳; Yu-Ling Tsai

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90046

Title:	Using Machine Learning and Feature Selection Technologies for Biomarker Combination Discovery in Coronary Atherosclerotic Plaque Identification (original title: 運用機器學習與特徵選擇技術識別冠狀動脈粥狀硬化斑塊的生物標記組合發現)
Author:	蔡祐琳 (Yu-Ling Tsai)
Advisor:	林永松 (Frank Yeong-Sung Lin)
Keywords:	Machine Learning, Feature Selection, Biomarker Discovery, Coronary Atherosclerotic Plaque, Proteomics
Publication Year:	2023
Degree:	Master's
Abstract:	Coronary atherosclerosis is a principal cause of a variety of cardiovascular diseases. It often produces no symptoms in its early stages, but it can later lead to coronary artery disease, stroke, and myocardial infarction. The accumulation of coronary atherosclerotic plaque is the main cause of these conditions, and each type of plaque calls for a different treatment. Determining the plaque type currently requires invasive imaging such as optical coherence tomography (OCT) and intravascular ultrasound (IVUS). To develop a non-invasive plaque detection method with high sensitivity and specificity, this study applies machine learning and feature selection techniques to plasma proteomics data to identify a combination of biomarkers that can diagnose the type of coronary atherosclerotic plaque. This biomarker combination could in future be applied to the clinical diagnosis, management, and drug planning of patients with coronary atherosclerosis.
We analyzed two plasma proteome peptide datasets that clinicians had annotated following OCT examination. The data were further divided into a calcified plaque dataset and a vulnerable plaque dataset. The feature selection methods employed were the t-test, information gain, minimum Redundancy Maximum Relevance (mRMR), the Gini index, embedded XGBoost, and SHAP (SHapley Additive exPlanations), which were used to select an effective set of features. These features were then used in plaque classification models such as random forest and XGBoost, and model performance was evaluated in terms of accuracy, F1 score, AUC, sensitivity, and specificity.
Our results show that combining machine learning with feature selection significantly improves plaque classification performance. In addition, to meet the needs of future biochip development, our approach selects a limited number of features, and the classification models trained on them achieve excellent performance. This study lays a foundation for future biomarker identification research in the domain of coronary atherosclerotic plaques and can support biochip development and clinical application, thereby shortening diagnosis time and improving patient prognosis.
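The abstract lists the t-test among its feature selection methods and sensitivity and specificity among its evaluation metrics. The following is a minimal illustrative sketch of those two ideas only, not the thesis's implementation. It assumes Welch's unequal-variance form of the t-statistic (the abstract does not say which variant was used), runs on synthetic data rather than the plasma peptide datasets, and uses hypothetical function names (`welch_t`, `rank_features`, `sensitivity_specificity`).

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_features(X, y, k):
    """Return the indices of the k features with the largest |t| between classes."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Synthetic stand-in data: 40 samples, 6 features.
# Only features 0 and 3 are shifted for the positive class.
rng = random.Random(0)
y = [1] * 20 + [0] * 20
X = []
for label in y:
    row = [rng.gauss(0.0, 1.0) for _ in range(6)]
    if label == 1:
        row[0] += 3.0
        row[3] += 3.0
    X.append(row)

selected = rank_features(X, y, k=2)
print("selected feature indices:", sorted(selected))

# Metrics on a small hand-made set of labels and predictions.
sens, spec = sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print("sensitivity:", sens, "specificity:", spec)
```

A filter method like this scores each feature independently of any classifier, which is why the abstract also lists embedded and model-based approaches (XGBoost, SHAP) that take feature interactions into account.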
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90046
DOI:	10.6342/NTU202303195
Full-text authorization:	Granted (open access on campus only)
Appears in Collections:	Department of Information Management

Files in This Item:

File	Size	Format
ntu-111-2.pdf (not currently authorized for public access)	1.74 MB	Adobe PDF


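Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.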
