Improving the Prediction of Gallbladder Disease Using Feature Selection in Machine Learning

黃冠傑; Guan-Jie Huang

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99783

Title:	Improving the Prediction of Gallbladder Disease Using Feature Selection in Machine Learning
Author:	黃冠傑 Guan-Jie Huang
Advisor:	張智星 Jyh-Shing Jang
Keywords:	gallbladder disease, feature selection, graph embedding, DeepWalk, Node2Vec, LSTM
Publication year:	2025
Degree:	Master's
Abstract:	This study addresses the high dimensionality, strong collinearity, and heterogeneity of medical questionnaire and health examination data used to predict gallbladder disease. We propose a pipeline that combines graph embedding with two-stage feature selection to reduce training cost while maintaining or improving predictive performance. First, we construct a feature association graph from inter-variable correlations and apply several graph-embedding techniques (e.g., DeepWalk, Node2Vec, LINE, SDNE) to map each feature to a dense vector that preserves both local and higher-order structural relationships. We then apply a two-stage selection procedure: (i) obtain stable central variables, and (ii) cluster them and evaluate within each cluster to determine the final feature subset. On the modeling side, we use an LSTM to handle questionnaire and examination data with a temporal scale, and we compare our approach against traditional filter and wrapper methods as well as embedded selection via XGBoost. Experimental results show that embedding-driven feature selection reduces dimensionality and training time and, in most settings, achieves predictive performance comparable to or better than using all features. The structural information retained by the embeddings also helps interpret the relationships between feature groups and the prediction target. The proposed feature-engineering pipeline can be extended to other high-dimensional medical data scenarios.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99783
DOI:	10.6342/NTU202504358
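Full-text authorization:	Granted (access limited to campus)
Full-text release date:	2025-09-18
Appears in department:	Department of Computer Science and Information Engineering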
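
The pipeline outlined in the abstract (feature correlation graph, per-feature embedding, two-stage selection) can be illustrated with a minimal sketch. This is not the thesis implementation, and every specific in it is an assumption made for illustration: a spectral embedding of the correlation graph stands in for DeepWalk/Node2Vec, "central" features are taken to be those with the largest weighted degree, clustering is a plain k-means, and within-cluster evaluation is the absolute correlation with the target. The function names, the 0.3 threshold, and the default sizes are all hypothetical.

```python
import numpy as np


def correlation_graph(X, threshold=0.3):
    """Weighted adjacency built from absolute Pearson correlations between columns."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    return np.where(corr >= threshold, corr, 0.0)


def spectral_embedding(adj, dim=2):
    """Stand-in for DeepWalk/Node2Vec: leading eigenvectors of the normalized adjacency."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    norm_adj = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(norm_adj)  # eigenvalues come back in ascending order
    return eigvecs[:, -dim:]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def select_features(X, y, n_central=6, n_clusters=3, threshold=0.3):
    """Two-stage selection sketch; returns sorted column indices."""
    adj = correlation_graph(X, threshold)
    # Stage 1 (assumed criterion): keep the features with the largest weighted degree.
    central = np.argsort(adj.sum(axis=1))[::-1][:n_central]
    # Stage 2: cluster the central features in embedding space, then keep the
    # feature in each cluster that correlates most strongly with the target.
    emb = spectral_embedding(adj)[central]
    labels = kmeans(emb, n_clusters)
    relevance = np.abs([np.corrcoef(X[:, f], y)[0, 1] for f in central])
    chosen = [central[labels == c][relevance[labels == c].argmax()]
              for c in np.unique(labels)]
    return sorted(int(f) for f in chosen)


# Synthetic example: three latent factors, each observed through three noisy features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
X = np.repeat(latent, 3, axis=1) + 0.3 * rng.normal(size=(300, 9))
y = latent[:, 0] + 0.5 * latent[:, 1] + 0.1 * rng.normal(size=300)
selected = select_features(X, y)
print(selected)
```

In this sketch the selected subset contains at most one feature per cluster, which is how the collinear groups get collapsed; a random-walk embedding such as DeepWalk or Node2Vec would replace `spectral_embedding` without changing the rest of the flow.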

Files in this item:

File	Size	Format
ntu-113-2.pdf (access restricted to NTU campus IP addresses; off campus, please use the VPN remote-access service)	5.6 MB	Adobe PDF


Unless a specific copyright statement is indicated, all items in this system are protected by copyright, with all rights reserved.
