Comparative Analysis of Machine Learning and Deep Learning Approaches for Heart Attack Prediction

莫子佩; Muhammad Aulia Rahman

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101778

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	廖世偉	zh_TW
dc.contributor.advisor	Shih-Wei Liao	en
dc.contributor.author	莫子佩	zh_TW
dc.contributor.author	Muhammad Aulia Rahman	en
dc.date.accessioned	2026-03-04T16:29:36Z	-
dc.date.available	2026-03-05	-
dc.date.copyright	2026-03-04	-
dc.date.issued	2026	-
dc.date.submitted	2026-02-07	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/101778	-
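dc.description.abstract	Cardiovascular disease remains a leading cause of death worldwide, and myocardial infarction (heart attack) poses a particularly serious clinical and public health challenge because of its sudden onset and potentially fatal outcome. Accurate and timely prediction of myocardial infarction risk is therefore essential for effective prevention and early intervention. In recent years, machine learning (ML) and deep learning (DL) techniques have increasingly been applied to cardiovascular risk prediction; however, there is still no consensus on their relative performance, especially on structured clinical data. This study presents a systematic comparative analysis of several machine learning and deep learning models for myocardial infarction prediction, using a structured clinical dataset that contains demographic data, clinical examination information, and lifestyle features for 918 patients. Traditional machine learning models, namely Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting, and XGBoost, are compared with deep learning models, namely a multilayer perceptron (MLP), a Keras-based deep neural network (DNN), and TabNet. All models were trained and evaluated under identical experimental conditions, with a unified preprocessing pipeline, a stratified train–test split, and standardized evaluation metrics. The metrics are accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC), with particular emphasis on recall and ROC-AUC because of their clinical relevance to reducing missed diagnoses. The results show that traditional machine learning models, especially ensemble methods, generally outperformed deep learning models on this structured dataset. Random Forest achieved the highest overall accuracy (90.22%) and F1-score (91.35%) while maintaining high recall (93.14%) and a competitive ROC-AUC (0.933), indicating that it can effectively identify high-risk patients. The Support Vector Machine with an RBF kernel achieved the highest ROC-AUC (0.949), reflecting the best ability to discriminate between high-risk and non-high-risk patients. By comparison, the deep learning models were competitive but slightly behind traditional machine learning: the Keras DNN reached 89.67% accuracy and a ROC-AUC of 0.949, the MLP was slightly lower, and TabNet was weaker (82.61% accuracy, 0.912 ROC-AUC), possibly limited by the small amount of data and insufficient feature richness. These results indicate that, for structured clinical data of moderate size, ensemble machine learning models are more effective and practical than deep learning models. Beyond predictive performance, interpretable models such as Logistic Regression and Random Forest let clinicians understand how each feature contributes to a prediction, which is essential for clinical adoption. Overall, this study emphasizes that model selection should match data characteristics and clinical needs, and it offers evidence-based recommendations to support the development of reliable myocardial infarction prediction systems for medical decision support.	zh_TW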
dc.description.abstract	Cardiovascular diseases remain a leading cause of global mortality, with heart attacks presenting a particularly severe clinical and public health challenge due to their sudden onset and potentially fatal outcomes. Accurate and timely prediction of heart attack risk is therefore essential for effective prevention and early intervention. In recent years, both machine learning (ML) and deep learning (DL) techniques have been increasingly applied to cardiovascular risk prediction; however, there remains limited consensus regarding their relative effectiveness, particularly when applied to structured clinical datasets. This study presents a systematic comparative analysis of selected machine learning and deep learning models for heart attack prediction using a structured clinical dataset comprising 918 patient records with demographic, clinical, and lifestyle-related features. Traditional ML models, including Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting, and XGBoost, were compared with DL approaches, namely a multilayer perceptron (MLP), a Keras-based deep neural network (DNN), and TabNet. All models were trained and evaluated under identical experimental conditions, employing a unified preprocessing pipeline, a stratified train–test split, and standardized evaluation metrics. Model performance was assessed using accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (ROC-AUC), with particular emphasis on recall and ROC-AUC given their clinical relevance in minimizing missed diagnoses. The experimental results demonstrate that traditional machine learning models, particularly ensemble-based methods, generally outperformed deep learning approaches on this structured dataset. Random Forest achieved the highest overall accuracy (90.22%) and F1-score (91.35%), while maintaining strong recall (93.14%) and a competitive ROC-AUC (0.933), demonstrating its ability to correctly identify at-risk patients. SVM with an RBF kernel attained the highest ROC-AUC (0.949), reflecting superior discriminative capability between patients at risk and those not at risk. In contrast, deep learning models exhibited comparable but slightly lower performance, with the Keras DNN achieving 89.67% accuracy and 0.949 ROC-AUC, the MLP scoring slightly lower, and TabNet performing substantially worse (82.61% accuracy, 0.912 ROC-AUC), likely due to the relatively small dataset size and limited feature richness. These findings suggest that, for structured tabular clinical data of moderate size, ensemble machine learning models offer a more effective and practical solution than deep learning approaches. Beyond predictive performance, interpretable models such as Logistic Regression and Random Forest allow clinicians to understand how individual features contribute to predictions, which is crucial for clinical adoption. Overall, this study underscores the importance of aligning model selection with both data characteristics and clinical requirements, providing evidence-based guidance for the development of reliable heart attack prediction systems in healthcare decision support contexts.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2026-03-04T16:29:36Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2026-03-04T16:29:36Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Master’s Thesis Acceptance Certificate; Acknowledgements; Acknowledgements (Chinese); Abstract (Chinese); Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Background; 1.2 Problem Statement; 1.3 Research Objectives; 1.4 Research Questions; 1.5 Scope of the Study; 1.6 Significance of the Study; Chapter 2 Literature Review; 2.1 Overview of Heart Attack and Cardiovascular Risk Factors; 2.2 Machine Learning Techniques in Heart Attack Prediction; 2.2.1 Logistic Regression; 2.2.2 Decision Trees and Ensemble Methods; 2.2.3 Support Vector Machines; 2.3 Deep Learning Approaches in Heart Attack Prediction; 2.3.1 Artificial Neural Networks; 2.3.2 Deep Neural Networks; 2.4 Evaluation Metrics for Medical Prediction Models; 2.5 Comparative Studies of Machine Learning and Deep Learning Models; Chapter 3 Research Methodology; 3.1 Overview of the Research Methodology; 3.2 Dataset Description; 3.3 Data Preprocessing and Feature Engineering; 3.3.1 Feature Separation and Data Preparation; 3.3.2 Feature Scaling and Encoding; 3.4 Data Splitting Strategy; 3.5 Machine Learning Models; 3.5.1 Logistic Regression; 3.5.2 Random Forest Classifier; 3.5.3 Gradient Boosting Classifier; 3.5.4 Extreme Gradient Boosting (XGBoost); 3.5.5 Support Vector Machine (SVM); 3.6 Deep Learning Models; 3.6.1 Multilayer Perceptron (MLP); 3.6.2 Keras Deep Neural Network; 3.6.3 TabNet; 3.7 Model Evaluation Metrics; Chapter 4 Results and Discussion; 4.1 Introduction; 4.2 Overall Model Performance; 4.3 Performance of Machine Learning Models; 4.3.1 Logistic Regression; 4.3.2 Random Forest; 4.3.3 Gradient Boosting; 4.3.4 XGBoost; 4.3.5 Support Vector Machine (RBF Kernel); 4.4 Performance of Deep Learning Models; 4.4.1 MLP Neural Network; 4.4.2 Keras Deep Neural Network; 4.4.3 TabNet; 4.5 Comparative Analysis of Machine Learning and Deep Learning Models; 4.6 Clinical Implications; 4.7 Discussion; Chapter 5 Conclusions and Recommendations; 5.1 Summary of Findings; 5.2 Limitations of the Study; 5.3 Future Study Recommendations; 5.4 Conclusions; References	-
dc.language.iso	en	-
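dc.subject	心肌梗塞預測 (Myocardial Infarction Prediction)	-
dc.subject	心血管疾病 (Cardiovascular Disease)	-
dc.subject	機器學習 (Machine Learning)	-
dc.subject	深度學習 (Deep Learning)	-
dc.subject	集成學習 (Ensemble Learning)	-
dc.subject	臨床決策支援 (Clinical Decision Support)	-
dc.subject	ROC-AUC	-
dc.subject	風險預測 (Risk Prediction)	-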
dc.subject	Heart Attack Prediction	-
dc.subject	Cardiovascular Disease	-
dc.subject	Machine Learning	-
dc.subject	Deep Learning	-
dc.subject	Ensemble Learning	-
dc.subject	Clinical Decision Support	-
dc.subject	ROC-AUC	-
dc.subject	Risk Prediction	-
dc.title	機器學習與深度學習於心臟病發作預測之比較分析	zh_TW
dc.title	Comparative Analysis of Machine Learning and Deep Learning Approaches for Heart Attack Prediction	en
dc.type	Thesis	-
dc.date.schoolyear	114-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	高振滄;葉春超;李逸元;周克行	zh_TW
dc.contributor.oralexamcommittee	George J. Gau;Chun-Chao Yeh;Yi-Yuan Lee;Ke-Hsin Chou	en
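dc.subject.keyword	心肌梗塞預測 (Myocardial Infarction Prediction), 心血管疾病 (Cardiovascular Disease), 機器學習 (Machine Learning), 深度學習 (Deep Learning), 集成學習 (Ensemble Learning), 臨床決策支援 (Clinical Decision Support), ROC-AUC, 風險預測 (Risk Prediction)	zh_TW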
dc.subject.keyword	Heart Attack Prediction, Cardiovascular Disease, Machine Learning, Deep Learning, Ensemble Learning, Clinical Decision Support, ROC-AUC, Risk Prediction	en
dc.relation.page	78	-
dc.identifier.doi	10.6342/NTU202600221	-
dc.rights.note	Authorization granted (access restricted to campus)	-
dc.date.accepted	2026-02-09	-
dc.contributor.author-college	Center for General Education	-
dc.contributor.author-dept	Master's Program in Smart Medicine and Health Informatics	-
dc.date.embargo-lift	2031-01-21	-
Appears in Collections:	Master's Program in Smart Medicine and Health Informatics
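
The abstract names five evaluation metrics (accuracy, precision, recall, F1-score, and ROC-AUC) but the record does not show how they are computed. The sketch below illustrates the standard definitions on a small hypothetical set of labels and scores; it is not the thesis's code or data. The function names, the 0.5 decision threshold, and the toy values are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # share of true positives that were caught
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive case, 0 = negative case.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # assumed model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Recall depends on the chosen threshold, while ROC-AUC is computed from the raw scores across all thresholds. This is why a model can rank first on one metric and not on the other, as the abstract reports for Random Forest and the RBF-kernel SVM.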

Files in This Item:

File	Size	Format
ntu-114-1.pdf (not authorized for public access)	2.41 MB	Adobe PDF


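Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.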