Machine Learning-Based Multicenter Bacteremia Prediction Model

劉百賞; Pak-Sheung Lau

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95476

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	魏宏宇	zh_TW
dc.contributor.advisor	Hung-Yu Wei	en
dc.contributor.author	劉百賞	zh_TW
dc.contributor.author	Pak-Sheung Lau	en
dc.date.accessioned	2024-09-10T16:16:14Z	-
dc.date.available	2024-09-11	-
dc.date.copyright	2024-09-10	-
dc.date.issued	2023	-
dc.date.submitted	2023-12-14	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95476	-
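dc.description.abstract	Bacteremia is the presence of bacteria in the blood. Without timely treatment, patients with bacteremia may go on to develop sepsis and, in severe cases, shock. Unfortunately, apart from fever, bacteremia usually produces no other obvious symptoms, which makes it difficult to distinguish from other conditions. To date, blood culture is the only method for diagnosing bacteremia. However, its results typically take about 24 to 48 hours, so treatment may be delayed and patients may progress to more severe illness. In recent years, as machine learning has become increasingly common in the analysis of medical data, many hospitals have used it to assist physicians with diagnosis so that patients can receive treatment sooner. The goal of this thesis is to build multicenter machine learning models that predict bacteremia in emergency department patients. The study uses patient data from National Taiwan University Hospital (NTUH), covering datasets from the Taipei, Hsinchu, and Yunlin branches from 2009 to 2018. The thesis is divided into two main parts, each of which analyzes the prediction models in detail: it compares the predictive ability of machine learning models in a static environment and explores methods for updating the models. In the first part, the three datasets are combined to train a benchmark model, which is then compared with models trained on each dataset separately. In the second part, methods for updating the model are explored to better understand how feature importance changes across years, and the relatively effective updating methods are summarized. In summary, this thesis explores the predictive ability of machine learning models for bacteremia in a static environment, as well as methods for updating these models. The models examined here could help reduce the time patients wait for treatment and improve patient survival, while laying a solid foundation for future bacteremia research.	zh_TW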
dc.description.abstract	Bacteremia is a serious and potentially life-threatening condition caused by the presence of bacteria in the bloodstream. As a bloodstream infection, it can progress to sepsis, which can be fatal if not treated promptly. Unfortunately, apart from fever, bacteremia typically has no specific symptoms that can be used to identify it. Diagnosis therefore relies on blood culture, which usually takes 24 to 48 hours to produce results. In this thesis, we aim to develop machine learning models that predict bacteremia in emergency department patients across multiple centers. The study uses 10 years of data collected from NTUH sites in Taipei, Hsinchu, and Yunlin. The research is divided into two parts: the first investigates the predictive power of machine learning models in a static environment, and the second investigates best practices for updating these models on a yearly basis. The experiments contribute to a deeper understanding of bacteremia prediction and could help shorten the time patients wait for appropriate treatment, which may ultimately improve patient outcomes.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-10T16:16:14Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-09-10T16:16:14Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee; Acknowledgements; Chinese Abstract; Abstract; Chapter 1. Introduction; 1.1 Motivation; 1.2 Background Knowledge; 1.3 Data Description; 1.4 Related Work; 1.5 Contribution; 1.6 Section Overview; Chapter 2. Technical Overview; 2.1 Feature Selection; 2.2 Data Engineering; 2.3 Data Visualization; 2.4 Machine Learning Models; 2.5 Explainable AI; Chapter 3. Static Environment Prediction; 3.1 Data Description; 3.2 Evaluation Metrics; 3.3 Benchmark Model for Comparison; 3.4 Result for Single Dataset Model; 3.4.1 t-SNE Visualization; 3.4.2 Experiment Result for Single Dataset; 3.4.3 Best Result is Y0 Model; 3.5 Result for Combination Dataset Model; 3.5.1 t-SNE Visualization; 3.5.2 Experiment Result for Combination Dataset; 3.5.3 Best Results are T0Y0 and Benchmark Models; 3.6 Explainable AI; 3.6.1 Feature Importance and SHAP values; 3.6.2 Individual SHAP values; 3.7 Chapter Summary; Chapter 4. Yearly Model Updates; 4.1 Related Work; 4.2 Data Description; 4.3 Methodology; 4.4 Experiment Result; 4.4.1 ROC-AUC Result for CatBoost; 4.4.2 ROC-AUC Result for Random Forest; 4.5 Past Models Comparison; 4.6 Explainable AI; 4.6.1 SHAP Values for CatBoost Retrain Method; 4.6.2 SHAP Values for Random Forest Updating Method; 4.6.3 Feature Importance for CatBoost and Random Forest; 4.6.4 Individual SHAP value for CatBoost and Random Forest; 4.7 Chapter Summary; Chapter 5. Conclusion; 5.1 Limitation; 5.2 Future Work; 5.3 Summary; Bibliography; Appendices	-
dc.language.iso	en	-
dc.title	Machine Learning-Based Multicenter Bacteremia Prediction Model	zh_TW
dc.title	Machine Learning Models for Bacteremia Prediction in Multicenter Study	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	林澤;王志宏;呂宗謙	zh_TW
dc.contributor.oralexamcommittee	Che Lin;Chih-Hung Wang;Tsung-Chien Lu	en
dc.subject.keyword	Machine Learning, Bacteremia Prediction, Multicenter	zh_TW
dc.subject.keyword	Machine Learning, Bacteremia Prediction, Multicenter	en
dc.relation.page	72	-
dc.identifier.doi	10.6342/NTU202301594	-
dc.rights.note	Not authorized for public access	-
dc.date.accepted	2023-12-14	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Graduate Institute of Communication Engineering	-
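The first experiment summarized in the abstract compares a benchmark model trained on the combined Taipei, Hsinchu, and Yunlin data with models trained on each dataset separately. The Python sketch below illustrates that kind of comparison under loudly stated assumptions: synthetic data from scikit-learn's `make_classification` stands in for the NTUH cohorts, a constant per-site offset imitates between-hospital differences, and `RandomForestClassifier` scored by ROC-AUC is used only because the table of contents names Random Forest and ROC-AUC. The helpers `make_multisite_data`, `make_model`, and `compare_pooled_vs_single` are hypothetical; this is not the thesis's pipeline, feature set, or hyperparameters.

```python
"""Pooled multicenter model vs. per-site models (illustrative sketch only)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Site names follow the abstract; the data below are SYNTHETIC placeholders.
SITES = ["Taipei", "Hsinchu", "Yunlin"]


def make_multisite_data(seed=0, n_per_site=2000):
    """Hypothetical helper: draw one imbalanced synthetic cohort (~10% positives)
    and split it into sites. The per-site offset is an ASSUMED stand-in for
    between-hospital shift, not an estimate from real data."""
    X, y = make_classification(
        n_samples=n_per_site * len(SITES), n_features=20, n_informative=8,
        weights=[0.9, 0.1], random_state=seed,
    )
    data = {}
    for i, site in enumerate(SITES):
        rows = slice(i * n_per_site, (i + 1) * n_per_site)
        data[site] = (X[rows] + 0.3 * i, y[rows])
    return data


def make_model(seed):
    # Hypothetical settings; class_weight="balanced" is one common way to
    # reweight a rare positive class.
    return RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                  random_state=seed)


def compare_pooled_vs_single(seed=0):
    """Return {site: {"single": auc, "pooled": auc}} on each site's held-out split."""
    splits = {
        site: train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)
        for site, (X, y) in make_multisite_data(seed).items()
    }
    # train_test_split returns (X_train, X_test, y_train, y_test).
    X_pool = np.vstack([splits[s][0] for s in SITES])
    y_pool = np.concatenate([splits[s][2] for s in SITES])
    pooled = make_model(seed).fit(X_pool, y_pool)  # one model for all sites

    results = {}
    for site in SITES:
        X_tr, X_te, y_tr, y_te = splits[site]
        single = make_model(seed).fit(X_tr, y_tr)  # model for this site only
        results[site] = {
            "single": roc_auc_score(y_te, single.predict_proba(X_te)[:, 1]),
            "pooled": roc_auc_score(y_te, pooled.predict_proba(X_te)[:, 1]),
        }
    return results


if __name__ == "__main__":
    for site, aucs in compare_pooled_vs_single().items():
        print(f"{site}: single={aucs['single']:.3f}  pooled={aucs['pooled']:.3f}")
```

Both models are scored on the same held-out split of each site, so the comparison isolates the effect of pooling the training data. Whether pooling helps or hurts a given site depends on how much the sites differ; the offset in `make_multisite_data` is an arbitrary knob for exploring that, not a property of the NTUH data.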
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
ntu-112-2.pdf (currently not authorized for public access)	14.42 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.