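Predicting Clinical Outcomes of Critically Ill Patients in Intensive Care Units with Combination of Machine Learning Models (藉由機器學習模型的組合預測加護病房重症患者的臨床結果)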

Yu-Sheng Yu; 余育昇

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73649

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆(Fei-Pei Lai)
dc.contributor.author	Yu-Sheng Yu	en
dc.contributor.author	余育昇	zh_TW
dc.date.accessioned	2021-06-17T08:07:24Z	-
dc.date.available	2021-02-22
dc.date.copyright	2021-02-22
dc.date.issued	2021
dc.date.submitted	2021-02-04
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73649	-
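dc.description.abstract	Background: Although various predictive scoring systems have been developed through statistical analysis of data collected from large numbers of patients, predicting the clinical prognosis of intensive care unit (ICU) patients remains an important and difficult challenge. In recent years, with the development of machine learning, various algorithms have offered more powerful model inference capabilities and have been used to analyze such data. Objective: This study aims to improve the sensitivity and precision of mortality prediction for critically ill patients by combining three machine learning models in a three-step strategy, and to divide ICU length of stay into four classes so that the outcome can be predicted with a classification model rather than a typical regression model. Methods: A total of 4,228 critically ill adult patients were extracted from the NTUH CORE database. For the mortality model, after preprocessing, seven machine learning algorithms were trained on both the whole data and balanced data. Three models were selected: those with the highest sensitivity, moderate precision, and the highest precision. For the length-of-stay classification model, we analyzed the distribution of length of stay across the whole dataset and labeled the days into four classes. Multi-class machine learning prediction then yielded results for the four classes and the corresponding predicted probabilities. Results: In the mortality model, the highest-sensitivity model classified 588 of the 843 patients in the testing dataset into the low risk group, with a mortality rate of 2.6% (95% CI, 1.4 to 4.2%); the other 255 patients proceeded to the next prediction step. After processing with the moderate-precision and highest-precision models, these 255 patients were further divided into a moderate risk group (210 patients), a high-moderate risk group (26 patients), and an adjusted high risk group (19 patients), with mortality rates of 29% (95% CI, 23 to 35.7%), 73.1% (95% CI, 52.2 to 88.4%), and 94.7% (95% CI, 74 to 99.9%), respectively. In the length-of-stay classification model, the F1-score was 0.604, and performance was better for patients with a length of stay of less than 7 days than for those with more than 7 days. Conclusion: This study shows that a three-step strategy combining the highest-sensitivity, moderate-precision, and highest-precision models improved the predictability of 30-day mortality in critically ill patients.	zh_TW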
dc.description.abstract	Background: Although various predictive scoring systems have been developed from statistical analysis of data collected from large numbers of patients, predicting clinical outcomes for patients in intensive care units (ICUs) remains an important and difficult challenge. In recent years, with the development of machine learning technology, various algorithms have provided more powerful model inference capabilities and have been used to analyze such data. Objective: This study aimed to use a three-step strategy that combines three machine learning models to improve the sensitivity and precision of mortality prediction for critically ill patients, and to divide ICU length of stay (LOS) into four classes so that the outcome can be predicted with a classification model instead of a regression model. Method: A total of 4,228 adult intensive care patients were extracted from the NTUH CORE database. In the mortality model, after data preprocessing, seven machine learning algorithms were trained on both the whole data and balanced data. Three models were selected for having the highest sensitivity, moderate precision, and the highest precision, respectively. In the LOS classification model, we analyzed the distribution of LOS in the whole data and divided the days into four classes for labeling. Multi-class machine learning prediction then produced results for the four classes and the corresponding probabilities. Result: In the mortality model, the highest-sensitivity model classified 588 of the 843 patients in the testing dataset into the low risk group, with a mortality rate of 2.6% (95% CI, 1.4 to 4.2%), and the other 255 patients went on to the next prediction step. After processing with the moderate-precision and highest-precision models, these 255 patients were further classified into the moderate risk group (210 patients), high-moderate risk group (26 patients), and adjusted high risk group (19 patients), with mortality rates of 29% (95% CI, 23 to 35.7%), 73.1% (95% CI, 52.2 to 88.4%), and 94.7% (95% CI, 74 to 99.9%), respectively. In the LOS classification model, the weighted average F1-score was 0.604, and performance was better for patients with LOS of less than 7 days than for those with LOS of more than 7 days. Conclusion: This study revealed that a three-step strategy combining the highest-sensitivity, moderate-precision, and highest-precision models enhanced the predictability of 30-day mortality in critically ill patients.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T08:07:24Z (GMT). No. of bitstreams: 1 U0001-3001202103404100.pdf: 2206234 bytes, checksum: b19da6bed11c597b6e1c2cd3b1bd395b (MD5) Previous issue date: 2021	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1. Introduction; 1.1 Background; 1.2 Related works; 1.3 Objective; Chapter 2. Architecture; 2.1 Workflow; 2.2 Three-step strategy prediction; 2.3 Four-class prediction on LOS classification model; Chapter 3. Methods; 3.1 Data source; 3.2 Patient Selection; 3.3 Data distribution of physiological data; 3.4 Data distribution of ICU length of stay; 3.5 Feature Selection and Feature Engineering; 3.6 Imbalanced Data; 3.7 Missing Data; 3.8 K-Fold Validation; 3.9 Classification Model; 3.9.1 Logistic Regression; 3.9.2 K-nearest neighbors; 3.9.3 Decision Tree; 3.9.4 Random Forest; 3.9.5 Linear Discriminant Analysis; 3.9.6 AdaBoost; 3.9.7 XGBoost; 3.10 Model Assessment; Chapter 4. Results; 4.1 Characteristics of input data; 4.2 Mortality model; 4.2.1 Influence of sample ratio; 4.2.2 Model selection; 4.2.3 Three-step strategy prediction; 4.2.4 Feature Importance; 4.3 LOS classification model; Chapter 5. Discussion; 5.1 Mortality model; 5.2 LOS classification model; Chapter 6. Limitation; Chapter 7. Conclusions and future work; REFERENCE
dc.language.iso	en
dc.subject	住院天數	zh_TW
dc.subject	重症加護病房	zh_TW
dc.subject	預測	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	死亡率	zh_TW
dc.subject	length of stay	en
dc.subject	intensive care unit	en
dc.subject	prediction	en
dc.subject	machine learning	en
dc.subject	mortality	en
dc.title	藉由機器學習模型的組合預測加護病房重症患者的臨床結果	zh_TW
dc.title	Predicting Clinical Outcomes of Critically Ill Patients in Intensive Care Units with Combination of Machine Learning Models	en
dc.type	Thesis
dc.date.schoolyear	109-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	葉育彰(Yu-Chang Yeh),郭律成(Lu-Cheng Kuo),阮聖彰(Shanq-Jang Ruan),江岱倫(Dai-Lun Chiang)
dc.subject.keyword	重症加護病房, 預測, 機器學習, 死亡率, 住院天數	zh_TW
dc.subject.keyword	intensive care unit, prediction, machine learning, mortality, length of stay	en
dc.relation.page	42
dc.identifier.doi	10.6342/NTU202100268
dc.rights.note	Licensed for a fee
dc.date.accepted	2021-02-04
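dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics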
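
The three-step strategy and the confidence intervals reported in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the routing between models, so the sketch assumes that a patient predicted to survive at any step stops there (low, moderate, high-moderate) and that only a patient flagged by all three models reaches the high risk group. It also assumes the intervals are exact (Clopper-Pearson) binomial intervals. The function names `risk_group` and `clopper_pearson` are hypothetical, and the count of 18 deaths among 19 patients is inferred from the reported 94.7% rate, not stated in this record.

```python
from math import comb


def risk_group(flag_sensitive, flag_moderate, flag_precise):
    """Assumed cascade; each argument is one model's binary death prediction."""
    if not flag_sensitive:   # step 1: highest-sensitivity model predicts survival
        return "low"
    if not flag_moderate:    # step 2: moderate-precision model predicts survival
        return "moderate"
    if not flag_precise:     # step 3: highest-precision model predicts survival
        return "high-moderate"
    return "high"


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k events among n patients."""

    def solve(cdf_k, target):
        # binom_cdf decreases as p grows, so bisect on [0, 1] for cdf == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(cdf_k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k) = alpha/2; upper bound solves P(X <= k) = alpha/2.
    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


if __name__ == "__main__":
    print(risk_group(True, True, False))
    # 18 of 19 is an inferred count (assumption), used only for illustration.
    low, high = clopper_pearson(18, 19)
    print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the interval for 18 of 19 comes out at roughly 74% to 99.9%, in line with the figures quoted for the adjusted high risk group. The bisection keeps the sketch free of third-party dependencies; a statistics library's exact binomial interval would serve the same purpose.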

Files in This Item:

File	Size	Format
U0001-3001202103404100.pdf (not authorized for public access)	2.15 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.