Analysis and processing of imbalanced, multiple-measurement liver cancer patient data using a case-based reasoning approach

Yan-Bo Lin; 林彥伯

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16713

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆(Feipei Lai)
dc.contributor.author	Yan-Bo Lin	en
dc.contributor.author	林彥伯	zh_TW
dc.date.accessioned	2021-06-07T23:44:13Z	-
dc.date.copyright	2014-07-16
dc.date.issued	2014
dc.date.submitted	2014-07-11
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/16713	-
dc.description.abstract	Patient clinical data serve many purposes. When a data set is split in two by some criterion, imbalance often arises: one side contains more records than the other. This strongly affects any system later used for classification or prediction, because the system trains much better on the larger class than on the smaller one and therefore produces biased decisions. In this study we tried two common data-balancing methods, over-sampling and under-sampling, to process imbalanced liver cancer patient data, and used a system based on case-based reasoning to predict recurrence. We also kept the imbalanced data set as a baseline for comparison. From the system's predictions we computed sensitivity, specificity, and related statistics to compare how each data-processing method affects prediction.	zh_TW
dc.description.abstract	Clinical data are growing rapidly, and most clinical data sets suffer from class imbalance. In this study, over-sampling and under-sampling are used to handle the imbalance, and case-based reasoning is used to build classification models that predict the recurrence status of patients with liver cancer. The classification results of the two methods are compared with those obtained on the original imbalanced data set. Results are evaluated by sensitivity, specificity, balanced accuracy (BAC), positive predictive value (PPV), negative predictive value (NPV), and accuracy. The experimental results indicate that balanced data sets benefit the classification models and effectively reduce biased classification. Furthermore, we use several feature selection methods to assign feature weights and rank the features from highest to lowest weight. The features are then added stepwise to train and evaluate the classification models. From the evaluation results, we can determine how many features are needed to obtain better classification results.	en
dc.description.provenance	Made available in DSpace on 2021-06-07T23:44:13Z (GMT). No. of bitstreams: 1 ntu-103-R01945037-1.pdf: 1695224 bytes, checksum: dadfa14e024e4e13b08d4c503f2009e6 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; Abstract; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Case Based Reasoning; 2.2 Medical Application of Case Based Reasoning; 2.3 Hepatocellular Carcinoma; Chapter 3 Method; 3.1 Overview; 3.2 Patient Data; 3.2.1 Feature of Patient Data; 3.2.2 Period; 3.3 Grouping Method; 3.3.1 Imbalanced; 3.3.2 Over-sampling; 3.3.3 Under-Sampling; 3.4 CBR Calculation; 3.5 Forward Feature Experiment; Chapter 4 Result; 4.1 Influence of CBR system; 4.2 Result of Forward Experiment; Chapter 5 Discussion; Chapter 6 Conclusion; Chapter 7 Future Work; Appendix; Reference
dc.language.iso	en
dc.subject	Imbalanced data set	zh_TW
dc.subject	Over-sampling	zh_TW
dc.subject	Under-sampling	zh_TW
dc.subject	Case-based reasoning	zh_TW
dc.subject	Liver cancer	zh_TW
dc.subject	Liver cancer	en
dc.subject	Under-Sampling	en
dc.subject	Over-Sampling	en
dc.subject	Case-based reasoning	en
dc.subject	Imbalanced data set	en
dc.title	Analysis and processing of imbalanced, multiple-measurement liver cancer patient data using a case-based reasoning approach	zh_TW
dc.title	Processing and analysis of imbalanced multiple measurements liver cancer patient data by case-based reasoning system	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master
dc.contributor.oralexamcommittee	陳宜君(Yee-Chun Chen),蔡坤霖(Kun-Lin Tsai),許凱平(Kai-Ping Hsu),莊仁輝(Jen-Hui Chuang)
dc.subject.keyword	Imbalanced data set,Liver cancer,Case-based reasoning,Over-sampling,Under-sampling	zh_TW
dc.subject.keyword	Imbalanced data set,Liver cancer,Case-based reasoning,Over-Sampling,Under-Sampling	en
dc.relation.page	42
dc.rights.note	Not authorized
dc.date.accepted	2014-07-11
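dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics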
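
The evaluation measures named in the abstract (sensitivity, specificity, BAC, PPV, NPV, and accuracy) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard definitions, not the thesis's own code; the names `Confusion` and `evaluate` and the example counts are hypothetical and chosen only to illustrate how a high accuracy can coexist with a low sensitivity on an imbalanced data set.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary classifier; 'positive' stands for recurrence here."""
    tp: int  # positives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative
    fp: int  # negatives predicted as positive


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class or prediction group is empty.
    return numerator / denominator if denominator else float("nan")


def evaluate(c: Confusion) -> Dict[str, float]:
    """Compute the six measures from the standard confusion-matrix definitions."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "bac": (sensitivity + specificity) / 2,  # balanced accuracy
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp),
    }


# Hypothetical imbalanced example: 20 positive cases versus 80 negative cases.
metrics = evaluate(Confusion(tp=5, fn=15, tn=75, fp=5))
```

In this made-up example the accuracy is 0.8 while the sensitivity is only 0.25, which is the kind of bias toward the majority class that balanced accuracy is meant to expose.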

Files in This Item:

File	Size	Format
ntu-103-1.pdf Restricted Access	1.66 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.