Health Insurance Claim Risk Assessment Based on Machine Learning (基於機器學習建立健康保險理賠風險評估模型)

Shu-Mei ZhangJian; 張簡淑美

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21496

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	曹承礎(Seng-Cho Chou)
dc.contributor.author	Shu-Mei ZhangJian	en
dc.contributor.author	張簡淑美	zh_TW
dc.date.accessioned	2021-06-08T03:35:48Z	-
dc.date.copyright	2019-08-05
dc.date.issued	2019
dc.date.submitted	2019-07-29
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21496	-
dc.description.abstract	臺灣已正式進入高齡社會，社會對於醫療照護的需求也隨之增加，健康醫療保險市場日益茁壯，然而長年期健康險保單的損失率卻高居不下，保險公司無法準確評估被保險人風險並給予相對應的經驗費率是亟欲解決的問題。因此本研究使用機器學習分類演算法─極限梯度上升分類器找出預測理賠風險的重要因子，並以 AUC 、 K-S 統計，平均精確率評估模型表現，為了便於應用預測模型協助核保流程，將預測模型對樣本給出的判定分數分為低風險、中等風險、高風險，在高風險族群的預測精確率為79.05%。找出的重要因子有：身體質量指數、郵遞區號、保額、年齡。較特別的為郵遞區號，此變數僅僅是保險公司為了寄送繳費單而登錄，卻在本研究中發現對於預測理賠風險具有相當大的幫助，分析後可再將此變數拆解為所得、教育程度、地區醫療資源。而年齡此因子則是發現除了老年族群風險高以外，零到五歲的嬰幼兒理賠風險也非常突出，在零到一歲的嬰兒甚至高過老年族群。這些因子應被納入核保訂價的考量中。而本研究根據郵遞區號，使用政府公開資料作為額外資料，瞭解地區醫療資源，並推論樣本的所得與教育程度，結果顯示這三種資訊為有幫助的因子，若能取得被保人實際教育程度與所得，或能對評估被保人理賠風險更有助益。根據本研究的資料集，評估保單理賠風險時，以越長期來看越準確，然由於資料從 2011 年後才漸趨完整，缺乏對於保險人長期的追蹤，因此尚無法進行長期風險預測與終身價值預測，此為未來可持續研究之處。	zh_TW
dc.description.abstract	Taiwan has officially become an aged society, and demand for medical care has risen accordingly. The health insurance market continues to grow, yet the loss ratio of long-term health insurance policies remains high, and insurers' inability to assess each insured person's risk accurately and price it accordingly is a pressing problem. This study therefore uses a machine learning classification algorithm, the eXtreme Gradient Boosting classifier, to identify the important factors for predicting claim risk, and evaluates model performance by AUC, the K-S statistic, and average precision. To make the prediction model easier to apply in underwriting, the scores it assigns are grouped into low, medium, and high risk; the prediction precision in the high-risk group is 79.05%. The important factors identified are body mass index, postal code, insured amount, and age. The postal code is notable: the insurer records it only to mail billing notices, yet this study finds it quite helpful for predicting claim risk, and on further analysis it can be decomposed into income, education level, and regional medical resources. For age, besides the high risk of the elderly, claim risk for children from zero to five years old is also prominent, and for infants aged zero to one it is even higher than for the elderly. These factors should be considered in underwriting and pricing. This study uses open government data as supplementary information, keyed by postal code, to characterize regional medical resources and to infer income and education level. The results show that all three are helpful factors; if the insured's actual education level and income could be obtained, they might help further in assessing claim risk. In this study's data set, the longer a policy is tracked, the more accurate the prediction. However, because policy data are available only from 2011 onward, long-term tracking of the insured is lacking, so long-term risk prediction and lifetime value prediction are not yet possible and remain directions for future research.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T03:35:48Z (GMT). No. of bitstreams: 1 ntu-108-R06725036-1.pdf: 2951601 bytes, checksum: e4218f83f503b7d8107f5d62d641bc10 (MD5) Previous issue date: 2019	en
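dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction (1.1 Research Background; 1.2 Research Motivation; 1.3 Research Objectives; 1.4 Research Process); Chapter 2 Literature Review (2.1 Insurance: 2.1.1 Insurance, 2.1.2 Adverse Selection in Commercial Health Insurance; 2.2 Data Mining; 2.3 Health and Medical Care Research; 2.4 Insurance Claims Research); Chapter 3 Research Method (3.1 Dataset; 3.2 Research Steps: 3.2.1 Data Preprocessing, 3.2.2 Machine Learning Models, 3.2.3 Model Evaluation Methods); Chapter 4 Analysis and Results (4.1 Model Performance; 4.2 Attribute Importance; 4.3 Discussion of Important Attributes: 4.3.1 Body Mass Index, Height, and Weight, 4.3.2 Postal Code and County/City, 4.3.3 Insured Amount, 4.3.4 Age); Chapter 5 Conclusion (5.1 Research Results; 5.2 Research Limitations); References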
dc.language.iso	zh-TW
dc.title	基於機器學習建立健康保險理賠風險評估模型	zh_TW
dc.title	Health Insurance Claim Risk Assessment based on Machine Learning	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	吳玲玲,周子元
dc.subject.keyword	商業醫療保險,理賠,風險評估,因子探討,機器學習,分類	zh_TW
dc.subject.keyword	commercial health insurance, claims, risk assessment, factor discussion, machine learning, classification	en
dc.relation.page	40
dc.identifier.doi	10.6342/NTU201901929
dc.rights.note	Not authorized
dc.date.accepted	2019-07-30
dc.contributor.author-college	College of Management (管理學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management (資訊管理學研究所)	zh_TW
Appears in Collections:	Department of Information Management (資訊管理學系)
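
The evaluation described in the abstract (AUC, the K-S statistic, average precision, and three risk tiers with per-tier precision) can be illustrated with a minimal sketch. It is not the thesis's code: the labels and scores are toy values, the cutoffs 0.3 and 0.7 and all function names are hypothetical, and the scores stand in for the output of any classifier, such as the eXtreme Gradient Boosting classifier named in the abstract.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, lowering the threshold one distinct score at a time."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        j = i
        # Samples sharing the same score cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def ks_statistic(points: List[Tuple[float, float]]) -> float:
    """K-S statistic: largest gap between TPR and FPR over all thresholds."""
    return max(abs(tpr - fpr) for fpr, tpr in points)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Sum of precision at each threshold, weighted by the recall gained there."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / j)  # j samples are above the threshold
        prev_recall = recall
        i = j
    return ap


def risk_tier(score: float, low_cut: float = 0.3, high_cut: float = 0.7) -> str:
    """Map a score to a tier; the cutoffs are hypothetical, not from the thesis."""
    if score >= high_cut:
        return "high"
    if score >= low_cut:
        return "medium"
    return "low"


def precision_in_tier(y_true: Sequence[int], scores: Sequence[float], tier: str) -> float:
    """Share of samples in the given tier that actually had a claim."""
    labels = [y for y, s in zip(y_true, scores) if risk_tier(s) == tier]
    return sum(labels) / len(labels) if labels else float("nan")


# Toy data: 1 = policy with a claim, 0 = no claim; scores are made up.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]

points = roc_points(y_true, scores)
print("AUC:", auc(points))
print("K-S:", ks_statistic(points))
print("Average precision:", average_precision(y_true, scores))
print("Precision in high-risk tier:", precision_in_tier(y_true, scores, "high"))
```

Reporting precision within the high-risk tier, rather than overall accuracy, matches the underwriting use described in the abstract, where the score is read as a tier rather than a yes/no prediction.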

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	2.88 MB	Adobe PDF

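All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.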