Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21496
Title: | 基於機器學習建立健康保險理賠風險評估模型 Health Insurance Claim Risk Assessment based on Machine Learning |
Authors: | Shu-Mei ZhangJian 張簡淑美 |
Advisor: | 曹承礎(Seng-Cho Chou) |
Keyword: | commercial health insurance, claims, risk assessment, factor discussion, machine learning, classification |
Publication Year: | 2019 |
Degree: | Master's |
Abstract: | Taiwan has officially become an aged society, and demand for medical care has risen accordingly. The health insurance market is growing steadily, yet the loss ratio of long-term health insurance policies remains persistently high. Insurers' inability to assess each insured person's risk accurately and to set a corresponding experience rate is a problem in urgent need of a solution.
Therefore, this study uses a machine learning classification algorithm, the eXtreme Gradient Boosting (XGBoost) classifier, to identify the important factors for predicting claim risk, and evaluates model performance with AUC, the K-S statistic, and average precision. To make the model easier to apply in the underwriting process, the scores it assigns to samples are divided into low, medium, and high risk; prediction precision in the high-risk group is 79.05%. The important factors identified are body mass index, postal code, insured amount, and age. The postal code is the most unusual: insurers record it only to mail premium notices, yet this study finds it quite helpful for predicting claim risk, and further analysis shows it can be decomposed into income, education level, and regional medical resources. For age, the elderly are high risk as expected, but infants and young children from zero to five years old also stand out, and claim risk for infants aged zero to one is even higher than for the elderly. These factors should be taken into account in underwriting and pricing.
Using the postal code, this study draws on open government data as supplementary information to characterize regional medical resources and to infer each sample's income and education level. The results show that these three kinds of information are helpful factors; if the insured's actual education level and income could be obtained, they might help further in assessing claim risk. On this study's dataset, claim risk is assessed more accurately the longer a policy is observed. However, because the data only became reasonably complete after 2011, long-term tracking of the insured is lacking, so long-term risk prediction and lifetime value prediction are not yet possible and remain directions for future research. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21496 |
DOI: | 10.6342/NTU201901929 |
Fulltext Rights: | Not authorized |
Appears in Collections: | Department of Information Management |
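The abstract describes scoring each sample, splitting the scores into low, medium, and high risk, and reporting the K-S statistic and the precision within the high-risk group. The following is a minimal sketch of that evaluation step only, not the thesis's implementation: the cut-offs `low_cut` and `high_cut`, the function names, and the ten scores and labels are all hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of claims (1) and non-claims (0)."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)  # share of claims scored <= t
        cdf_neg = bisect_right(neg, t) / len(neg)  # share of non-claims scored <= t
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def risk_tier(score, low_cut=0.3, high_cut=0.7):
    """Map a model score to a tier; the cut-offs are assumed, not from the thesis."""
    if score >= high_cut:
        return "high"
    if score >= low_cut:
        return "medium"
    return "low"


def tier_precision(scores, labels, tier):
    """Fraction of samples in the given tier that actually had a claim."""
    hits = [y for s, y in zip(scores, labels) if risk_tier(s) == tier]
    return sum(hits) / len(hits) if hits else None


# Hypothetical model scores and claim outcomes (1 = claim filed).
scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.72, 0.80, 0.88, 0.95]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]

print("K-S statistic:", round(ks_statistic(scores, labels), 4))
print("High-risk precision:", tier_precision(scores, labels, "high"))
```

In practice the scores would come from the trained classifier's predicted probabilities, and the tier cut-offs would be tuned to the underwriting workflow rather than fixed in advance.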
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-108-1.pdf (Restricted Access) | 2.88 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.