Machine Learning Solutions for Imbalanced Data Set of Insurance Claims

Chen-Han Lu; 呂承翰

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66813

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張原豪
dc.contributor.author	Chen-Han Lu	en
dc.contributor.author	呂承翰	zh_TW
dc.date.accessioned	2021-06-17T01:08:46Z	-
dc.date.available	2025-02-17
dc.date.copyright	2020-02-17
dc.date.issued	2020
dc.date.submitted	2020-01-22
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66813	-
dc.description.abstract	近年來，金融科技提供了新形態的金融服務，給傳統的金融產業帶來了全面性的衝擊。保險業作為金融業的一部分，也因應了金融科技創新的潮流，提出了保險科技，包含了網路平台式的經營模式，與透過機器學習來分析數據。本篇研究將針對保險業中常見的資料集，車輛（包含機車與汽車）碰撞情況報告，使用機器學習方法進行分析，並預測車禍發生後傷害的嚴重程度。然而，在資料集當中比較少的死亡車禍案例，才是我們真正需要去在意的。在保險理賠中，死亡車禍會讓保險公司付出大量的金錢。因此，我們需要去進一步提升對於死亡車禍的預測。為了衡量我們預測的結果，本文中採用了Precision和Recall來取代Accuracy，著重在死亡車禍的判斷上。最後，我們會探討本研究在保險服務中的應用。	zh_TW
dc.description.abstract	In recent years, financial technology (FinTech) has introduced new forms of financial services and has had a broad impact on the traditional financial market. The insurance sector, an important part of the financial sector, has responded by developing InsurTech, which includes digital finance and machine learning. This study focuses on a data set commonly used in the insurance sector: collision reports for vehicles (including motorcycles and cars). We use machine learning methods to analyze these reports and predict the severity of injuries after a crash. However, fatal crashes, which make up only a small share of the data set, are the cases that matter most, because a fatal crash costs the insurance company a large amount in claims. We therefore aim to improve the prediction of fatal crashes. To evaluate our predictions, we use Precision and Recall in place of Accuracy, focusing on the identification of fatal crashes. Finally, we explore further applications of this research in insurance services.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T01:08:46Z (GMT). No. of bitstreams: 1 ntu-109-R06946008-1.pdf: 1364657 bytes, checksum: 4b095c53651194c84a892a45a43fda78 (MD5) Previous issue date: 2020	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgments; ABSTRACT; Chinese Abstract; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Imbalanced Data Set; 2.2 Performance measure; 2.3 Sampling; 2.3.1 Importance sampling; 2.3.2 Undersampling; 2.3.3 Oversampling; 2.4 Unknown attribute values; 2.5 One-Hot Encoding; Chapter 3 Methodology; 3.1 Decision Tree; 3.2 Random Forest; 3.3 Deep learning; 3.3.1 Convolutional Neural Network; 3.3.2 Convolutional Neural Network with Sampling; 3.4 Regression Analysis; 3.4.1 Linear Regression; 3.4.2 Logistic Regression; Chapter 4 Results; 4.1 Results – Random Forest; 4.2 Results – Deep learning; 4.3 Results – Logistic Regression; Chapter 5 Conclusion; Chapter 6 Future Work; 6.1 Traditional insurance policies will be replaced; 6.2 Cryptocurrencies will be used to pay claims; REFERENCE
dc.language.iso	en
dc.subject	隨機森林	zh_TW
dc.subject	羅吉斯回歸	zh_TW
dc.subject	深度學習	zh_TW
dc.subject	不平衡的資料集	zh_TW
dc.subject	保險科技	zh_TW
dc.subject	金融科技	zh_TW
dc.subject	deep learning	en
dc.subject	random forest	en
dc.subject	Imbalance data set	en
dc.subject	FinTech	en
dc.subject	InsurTech	en
dc.subject	Logistic Regression	en
dc.title	以機器學習方法解決保險理賠數據集不平衡之問題	zh_TW
dc.title	Machine Learning Solutions for Imbalanced Data Set of Insurance Claims	en
dc.type	Thesis
dc.date.schoolyear	108-1
dc.description.degree	Master's
dc.contributor.coadvisor	崔茂培
dc.contributor.oralexamcommittee	韓傳祥,鄭文皇,繆維中
dc.subject.keyword	金融科技,保險科技,不平衡的資料集,隨機森林,深度學習,羅吉斯回歸	zh_TW
dc.subject.keyword	FinTech,InsurTech,Imbalance data set,random forest,deep learning,Logistic Regression	en
dc.relation.page	33
dc.identifier.doi	10.6342/NTU201904448
dc.rights.note	Paid license
dc.date.accepted	2020-01-22
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資料科學學位學程	zh_TW
Appears in Collections:	Data Science Degree Program (資料科學學位學程)

Files in This Item:

File	Size	Format
ntu-109-1.pdf (not authorized for public access)	1.33 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.