Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92890

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 蔡政安 | zh_TW |
| dc.contributor.advisor | Chen-An Tsai | en |
| dc.contributor.author | 蔣依儒 | zh_TW |
| dc.contributor.author | Yi-Ju Chiang | en |
| dc.date.accessioned | 2024-07-03T16:08:24Z | - |
| dc.date.available | 2024-07-04 | - |
| dc.date.copyright | 2024-07-03 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-06-27 | - |
| dc.identifier.citation | [1] Samuel Tober. Tree-based machine learning models with applications in insurance frequency modelling, 2020. [2] Roel Henckaerts, Marie-Pier Côté, Katrien Antonio, and Roel Verbelen. Boosting insights in insurance tariff plans with tree-based machine learning methods, 2020. [3] Walter Olbricht. Tree-based methods: a useful tool for life insurance. European Actuarial Journal, 2:129–147, 2012. [4] Florian Buchner, Jürgen Wasem, and Sonja Schillo. Regression trees identify relevant interactions: Can this improve the predictive performance of risk adjustment? Health Economics, 26(1):74–85, 2017. [5] Chukwuebuka Joseph Ejiyi, Zhen Qin, Abdulhaq Adetunji Salako, Monday Nkanta Happy, Grace Ugochi Nneji, Chiagoziem Chima Ukwuoma, Ijeoma Amuche Chikwendu, and Ji Gen. Comparative analysis of building insurance prediction using some machine learning algorithms. 2022. [6] Andrea Dal Pozzolo, Gianluca Moro, Gianluca Bontempi, and Yann-Aël Le Borgne. Comparison of data mining techniques for insurance claim prediction. Università degli Studi di Bologna, 2011. [7] Leo Guelman. Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39(3):3659–3667, 2012. [8] Muhammad Arief Fauzan and Hendri Murfi. The accuracy of XGBoost for insurance claim prediction. International Journal of Advances in Soft Computing & Its Applications, 10(2), 2018. [9] Rahul Sahai, Ali Al-Ataby, Sulaf Assi, Manoj Jayabalan, Panagiotis Liatsis, Chong Kim Loy, Abdullah Al-Hamid, Sahar Al-Sudani, Maitham Alamran, and Hoshang Kolivand. Insurance risk prediction using machine learning. In The International Conference on Data Science and Emerging Technologies, pages 419–433. Springer, 2022. [10] Bent Jørgensen and Marta C. Paes De Souza. Fitting Tweedie’s compound Poisson model to insurance claims data. Scandinavian Actuarial Journal, 1994(1):69–93, 1994. [11] Yi Yang, Wei Qian, and Hui Zou. Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models, 2016. [12] Edward W. Frees and Emiliano A. Valdez. Hierarchical insurance claims modeling. Journal of the American Statistical Association, 103(484):1457–1469, 2008. [13] Claudia Czado, Rainer Kastenmeier, Eike Christian Brechmann, and Aleksey Min. A mixed copula model for insurance claims and claim sizes. Scandinavian Actuarial Journal, 2012(4):278–305, 2012. [14] Edward W. Frees, Xiaoli Jin, and Xiao Lin. Actuarial applications of multivariate two-part regression models. Annals of Actuarial Science, 7(2):258–287, 2013. [15] Edward W. Frees, Gee Lee, and Lu Yang. Multivariate frequency-severity regression models in insurance. Risks, 4(1):4, 2016. [16] Glenn De’Ath. Multivariate regression trees: a new technique for modeling species–environment relationships. Ecology, 83(4):1105–1117, 2002. [17] David R. Larsen and Paul L. Speckman. Multivariate regression trees for analysis of abundance data. Biometrics, 60(2):543–549, 2004. [18] Andreas Hamann, Tim Gylander, and Pei-yu Chen. Developing seed zones and transfer guidelines with multivariate regression trees. Tree Genetics & Genomes, 7:399–408, 2011. [19] Zhiyu Quan and Emiliano A. Valdez. Predictive analytics of insurance claims using multivariate decision trees. Dependence Modeling, 6(1):377–407, 2018. [20] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986. [21] J. Ross Quinlan. C4.5: Programs for machine learning. Elsevier, 2014. [22] Leo Breiman. Classification and regression trees. Routledge, 2017. [23] Leo Breiman. Random forests. Machine Learning, 45:5–32, 2001. [24] Yoav Freund, Robert E. Schapire, et al. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156. Citeseer, 1996. [25] Jerome H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001. [26] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016. [27] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 2017. [28] Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31, 2018. [29] Andreas Mayr, Harald Binder, Olaf Gefeller, and Matthias Schmid. The evolution of boosting algorithms. Methods of Information in Medicine, 53(06):419–427, 2014. [30] Glenn De’Ath. Multivariate regression trees: a new technique for modeling species–environment relationships. Ecology, 83(4):1105–1117, 2002. [31] Mark Segal and Yuanyuan Xiao. Multivariate random forests. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):80–87, 2011. [32] Eleftherios Spyromitros-Xioufis, Grigorios Tsoumakas, William Groves, and Ioannis Vlahavas. Multi-target regression via input space expansion: treating targets as inputs. Machine Learning, 104:55–98, 2016. [33] Jesse Read, Bernhard Pfahringer, Geoff Holmes, and Eibe Frank. Classifier chains for multi-label classification. Machine Learning, 85:333–359, 2011. [34] Leo Breiman. Classification and regression trees. Routledge, 2017. [35] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. [36] Lloyd S. Shapley et al. A value for n-person games. 1953. [37] Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1):2522–5839, 2020. [38] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011. [39] Edward W. Frees, Catalina Bolancé, Montserrat Guillen, and Emiliano A. Valdez. Dependence modeling of multivariate longitudinal hybrid insurance data with dropout. Expert Systems with Applications, 185:115552, 2021. [40] Montserrat Guillen, Catalina Bolancé, Edward W. Frees, and Emiliano A. Valdez. Case study data for joint modeling of insurance claims and lapsation. Data in Brief, 39:107639, 2021. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92890 | - |
| dc.description.abstract | 本論文利用進階的機器學習方法探討多元輸出回歸問題。研究將決策樹、隨機森林、CatBoost和Tweedie以及鏈回歸等方法應用於兩個不同的保險複數理賠資料集:LGPIF 資料集和西班牙資料集,並進行全面的分析。為了評估不同模型在單變量輸出與多變量輸出上的預測能力,研究使用均方誤差(MSE)作為評估指標。此外,研究也運用基尼重要性、排列重要性和 SHAP 值等方法,深入探討各變數對於模型預測的重要貢獻程度。本研究為複雜資料在不同模型及變數選擇方面提供了有價值的見解,增進了機器學習在多元輸出迴歸方面的了解,並為未來的研究提供了相關指引。 | zh_TW |
| dc.description.abstract | In this work, we investigate recent advances in machine learning techniques for insurance claims data, using both univariate and multivariate approaches. We apply decision trees, random forests, CatBoost, and Tweedie regression, together with ensemble methods such as chain regression, to two insurance claims datasets: the LGPIF dataset and a Spanish dataset. We conduct a comprehensive data analysis and evaluate the models' predictive performance using mean squared error (MSE). We also examine variable importance through Gini importance, permutation importance, and SHAP values. Our experiments provide insights into the effectiveness of the various models and feature-selection strategies for regression tasks on complex data, improving the understanding of machine learning for multi-output regression and offering practical guidance for future work. (A schematic code sketch of this multi-output pipeline appears after the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-03T16:08:24Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-07-03T16:08:24Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
摘要 ii
Abstract iii
Contents iv
List of Figures viii
List of Tables x
1 Introduction 1
2 Literature Review 3
3 Methodology 7
3.1 Univariate 7
3.1.1 Decision Tree 7
3.1.2 Random Forest 8
3.1.3 CatBoost 10
3.1.4 Tweedie Regression 12
3.2 Multivariate 16
3.2.1 MRT: Multivariate Regression Tree 16
3.2.2 Multivariate CatBoost 17
3.2.3 Chain Regression 18
3.2.4 Ensemble Chain Regression 22
3.3 Variable Importance 23
3.3.1 Gini Importance 24
3.3.2 Permutation Importance 25
3.3.3 SHAP Values 26
4 Case Study I: LGPIF 29
4.1 Data Introduction and Description 29
4.2 Data Analysis 34
4.3 Univariate Approach 35
4.3.1 Univariate Decision Tree 35
4.3.2 Univariate Random Forest 36
4.3.3 Univariate CatBoost 37
4.3.4 Univariate Tweedie Regression 37
4.3.5 Result of univariate models (MSE, variable importance, and feature selection) 38
4.4 Multivariate Approach 55
4.4.1 Multivariate Decision Tree 56
4.4.2 Multivariate Random Forest 56
4.4.3 Multivariate CatBoost 57
4.4.4 Result of multivariate models (MSE, variable importance, and feature selection) 57
4.5 Ensemble Chain Regression 66
4.5.1 Ensemble Chain Regression-Uni 66
4.5.2 Ensemble Chain Regression-Multi 67
4.6 Conclusion and Discussion 69
5 Case Study II: The Spanish Dataset 73
5.1 Data Introduction and Description 73
5.2 Data Analysis 77
5.3 Univariate Approach 78
5.3.1 Univariate Decision Tree 78
5.3.2 Univariate Random Forest 79
5.3.3 Univariate CatBoost 79
5.3.4 Univariate Tweedie Regression 80
5.3.5 Result of univariate models (MSE, variable importance, and feature selection) 80
5.4 Multivariate Approach 95
5.4.1 Multivariate Decision Tree 95
5.4.2 Multivariate Random Forest 96
5.4.3 Multivariate CatBoost 96
5.4.4 Result of multivariate models (MSE, variable importance, and feature selection) 97
5.5 Ensemble Chain Regression 103
5.5.1 Ensemble Chain Regression-Uni 104
5.5.2 Ensemble Chain Regression-Multi 105
5.6 Conclusion and Discussion 106
6 Conclusion and Discussion 111
References 115
Appendix A — LGPIF 121
Appendix B — The Spanish Dataset 127 | - |
| dc.language.iso | en | - |
| dc.subject | 多元輸出回歸 | zh_TW |
| dc.subject | 多元回歸樹 | zh_TW |
| dc.subject | CatBoost | zh_TW |
| dc.subject | Tweedie | zh_TW |
| dc.subject | 鏈迴歸 | zh_TW |
| dc.subject | 變數重要性 | zh_TW |
| dc.subject | SHAP值 | zh_TW |
| dc.subject | multivariate regression tree | en |
| dc.subject | SHAP values | en |
| dc.subject | variable importance | en |
| dc.subject | chain regression | en |
| dc.subject | Tweedie | en |
| dc.subject | CatBoost | en |
| dc.subject | multi-output | en |
| dc.title | 機器學習於預測保險複數理賠案件之比較分析 | zh_TW |
| dc.title | Comparative Analysis of Machine Learning Techniques for Predicting Multiple Insurance Claims | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 薛慧敏;陳錦華 | zh_TW |
| dc.contributor.oralexamcommittee | Huei-Min Hsueh;Jin-Hua Chen | en |
| dc.subject.keyword | 多元輸出回歸,多元回歸樹,CatBoost,Tweedie,鏈迴歸,變數重要性,SHAP值 | zh_TW |
| dc.subject.keyword | multi-output, multivariate regression tree, CatBoost, Tweedie, chain regression, variable importance, SHAP values | en |
| dc.relation.page | 130 | - |
| dc.identifier.doi | 10.6342/NTU202401271 | - |
| dc.rights.note | 未授權 | - |
| dc.date.accepted | 2024-06-28 | - |
| dc.contributor.author-college | 共同教育中心 | - |
| dc.contributor.author-dept | 統計碩士學位學程 | - |
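The abstract above outlines a multi-output pipeline: chain a univariate learner across correlated claim targets, score each target with MSE, and inspect variable importance. The following is a minimal sketch of that idea, assuming scikit-learn (which the thesis cites as [38]) and its `RegressorChain`, `RandomForestRegressor`, `mean_squared_error`, and `permutation_importance`; the synthetic data, chain order, and hyperparameters are placeholders, not the thesis's actual datasets or code.

```python
# Minimal chain-regression sketch (illustrative only, not the thesis code).
# Each target is predicted in sequence; earlier targets' predictions are
# appended to the feature set for later targets, which is how chain
# regression captures dependence among claim types.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import RegressorChain

# Synthetic stand-in for the LGPIF / Spanish data: 6 features, 3 targets.
X, Y = make_regression(n_samples=500, n_features=6, n_targets=3,
                       noise=0.1, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Chain a univariate random forest across the targets in a fixed order.
chain = RegressorChain(RandomForestRegressor(n_estimators=100, random_state=0),
                       order=[0, 1, 2])
chain.fit(X_train, Y_train)

# Per-target MSE, the evaluation metric used throughout the thesis.
mse = mean_squared_error(Y_test, chain.predict(X_test),
                         multioutput="raw_values")
print("per-target MSE:", mse)

# Permutation importance: mean drop in score when a feature is shuffled.
imp = permutation_importance(chain, X_test, Y_test,
                             scoring="neg_mean_squared_error",
                             n_repeats=10, random_state=0)
print("mean importances:", imp.importances_mean)
```

An ensemble chain regression, as named in the table of contents, would average such chains over several random target orderings rather than fixing `order=[0, 1, 2]`.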
Appears in Collections: 統計碩士學位學程
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (not authorized for public access) | 10.2 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.