A Decision Tree Classifier for Big Data Analytics on Credit Assessment Problem (應用於分析信用評估巨量資料的決策樹分類法)

Weng-U Lei; 李永裕

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57792

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳靜枝(Ching-Chin, Chern)
dc.contributor.author	Weng-U Lei	en
dc.contributor.author	李永裕	zh_TW
dc.date.accessioned	2021-06-16T07:03:57Z	-
dc.date.available	2024-12-31
dc.date.copyright	2014-07-15
dc.date.issued	2014
dc.date.submitted	2014-07-11
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/57792	-
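dc.description.abstract	Credit assessment has become an important step for financial institutions in deciding whether to approve a customer's loan application. Using various data mining techniques, firms can establish stable and reliable assessment criteria and reduce the default risk arising from lending. In practice, however, data sources have entered the Big Data era. Large volumes of messy data, together with complex data mining techniques, greatly increase the difficulty of data processing and model application for firms. This study therefore proposes a method (DTCAA) that uses decision trees as the main data mining model to solve the credit assessment problem. Its readability and good predictive ability, as well as the various risk rules it produces, help firms better understand customer characteristics and can be accurately combined with practical applications. In addition, this study proposes several data processing methods to address the problems caused by messy data definitions and sources under Big Data, lowering the barrier to practical application. Using real car loan application data, this study verifies the practical feasibility of DTCAA. Even under different default ratios and changes in multiple factors, decision trees provide predictive ability close to that of other data mining methods.	zh_TW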
dc.description.abstract	Credit assessment has been a large-scale problem among financial institutions. Their need to reduce the risk of customer debt can be met by applying data mining techniques to determine whether a new application should be approved. The problem, however, arises in a Big Data environment: complicated preprocessing steps are required because the data sources are vast and messy. This study proposes a Decision-Tree-Based Credit Assessment Approach (DTCAA) to solve the problem. The decision tree model is selected for its interpretability and easily understood rules, as well as its competitive performance. The approach also includes various methods for data preprocessing; these consolidations reduce the messiness of the raw data and facilitate implementation. Using real data from one of the three biggest car collateral loan companies in Taiwan, the experiments indicate that the decision tree is competitive across various situations. Across multiple factors, the experiments suggest that DTCAA is usable in practice.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T07:03:57Z (GMT). No. of bitstreams: 1 ntu-103-R01725019-1.pdf: 10687378 bytes, checksum: 61c4d6d9bc789a1350143e8a368541a7 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Contents i; List of Figures iv; List of Tables v; Chapter 1 Introduction 1; 1.1 Motivation 1; 1.2 Objective 5; 1.3 Scope 6; Chapter 2 Literature Review 7; 2.1 Credit Assessment Problems 7; 2.2 Supervised Learning / Classification 9; 2.3 Decision Tree 13; 2.4 Conclusion 16; Chapter 3 Problem Description 18; 3.1 The Credit Assessment Problem 18; 3.2 Classification and Decision Tree 20; 3.3 The Big Data Environment 24; 3.4 Problem Statement 25; 3.5 Summary 27; Chapter 4 The Decision-Tree Based Credit Assessment Approach 29; 4.1 Step 1: Data Analysis and Preprocessing 30; 4.1.1 Defining the Target Variable 31; 4.1.2 Consolidating data 34; 4.1.3 Data Sampling and Attribute Selection 40; 4.1.4 Data Partition 43; 4.2 Step 2: Decision Tree Models Building 44; 4.2.1 Model Building 46; 4.2.2 Model Assessment 47; 4.3 Step 3: Data Prediction and Scoring 48; 4.4 Complexity 48; Chapter 5 Computational Analysis 50; 5.1 Data Description 50; 5.2 Factors 52; 5.2.1 Target Variable 52; 5.2.2 Different Multi-Class Approaches 53; 5.2.3 Variable Selection 54; 5.3 Experiments 54; 5.3.1 Case 1: Balance Dataset with 1 run 57; 5.3.2 Case 2: Balance Dataset with 30 runs 62; 5.3.3 Case 3: Imbalance Dataset with 1 run 67; 5.3.4 Case 4: Imbalance Dataset with 30 runs 72; 5.4 Summary 76; Chapter 6 Conclusion and Future Work 78; 6.1 Conclusion 78; 6.2 Future Work 79; Reference 81
dc.language.iso	en
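dc.subject	Credit Assessment (信用評估)	zh_TW
dc.subject	Decision Tree (決策樹)	zh_TW
dc.subject	Big Data (巨量資料)	zh_TW
dc.subject	Massive Data (海量資料)	zh_TW
dc.subject	Big Data (大數據)	zh_TW
dc.subject	Data Mining (資料探勘)	zh_TW
dc.subject	Data Integration (資料整合)	zh_TW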
dc.subject	Decision Tree	en
dc.subject	Credit Assessment	en
dc.subject	Record linkage	en
dc.subject	Data Pre-processing	en
dc.subject	Data Mining	en
dc.subject	Big Data	en
dc.title	應用於分析信用評估巨量資料的決策樹分類法	zh_TW
dc.title	A Decision Tree Classifier for Big Data Analytics on Credit Assessment Problem	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	魏志平(Chih-Ping Wei),陳建錦(Chien-Chin Chen),楊錦生
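dc.subject.keyword	Credit Assessment (信用評估),Decision Tree (決策樹),Big Data (巨量資料),Massive Data (海量資料),Big Data (大數據),Data Mining (資料探勘),Data Integration (資料整合)	zh_TW
dc.subject.keyword	Credit Assessment,Decision Tree,Big Data,Data Mining,Data Pre-processing,Record linkage	en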
dc.relation.page	84
dc.rights.note	Paid license
dc.date.accepted	2014-07-14
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
Appears in Collections:	Department of Information Management

Files in this item:

File	Size	Format
ntu-103-1.pdf (not authorized for public access)	10.44 MB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.