Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2427
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
dc.contributor.author | Sheng-Wei Chen | en |
dc.contributor.author | 陳聖惟 | zh_TW |
dc.date.accessioned | 2021-05-13T06:40:06Z | - |
dc.date.available | 2017-08-01 | |
dc.date.available | 2021-05-13T06:40:06Z | - |
dc.date.copyright | 2017-08-01 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-07-28 | |
dc.identifier.citation | B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, August 1996.
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
L. Breiman, J. Friedman, C. J. Stone, and R. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, first edition, 1984.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
T. Chen and C. Guestrin. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016. URL http://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf.
B.-Y. Chu, C.-H. Ho, C.-H. Tsai, C.-Y. Lin, and C.-J. Lin. Warm start for parameter selection of linear classifiers. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015. URL http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/warm-start/warm-start.pdf.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15:3133–3181, 2014. URL http://jmlr.org/papers/v15/delgado14a.html.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Annals of Statistics, 28(2):337–407, 2000. doi: 10.1214/aos/1016218223. URL http://dx.doi.org/10.1214/aos/1016218223.
J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
J. H. Friedman and J. J. Meulman. Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9):1365–1381, 2003. ISSN 1097-0258. doi: 10.1002/sim.1501. URL http://dx.doi.org/10.1002/sim.1501.
R. Jin and G. Agrawal. Communication and memory efficient parallel decision tree construction. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 119–129. SIAM, 2003.
S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667–1689, 2003. doi: 10.1162/089976603321891855. URL http://dx.doi.org/10.1162/089976603321891855.
P. Li. Robust LogitBoost and adaptive base class (ABC) LogitBoost. CoRR, abs/1203.3491, 2012. URL http://arxiv.org/abs/1203.3491.
Q. Meng, G. Ke, T. Wang, W. Chen, Q. Ye, Z.-M. Ma, and T. Liu. A communication-efficient parallel algorithm for decision tree. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1279–1287. Curran Associates, Inc., 2016. URL http://papers.nips.cc/paper/6381-a-communication-efficient-parallel-algorithm-for
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, third edition, 2011. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2427 | - |
dc.description.abstract | Tree-based methods were once thought unable to handle large-scale data sets, yet gradient boosting decision tree solvers now perform excellently in large-scale data competitions. To understand why, we analyze the models behind these tree-based methods. Furthermore, we compare their test accuracy and training time, also considering the linear model and the kernel method (a rough sketch of such a comparison follows the metadata table below). | en |
dc.description.provenance | Made available in DSpace on 2021-05-13T06:40:06Z (GMT). No. of bitstreams: 1 ntu-106-R03546051-1.pdf: 677259 bytes, checksum: 22c88c07c706fd37540dd9902f14d5f5 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Abstract i
1 Introduction 1
2 The Models 2
2.1 Support Vector Machine 2
2.2 Classification and Regression Tree 6
2.3 Random Forests 8
2.4 Gradient Boosting Decision Trees 11
2.4.1 Boosting Model 11
2.4.2 Gradient Boosting 12
2.4.3 Newton Boosting 15
2.4.4 Some Technical Details 17
3 The Implementations 22
3.1 The Complexity of CART 22
3.2 Complexity of Random Forests 30
3.3 Complexity of GBDT 30
4 Experiments 33
4.1 Parameter Selection 33
4.1.1 SVM 33
4.1.2 Random Forests 34
4.1.3 GBDT 36
4.2 Result 38
5 Conclusion 41
Bibliography 42 | |
dc.language.iso | en | |
dc.title | 隨機森林與梯度提升決策樹在大數據下之探討 | zh_TW |
dc.title | A Study of Random Forests and Gradient Boosting Decision Trees for Large-Scale Data | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林軒田 (Hsuan-Tien Lin), 李育杰 (Yuh-Jye Lee) | |
dc.subject.keyword | 梯度提升決策樹,隨機森林,支持向量機,分類與回歸樹 | zh_TW |
dc.subject.keyword | gradient boosting decision trees, random forests, support vector machine, classification and regression tree | en |
dc.relation.page | 44 | |
dc.identifier.doi | 10.6342/NTU201702065 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2017-07-28 | |
dc.contributor.author-college | 工學院 | zh_TW |
dc.contributor.author-dept | 工業工程學研究所 | zh_TW |
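The abstract above describes comparing the test accuracy and training time of random forests, gradient boosting decision trees, a linear model, and a kernel method. As a rough illustration only, not code from the thesis itself, the sketch below runs that kind of comparison using scikit-learn (which the bibliography cites); the synthetic data set, model choices, and parameters here are placeholder assumptions, not the thesis settings.

```python
# A minimal sketch of the model comparison described in the abstract.
# All data and parameters are illustrative assumptions.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the large-scale data sets studied in the thesis.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical model choices: parameters are library defaults,
# not the settings selected in the thesis experiments.
models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(n_estimators=100, random_state=0),
    "linear model": LinearSVC(),          # linear classifier, cf. LIBLINEAR
    "kernel method": SVC(kernel="rbf"),   # kernel SVM, cf. LIBSVM
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)            # measure training time
    elapsed = time.perf_counter() - start
    accuracy = model.score(X_test, y_test) # measure test accuracy
    print(f"{name:14s} accuracy={accuracy:.4f} training_time={elapsed:.2f}s")
```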
Appears in Collections: | 工業工程學研究所 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf | 661.39 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.