Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2427
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
dc.contributor.author | Sheng-Wei Chen | en |
dc.contributor.author | 陳聖惟 | zh_TW |
dc.date.accessioned | 2021-05-13T06:40:06Z | - |
dc.date.available | 2017-08-01 | |
dc.date.available | 2021-05-13T06:40:06Z | - |
dc.date.copyright | 2017-08-01 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-07-28 | |
dc.identifier.citation | B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, August 1996.
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
L. Breiman, J. Friedman, C. J. Stone, and R. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, first edition, 1984.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
T. Chen and C. Guestrin. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2016. URL http://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf.
B.-Y. Chu, C.-H. Ho, C.-H. Tsai, C.-Y. Lin, and C.-J. Lin. Warm start for parameter selection of linear classifiers. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015. URL http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/warm-start/warm-start.pdf.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: a library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf.
M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15:3133–3181, 2014. URL http://jmlr.org/papers/v15/delgado14a.html.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Annals of Statistics, 28(2):337–407, 2000. doi: 10.1214/aos/1016218223. URL http://dx.doi.org/10.1214/aos/1016218223.
J. H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189–1232, 2001.
J. H. Friedman and J. J. Meulman. Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9):1365–1381, 2003. ISSN 1097-0258. doi: 10.1002/sim.1501. URL http://dx.doi.org/10.1002/sim.1501.
R. Jin and G. Agrawal. Communication and memory efficient parallel decision tree construction. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 119–129. SIAM, 2003.
S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667–1689, 2003. doi: 10.1162/089976603321891855. URL http://dx.doi.org/10.1162/089976603321891855.
P. Li. Robust LogitBoost and adaptive base class (ABC) LogitBoost. CoRR, abs/1203.3491, 2012. URL http://arxiv.org/abs/1203.3491.
Q. Meng, G. Ke, T. Wang, W. Chen, Q. Ye, Z.-M. Ma, and T. Liu. A communication-efficient parallel algorithm for decision tree. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 1279–1287. Curran Associates, Inc., 2016. URL http://papers.nips.cc/paper/6381-a-communication-efficient-parallel-algorithm-for
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, third edition, 2011. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2427 | - |
dc.description.abstract | Tree-based methods were once thought unable to handle large-scale data sets, yet gradient boosting decision tree solvers now perform excellently in large-scale data competitions. To understand why, we analyze the models behind these tree-based methods. Furthermore, we compare their test accuracy and training time, also considering the linear model and the kernel method (a rough sketch of such a comparison follows the metadata table below). | en |
dc.description.provenance | Made available in DSpace on 2021-05-13T06:40:06Z (GMT). No. of bitstreams: 1 ntu-106-R03546051-1.pdf: 677259 bytes, checksum: 22c88c07c706fd37540dd9902f14d5f5 (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | Abstract i
1 Introduction 1
2 The Models 2
2.1 Support Vector Machine 2
2.2 Classification and Regression Tree 6
2.3 Random Forests 8
2.4 Gradient Boosting Decision Trees 11
2.4.1 Boosting Model 11
2.4.2 Gradient Boosting 12
2.4.3 Newton Boosting 15
2.4.4 Some Technical Details 17
3 The Implementations 22
3.1 The Complexity of CART 22
3.2 Complexity of Random Forests 30
3.3 Complexity of GBDT 30
4 Experiments 33
4.1 Parameter Selection 33
4.1.1 SVM 33
4.1.2 Random Forests 34
4.1.3 GBDT 36
4.2 Result 38
5 Conclusion 41
Bibliography 42 | |
dc.language.iso | en | |
dc.title | 隨機森林與梯度提升決策樹在大數據下之探討 | zh_TW |
dc.title | A Study of Random Forests and Gradient Boosting Decision Trees for Large-Scale Data | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 林軒田 (Hsuan-Tien Lin), 李育杰 (Yuh-Jye Lee) | |
dc.subject.keyword | 梯度提升決策樹,隨機森林,支持向量機,分類與回歸樹 | zh_TW |
dc.subject.keyword | gradient boosting decision trees, random forests, support vector machine, classification and regression tree | en |
dc.relation.page | 44 | |
dc.identifier.doi | 10.6342/NTU201702065 | |
dc.rights.note | Authorization granted (open access worldwide) | |
dc.date.accepted | 2017-07-28 | |
dc.contributor.author-college | 工學院 | zh_TW |
dc.contributor.author-dept | 工業工程學研究所 | zh_TW |
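The abstract above describes comparing the test accuracy and training time of random forests, gradient boosting decision trees, a linear model, and a kernel method. As a rough illustration only, not code from the thesis itself, the sketch below runs that kind of comparison using scikit-learn (which the bibliography cites); the synthetic data set, model choices, and parameters here are placeholder assumptions, not the thesis settings.

```python
# A minimal sketch of the model comparison described in the abstract.
# All data and parameters are illustrative assumptions.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for the large-scale data sets studied in the thesis.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical model choices: parameters are library defaults,
# not the settings selected in the thesis experiments.
models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(n_estimators=100, random_state=0),
    "linear model": LinearSVC(),          # linear classifier, cf. LIBLINEAR
    "kernel method": SVC(kernel="rbf"),   # kernel SVM, cf. LIBSVM
}

for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)            # measure training time
    elapsed = time.perf_counter() - start
    accuracy = model.score(X_test, y_test) # measure test accuracy
    print(f"{name:14s} accuracy={accuracy:.4f} training_time={elapsed:.2f}s")
```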
Appears in Collections: | 工業工程學研究所 |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-106-1.pdf | 661.39 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.