Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51285
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 林軒田 (Hsuan-Tien Lin) | |
dc.contributor.author | Yu-Ping Wu | en |
dc.contributor.author | 吳宇平 | zh_TW |
dc.date.accessioned | 2021-06-15T13:29:27Z | - |
dc.date.available | 2018-02-24 | |
dc.date.copyright | 2016-02-24 | |
dc.date.issued | 2016 | |
dc.date.submitted | 2016-02-04 | |
dc.identifier.citation | [1] A. Beygelzimer, J. Langford, and P. Ravikumar. Multiclass classification with filter trees. Preprint, June 2, 2007.
[2] A. Beygelzimer, J. Langford, and P. Ravikumar. Error-correcting tournaments. In Proceedings of the 20th International Conference on Algorithmic Learning Theory, pages 247–262, 2009.
[3] M. R. Boutell, J. Luo, X. Shen, and C. M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757–1771, 2004.
[4] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011.
[5] Y.-N. Chen and H.-T. Lin. Feature-aware label space dimension reduction for multi-label classification. In Advances in Neural Information Processing Systems, pages 1529–1537, 2012.
[6] K. Dembczynski, W. Cheng, and E. Hüllermeier. Bayes optimal multilabel classification via probabilistic classifier chains. In Proceedings of the 27th International Conference on Machine Learning, pages 279–286, 2010.
[7] K. Dembczynski, W. Waegeman, and E. Hüllermeier. An analysis of chaining in multi-label classification. In Proceedings of the 21st European Conference on Artificial Intelligence, pages 294–299, 2012.
[8] K. J. Dembczynski, W. Waegeman, W. Cheng, and E. Hüllermeier. An exact algorithm for F-measure maximization. In Advances in Neural Information Processing Systems, pages 1404–1412, 2011.
[9] J. R. Doppa, J. Yu, C. Ma, A. Fern, and P. Tadepalli. HC-search for multi-label prediction: An empirical study. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, pages 1795–1801, 2014.
[10] C. Elkan. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pages 973–978, 2001.
[11] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
[12] C.-S. Ferng and H.-T. Lin. Multilabel classification using error-correcting codes of hard or soft bits. IEEE Transactions on Neural Networks and Learning Systems, 24(11):1888–1900, 2013.
[13] Y. Freund and R. E. Schapire. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771–780, 1999.
[14] E. C. Goncalves, A. Plastino, and A. A. Freitas. A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In Proceedings of the 25th International Conference on Tools with Artificial Intelligence, pages 469–476, 2013.
[15] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.
[16] D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, pages 772–780, 2009.
[17] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas. Supervised machine learning: A review of classification techniques. In Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering, pages 3–24, 2007.
[18] A. Kumar, S. Vembu, A. K. Menon, and C. Elkan. Learning and inference in probabilistic classifier chains with beam search. In Machine Learning and Knowledge Discovery in Databases, pages 665–680, 2012.
[19] C.-L. Li and H.-T. Lin. Condensed filter tree for cost-sensitive multi-label classification. In Proceedings of the 31st International Conference on Machine Learning, pages 423–431, 2014.
[20] H.-Y. Lo. Cost-Sensitive Multi-Label Classification with Applications. PhD thesis, National Taiwan University, Jan. 2013.
[21] H.-Y. Lo, S.-D. Lin, and H.-M. Wang. Generalized k-labelsets ensemble for multi-label and cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 26(7):1679–1691, 2014.
[22] H.-Y. Lo, J.-C. Wang, H.-M. Wang, and S.-D. Lin. Cost-sensitive multi-label learning for audio tag annotation and retrieval. IEEE Transactions on Multimedia, 13(3):518–529, 2011.
[23] D. Michie, D. J. Spiegelhalter, and C. C. Taylor. Machine Learning, Neural and Statistical Classification. Citeseer, 1994.
[24] G.-J. Qi, X.-S. Hua, Y. Rui, J. Tang, T. Mei, and H.-J. Zhang. Correlative multi-label video annotation. In Proceedings of the 15th International Conference on Multimedia, pages 17–26, 2007.
[25] J. Read, L. Martino, and D. Luengo. Efficient Monte Carlo methods for multi-dimensional learning with classifier chains. Pattern Recognition, 47(3):1535–1546, 2014.
[26] J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classifier chains for multi-label classification. Machine Learning, 85(3):333–359, 2011.
[27] R. E. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2):135–168, 2000.
[28] L. Sun, S. Ji, and J. Ye. Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):194–200, 2011.
[29] F. Tai and H.-T. Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508–2542, 2012.
[30] K. Trohidis, G. Tsoumakas, G. Kalliris, and I. P. Vlahavas. Multi-label classification of music into emotions. In Proceedings of the 9th International Conference on Music Information Retrieval, pages 325–330, 2008.
[31] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.
[32] G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.
[33] G. Tsoumakas, I. Katakis, and I. Vlahavas. Mining multi-label data. In Data Mining and Knowledge Discovery Handbook, pages 667–685, 2010.
[34] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, and I. Vlahavas. MULAN: A Java library for multi-label learning. Journal of Machine Learning Research, 12:2411–2414, 2011.
[35] G. Tsoumakas and I. Vlahavas. Random k-labelsets: An ensemble method for multi-label classification. In Machine Learning: ECML 2007, pages 406–417, 2007.
[36] H.-H. Tu and H.-T. Lin. One-sided support vector regression for multiclass cost-sensitive classification. In Proceedings of the 27th International Conference on Machine Learning, pages 1095–1102, 2010.
[37] M.-L. Zhang and Z.-H. Zhou. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7):2038–2048, 2007.
[38] Y. Zhang and J. G. Schneider. Multi-label output codes using canonical correlation analysis. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pages 873–882, 2011. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/51285 | - |
dc.description.abstract | 在真實世界中,不同的多標籤問題往往需要不同的衡量標準,因此,將衡量標準考量進演算法中成為了一項重要的課題。我們將此種問題稱為成本導向多標籤分類問題 (cost-sensitive multi-label classification)。大部分現有的方法無法處理任意的衡量標準,而其他成本導向的方法卻又有過高的時間複雜度。在此研究中,我們提出漸進隨機標籤集 (progressive random k-labelsets) 演算法以解決上述兩個問題。此演算法延伸自著名的隨機標籤集 (random k-labelsets) 演算法,因此具有與之相同的效率。此外,此方法逐步而漸進地將原始問題轉化為一系列的成本導向多元分類問題 (cost-sensitive multi-class classification),並能處理普遍的衡量標準。實驗結果顯示,與其他特別為某些衡量標準設計的演算法相比,漸進隨機標籤集演算法的表現與之不相上下。而在其他衡量標準下,我們提出的方法顯著地優於其他方法。 | zh_TW |
dc.description.abstract | Many real-world applications of multi-label classification come with different performance evaluation criteria. It is thus important to design general multi-label classification methods that can flexibly take different criteria into account. Such methods tackle the problem of cost-sensitive multi-label classification (CSMLC). Most existing CSMLC methods either suffer from high computational complexity or focus on only certain specific criteria. In this work, we propose a novel CSMLC method, named progressive random k-labelsets (PRAKEL), to resolve the two issues above. The method is extended from a popular multi-label classification method, random k-labelsets, and hence inherits its efficiency. Furthermore, the proposed method can handle general evaluation criteria by progressively transforming the CSMLC problem into a series of cost-sensitive multi-class classification problems. Experimental results demonstrate that PRAKEL is competitive with existing methods under the specific criteria they can optimize, and is superior under general criteria. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T13:29:27Z (GMT). No. of bitstreams: 1 ntu-105-R02922167-1.pdf: 2612221 bytes, checksum: 958bdbfc2379dad0926687508eccbe93 (MD5) Previous issue date: 2016 | en |
dc.description.tableofcontents | Oral Defense Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
Contents v
List of Figures vii
List of Tables viii
1 Introduction 1
2 Preliminaries 4
2.1 Regular Classification 4
2.2 Cost-Sensitive Classification 5
2.3 Cost-Sensitive Multi-Label Classification 6
2.4 Loss Functions for Multi-Label Classification 7
3 Related Work 9
4 Proposed Method 14
4.1 Framework 14
4.2 Cost Transformation 15
4.3 Strategy for Defining Reference Label Vectors 17
4.4 Weighting of Base Classifiers 22
4.5 Analysis of Time Complexity 23
5 Experiment 24
5.1 Experimental Setup 24
5.1.1 Dataset 24
5.1.2 Method 25
5.1.3 Cost Function 26
5.2 Results and Discussion 26
5.2.1 Comparison of Variants of PRAKEL 26
5.2.2 Comparison with State-of-the-art Methods 30
5.2.3 Comparison with EPCC and CFT under Composite Loss 31
5.2.4 Comparison with CS-RAKEL 31
6 Conclusion 33
A Proof 34
Bibliography 37 | |
dc.language.iso | en | |
dc.title | 以漸進隨機標籤集解決成本導向多標籤分類問題 | zh_TW |
dc.title | Progressive Random k-Labelsets for Cost-Sensitive Multi-Label Classification | en |
dc.type | Thesis | |
dc.date.schoolyear | 104-1 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 許永真 (Yung-jen Hsu), 李育杰 (Yuh-Jye Lee) | |
dc.subject.keyword | 機器學習, 多標籤分類, 損失函數, 成本導向, 標籤集, 集成方法 | zh_TW |
dc.subject.keyword | machine learning, multi-label classification, loss function, cost-sensitive, labelset, ensemble method | en |
dc.relation.page | 41 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2016-02-04 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
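The abstracts above describe PRAKEL's core reduction: for each randomly drawn k-labelset, the cost-sensitive multi-label problem is turned into a cost-sensitive multi-class problem whose 2^k "classes" are the possible bit patterns on that labelset, with per-class costs derived from the evaluation criterion. The sketch below illustrates only that transformation; the thesis's reference-label-vector strategies and classifier weighting (Sections 4.3–4.4) are not reproduced. The function names, the toy Hamming loss, and the use of the true label vector as the reference are illustrative assumptions, not the thesis's exact formulation.

```python
import random


def hamming_loss(y_true, y_pred):
    # Fraction of label positions on which the two label vectors disagree.
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def sample_labelsets(num_labels, k, num_sets, seed=0):
    # Draw `num_sets` random k-labelsets (sorted tuples of label indices).
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(num_labels), k))) for _ in range(num_sets)]


def labelset_to_classes(k):
    # Each multi-class "class" is one of the 2^k bit patterns on the labelset.
    return [tuple((c >> i) & 1 for i in range(k)) for c in range(2 ** k)]


def build_cost_sensitive_examples(X, Y, labelset, loss=hamming_loss):
    """Restrict the multi-label problem to `labelset` and attach costs.

    For each instance, the cost of predicting pattern p is the loss incurred
    on the full label vector when only the k labelset positions are replaced
    by p (here the true label vector serves as the reference; the thesis
    studies more refined reference strategies).
    """
    patterns = labelset_to_classes(len(labelset))
    examples = []
    for x, y in zip(X, Y):
        costs = []
        for p in patterns:
            y_pred = list(y)
            for pos, bit in zip(labelset, p):
                y_pred[pos] = bit  # overwrite the k positions with pattern p
            costs.append(loss(y, y_pred))
        examples.append((x, costs))  # one cost vector over the 2^k classes
    return patterns, examples
```

Each `(x, costs)` pair is then a training example for any cost-sensitive multi-class learner; an ensemble of such learners, one per sampled labelset, votes on the final label vector, which is how the RAkEL family keeps the per-labelset problems small while still covering all labels.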
Appears in Collections: | Department of Computer Science and Information Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-105-1.pdf (currently not authorized for public access) | 2.55 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.