Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86525

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林智仁(Chih-Jen Lin) | |
| dc.contributor.author | Li-Chung Lin | en |
| dc.contributor.author | 林立中 | zh_TW |
| dc.date.accessioned | 2023-03-20T00:01:02Z | - |
| dc.date.copyright | 2022-08-18 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-08-12 | |
| dc.identifier.citation | K. Bhatia, K. Dahiya, H. Jain, P. Kar, A. Mittal, Y. Prabhu, and M. Varma. The extreme classification repository: Multi-label datasets and code, 2016. URL http://manikvarma.org/downloads/XC/XMLRepository.html. W.C. Chang, D. Jiang, H.F. Yu, C.H. Teo, J. Zhang, K. Zhong, K. Kolluri, Q. Hu, N. Shandilya, V. Ievgrafov, J. Singh, and I. S. Dhillon. Extreme multi-label learning for semantic matching in product search. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2021. S. Chanpuriya and C. Musco. InfiniteWalk: Deep network embeddings as Laplacian embeddings with a nonlinearity. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 1325–1333, 2020. B.Y. Chu, C.H. Ho, C.H. Tsai, C.Y. Lin, and C.J. Lin. Warm start for parameter selection of linear classifiers. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015. URL http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/warm-start/warm-start.pdf. E. Faerman, F. Borutta, K. Fountoulakis, and M. W. Mahoney. LASAGNE: Locality and structure aware graph node embedding. In Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 246–253, 2018. R.E. Fan and C.J. Lin. A study on threshold selection for multi-label classification. Technical report, Department of Computer Science, National Taiwan University, 2007. R.E. Fan, K.W. Chang, C.J. Hsieh, X.R. Wang, and C.J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008. URL http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf. A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 855–864, 2016. S. Khandagale, H. Xiao, and R. Babbar. Bonsai: Diverse and shallow trees for extreme multi-label classification. Machine Learning, 109:2099–2119, 2020. M. Khosla, V. Setty, and A. Anand. A comparative study for unsupervised network representation learning. IEEE Transactions on Knowledge and Data Engineering, 33(5):1807–1818, 2021. Y. Kim. Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746–1751, 2014. D. D. Lewis, R. E. Schapire, J. P. Callan, and R. Papka. Training algorithms for linear text classifiers. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 298–306, 1996. D. D. Lewis, Y. Yang, T. G. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004. J. Li, J. Zhu, and B. Zhang. Discriminative deep random walk for network classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1004–1013, 2016. LibMultiLabel Project Authors. LibMultiLabel user guide, 2022. URL https://www.csie.ntu.edu.tw/~cjlin/papers/libmultilabel/userguide.pdf. L.C. Lin, C.H. Liu, C.M. Chen, K.C. Hsu, I.F. Wu, M.F. Tsai, and C.J. Lin. On the use of unrealistic predictions in hundreds of papers evaluating graph representations. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2022. URL https://www.csie.ntu.edu.tw/~cjlin/papers/multilabel-embedding/multilabel_embedding.pdf. J.J. Liu, T.H. Yang, S.A. Chen, and C.J. Lin. Parameter selection: Why we should pay more attention to it. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021. URL https://www.csie.ntu.edu.tw/~cjlin/papers/parameter_selection/acl2021_parameter_selection.pdf. Short paper. X. Liu and K.S. Kim. A comparative study of network embedding based on matrix factorization. In International Conference on Data Mining and Big Data, pages 89–101, 2018. Y. Liu, R. Jin, and L. Yang. Semi-supervised multi-label learning by constrained non-negative matrix factorization. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), pages 421–426, 2006. S. A. P. Parambath, N. Usunier, and Y. Grandvalet. Optimizing F-measures by cost-sensitive classification. In Advances in Neural Information Processing Systems, volume 27, 2014. B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 701–710, 2014. I. Pillai, G. Fumera, and F. Roli. Designing multi-label classifiers that maximize F-measures: State of the art. Pattern Recognition, 61:394–404, 2017. J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM), pages 459–467, 2018. J. Read, B. Pfahringer, and G. Holmes. Multi-label classification using ensembles of pruned sets. In Proceedings of IEEE International Conference on Data Mining (ICDM), pages 995–1000, 2008. J. Read, B. Pfahringer, G. Holmes, and E. Frank. Classifier chains for multi-label classification. Machine Learning, 85:333–359, 2011. J. Schlötterer, M. Wehking, F. S. Rizi, and M. Granitzer. Investigating extensions to random walk based graph embedding. In Proceedings of IEEE International Conference on Cognitive Computing, pages 81–89, 2019. F. Tai and H.T. Lin. Multi-label classification with principal label space transformation. Neural Computation, 24:2508–2542, 2012. J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (WWW), pages 1067–1077, 2015. L. Tang and H. Liu. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 1107–1116, 2009. L. Tang, S. Rajan, and V. K. Narayanan. Large scale multi-label classification via metalabeler. In Proceedings of the 18th International Conference on World Wide Web (WWW), pages 211–220, 2009. G. Tsoumakas and I. Vlahavas. Random k-labelsets: An ensemble method for multi-label classification. In European Conference on Machine Learning, pages 406–417, 2007. X.Z. Wu and Z.H. Zhou. A unified view of multi-label performance measures. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 3780–3788, 2017. Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1/2):69–90, 1999. Y. Yang. A study on thresholding strategies for text categorization. In W. B. Croft, D. J. Harper, D. H. Kraft, and J. Zobel, editors, Proceedings of the 24th ACM International Conference on Research and Development in Information Retrieval, pages 137–145, New Orleans, US, 2001. ACM Press, New York, US. R. You, Z. Zhang, Z. Wang, S. Dai, H. Mamitsuka, and S. Zhu. AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Advances in Neural Information Processing Systems, volume 32, 2019. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86525 | - |
| dc.description.abstract | 在機器學習中,利用基準真相進行預測是自相矛盾的做法。但這般不切實際的實驗設計廣泛在圖表徵學習領域中被使用。利用圖表徵的節點分類多標籤問題中,許多著作假設每個測試數據的標籤數在預測階段為已知。實際應用中這種資訊罕為已知。我們指出這種不恰當的設計已成為此領域的標準。我們詳細調查使用不實際資訊的始末。據分析,利用不實際的資訊很可能高估預測表現。我們指出現有多標籤方法使用上的困難為造成這種情形的可能原因。我們提出簡單、有效而實際的多標籤方法以利未來研究。最後我們使用這次機會比較主要的圖表徵學習方法在多標籤的節點分類問題中的表現。 | zh_TW |
| dc.description.abstract | Prediction using the ground truth sounds like an oxymoron in machine learning. However, such an unrealistic setting was used in hundreds, if not thousands, of papers in the area of finding graph representations. To evaluate the multi-label problem of node classification by using the obtained representations, many works assume that the number of labels of each test instance is known in the prediction stage. In practice such ground-truth information is rarely available, and we point out that this inappropriate setting is now ubiquitous in the research area. We investigate in detail why the situation occurs. Our analysis indicates that with the unrealistic information, the performance is likely over-estimated. To see why suitable predictions were not used, we identify difficulties in applying some multi-label techniques. For use in future studies, we propose simple and effective settings that do not rely on practically unknown information. Finally, we take this chance to compare major graph-representation learning methods on multi-label node classification. | en |
| dc.description.provenance | Made available in DSpace on 2023-03-20T00:01:02Z (GMT). No. of bitstreams: 1 U0001-0108202216193400.pdf: 662144 bytes, checksum: 3285a338412697d726f1f7a1abd56433 (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | 口試委員會審定書 i; About this thesis ii; 摘要 iii; Abstract iv; 1 Introduction 1; 2 On the Use of Unrealistic Predictions in Hundreds of Papers Evaluating Graph Representations 2; 2.1 Introduction 2; 2.2 Unrealistic Predictions in Past Works 4; 2.3 Analysis of the Unrealistic Predictions 7; 2.3.1 Predicting at Least One Label per Instance 10; 2.4 Appropriate Methods for Training and Prediction 11; 2.4.1 Extending One-vs-rest to Incorporate Parameter Selection 12; 2.4.2 Thresholding Techniques 13; 2.4.3 Cost-sensitive Learning 14; 2.5 Experiments 15; 2.5.1 Experimental Settings 15; 2.5.2 Multi-label Training and Prediction Methods for Comparisons 16; 2.5.3 Results and Analysis 17; 2.6 Conclusions 20; 3 A Comparison of Multi-label Methods for Text Classification 22; 3.1 Introduction 22; 3.1.1 Linear Classifiers in LibMultiLabel 23; 3.1.2 Experiments on the RCV1 Data Set 25; 3.1.3 Experiments on the EUR-Lex Data Set 26; 3.1.4 Summary 27; Bibliography 28; Appendix 33; A Proofs of Chapter 1 33; A.1 Proof of Theorem 1 33; A.2 Proof of Theorem 2 34; A.3 Proof of Theorem 3 35; B Details of Generating Embedding Vectors 35; C Additional Implementation Details 37; C.1 Execution Environment for Graph Representation Learning 37; C.2 Execution Environment for Classification Task 37; D Complete Experimental Results 37; D.1 Choice of CV Splits 38; D.2 Choice of (C, t) values for cost-sensitive-simple 39; D.3 The Macro-F1 and Micro-F1 Trade-off 40 | |
| dc.language.iso | en | |
| dc.subject | 分類 | zh_TW |
| dc.subject | 多標籤 | zh_TW |
| dc.subject | multi-label | en |
| dc.subject | classification | en |
| dc.title | 探索多標籤分類的應用 | zh_TW |
| dc.title | Investigations in Applying Multilabel Classification | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 李育杰(Yuh-Jye Lee),蔡銘峰(Ming-Feng Tsai) | |
| dc.subject.keyword | 多標籤, 分類 | zh_TW |
| dc.subject.keyword | multi-label, classification | en |
| dc.relation.page | 49 | |
| dc.identifier.doi | 10.6342/NTU202201936 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2022-08-15 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2022-08-18 | - |
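The evaluation issue described in the abstract can be made concrete with a small sketch in plain Python. This is an illustrative toy, not code from the thesis: the function names, the toy scores, and the fixed threshold of 0 (typical for one-vs-rest linear classifiers) are all assumptions made for the example.

```python
def predict_unrealistic(scores, y_true):
    # The criticized setting: for each test instance, predict the top-k
    # scoring labels, where k is the ground-truth number of labels --
    # information that is not available at prediction time in practice.
    preds = []
    for s, y in zip(scores, y_true):
        k = sum(y)
        top = sorted(range(len(s)), key=lambda j: s[j], reverse=True)[:k]
        preds.append([1 if j in top else 0 for j in range(len(s))])
    return preds

def predict_thresholded(scores, threshold=0.0):
    # A realistic alternative: predict every label whose decision value
    # exceeds a fixed threshold; no ground-truth information is used.
    return [[1 if v > threshold else 0 for v in s] for s in scores]

def micro_f1(y_true, y_pred):
    # Micro-averaged F1 over all instance-label pairs.
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred)
             for t, p in zip(rt, rp)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Tiny example: two test instances, three labels.
y = [[1, 0, 1], [0, 1, 0]]
scores = [[0.9, -0.2, 0.1], [-0.5, -0.1, -0.3]]

f1_unrealistic = micro_f1(y, predict_unrealistic(scores, y))  # 1.0
f1_realistic = micro_f1(y, predict_thresholded(scores))       # 0.8
```

On this toy data, the rule that peeks at the true label counts attains a perfect micro-F1, while the threshold rule misses the all-negative-score instance, illustrating how the unrealistic setting can over-estimate performance.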
| Appears in Collections: | Department of Computer Science and Information Engineering | |
Files in This Item:
| File | Size | Format |  |
|---|---|---|---|
| U0001-0108202216193400.pdf | 646.62 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
