NTU Theses and Dissertations Repository › 電機資訊學院 › 電信工程學研究所
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92675
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 王鈺強 | zh_TW
dc.contributor.advisor | Yu-Chiang Frank Wang | en
dc.contributor.author | 劉亦傑 | zh_TW
dc.contributor.author | I-Jieh Liu | en
dc.date.accessioned | 2024-06-04T16:06:22Z | -
dc.date.available | 2024-06-05 | -
dc.date.copyright | 2024-06-04 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-05-30 | -
dc.identifier.citation:

[1] J. Chen, W. Xu, S. Guo, J. Wang, J. Zhang, and H. Wang. FedTune: A deep dive into efficient federated fine-tuning with pre-trained transformers. arXiv preprint arXiv:2211.08025, 2022.

[2] T. Chen, L. Lin, R. Chen, X. Hui, and H. Wu. Knowledge-guided multi-label few-shot learning for general image recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1371–1384, 2020.

[3] T. Chen, M. Xu, X. Hui, H. Wu, and L. Lin. Learning semantic-specific graph representation for multi-label image recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 522–531, 2019.

[4] Z.-M. Chen, X.-S. Wei, P. Wang, and Y. Guo. Multi-label image recognition with graph convolutional networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5177–5186, 2019.

[5] Y. J. Cho, J. Wang, and G. Joshi. Client selection in federated learning: Convergence analysis and power-of-choice selection strategies. arXiv preprint arXiv:2010.01243, 2020.

[6] G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik. EMNIST: Extending MNIST to handwritten letters. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2921–2926. IEEE, 2017.

[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.

[8] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111:98–136, 2015.

[9] L. Gao, H. Fu, L. Li, Y. Chen, M. Xu, and C.-Z. Xu. FedDC: Federated learning with non-IID data via local drift decoupling and correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10112–10121, 2022.

[10] T. Guo, S. Guo, J. Wang, X. Tang, and W. Xu. PromptFL: Let federated participants cooperatively learn prompts instead of models - federated learning in age of foundation model. IEEE Transactions on Mobile Computing, 2023.

[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[12] W. Huang, M. Ye, and B. Du. Learn from others and be yourself in heterogeneous federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10143–10153, 2022.

[13] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR, 2015.

[14] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh. SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pages 5132–5143. PMLR, 2020.

[15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[16] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.

[17] J. Lanchantin, T. Wang, V. Ordonez, and Y. Qi. General multi-label image classification with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16478–16488, 2021.

[18] Q. Li, B. He, and D. Song. Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10713–10722, 2021.

[19] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith. Federated optimization in heterogeneous networks. Proceedings of Machine learning and systems, 2:429–450, 2020.

[20] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang. On the convergence of fedavg on non-iid data. In International Conference on Learning Representations, 2019.

[21] X. Li, M. Jiang, X. Zhang, M. Kamp, and Q. Dou. FedBN: Federated learning on non-IID features via local batch normalization. In International Conference on Learning Representations, 2021.

[22] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.

[23] S. Liu, L. Zhang, X. Yang, H. Su, and J. Zhu. Query2Label: A simple transformer way to multi-label classification. arXiv preprint arXiv:2107.10834, 2021.

[24] M. Luo, F. Chen, D. Hu, Y. Zhang, J. Liang, and J. Feng. No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. Advances in Neural Information Processing Systems, 34:5972–5984, 2021.

[25] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.

[26] M. Mendieta, T. Yang, P. Wang, M. Lee, Z. Ding, and C. Chen. Local learning matters: Rethinking data heterogeneity in federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8397–8406, 2022.

[27] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.

[28] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.

[29] T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, and L. Zelnik-Manor. Asymmetric loss for multi-label classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 82–91, 2021.

[30] C. Song, F. Granqvist, and K. Talwar. FLAIR: Federated learning annotated image repository. Advances in Neural Information Processing Systems, 35:37792–37805, 2022.

[31] S. Su, M. Yang, B. Li, and X. Xue. Cross-domain federated adaptive prompt tuning for CLIP. arXiv preprint arXiv:2211.07864, 2022.

[32] G. Sun, M. Mendieta, T. Yang, and C. Chen. Exploring parameter-efficient finetuning for improving communication efficiency in federated learning. arXiv preprint arXiv:2210.01708, 2022.

[33] Y. Tan, G. Long, L. Liu, T. Zhou, Q. Lu, J. Jiang, and C. Zhang. FedProto: Federated prototype learning over heterogeneous devices. arXiv preprint arXiv:2105.00243, 2021.

[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.

[35] J. Wang, Q. Liu, H. Liang, G. Joshi, and H. V. Poor. Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in neural information processing systems, 33:7611–7623, 2020.

[36] W. Zhuang, X. Gan, Y. Wen, S. Zhang, and S. Yi. Collaborative unsupervised visual representation learning from decentralized data. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4912–4921, 2021.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92675 | -
dc.description.abstract | 聯邦學習(FL)是一種新興的模型學習框架,該方法使多個用戶能夠在不共享私人數據的情況下協作訓練一個強大的模型,以保護隱私。大多數現有的FL方法僅考慮傳統的單標籤圖像分類問題,忽略了將任務轉移到多標籤圖像分類時的影響。然而,在現實世界的FL場景中,FL仍然難以應對用戶在本地數據分佈上的異質性,而在多標籤圖像分類中,這個問題變得更加嚴重。有鑑於變換器模型於中央化學習設定下之成功經驗,我們於是提出了一個新穎的FL框架用於多標籤分類。由於在訓練過程中本地客戶可能觀察到部分標籤之間的相關性,直接聚合本地更新的模型將不能產生滿意的全局模型效能。因此,我們提出了一個新穎的FL框架「語言引導之變換器」(FedLGT)來應對這一具有挑戰性的任務,旨在利用、傳輸不同客戶間的知識以學習一個強大的全局模型。根據我們於各種多標籤數據集(例如 FLAIR,MS-COCO等)進行大量實驗,我們展示了FedLGT能夠達到足夠好的性能,在多標籤FL場景下顯著優於標準FL技術。 | zh_TW
dc.description.abstract | Federated Learning (FL) is an emerging paradigm that enables multiple users to collaboratively train a robust model in a privacy-preserving manner without sharing their private data. Most existing FL approaches consider only traditional single-label image classification, ignoring the impact of transferring the task to multi-label image classification. Moreover, it is still challenging for FL to deal with user heterogeneity in local data distributions in real-world FL scenarios, and this issue becomes even more severe in multi-label image classification. Inspired by the recent success of Transformers in centralized settings, we propose a novel FL framework for multi-label classification. Since only partial label correlations may be observed by each local client during training, direct aggregation of locally updated models would not produce satisfactory performance. Thus, we propose Language-Guided Transformer (FedLGT), a novel FL framework that tackles this challenging task by exploiting and transferring knowledge across different clients to learn a robust global model. Through extensive experiments on various multi-label datasets (e.g., FLAIR, MS-COCO, etc.), we show that FedLGT achieves satisfactory performance and outperforms standard FL techniques under multi-label FL scenarios. Code is available at https://github.com/Jack24658735/FedLGT. | en
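For context, the abstract contrasts FedLGT with "direct aggregation of locally updated models", i.e. the standard FedAvg scheme of McMahan et al. (reference [25]). Below is a minimal sketch of that baseline aggregation step, not of FedLGT itself; all variable and parameter names are illustrative.

```python
# FedAvg-style aggregation (McMahan et al., ref [25]): average each parameter
# across clients, weighted by local dataset size. Illustrative sketch only;
# real implementations operate on model state dicts of tensors.

def fedavg(client_weights, client_sizes):
    """Combine per-client parameter dicts into a global model.

    client_weights: list of {param_name: value} dicts, one per client.
    client_sizes: list of local dataset sizes (aggregation weights).
    """
    total = sum(client_sizes)
    global_weights = {}
    for key in client_weights[0]:
        # Size-weighted average of this parameter over all clients.
        global_weights[key] = sum(
            w[key] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights

# Two clients with a single scalar "parameter" for illustration:
clients = [{"layer.w": 1.0}, {"layer.w": 3.0}]
sizes = [10, 30]
print(fedavg(clients, sizes))  # {'layer.w': 2.5}, weighted toward the larger client
```

The abstract's point is that this size-weighted averaging alone is insufficient under multi-label FL, because each client sees only partial label correlations; FedLGT addresses this with client-aware masked label embeddings and universal label embeddings (Sections 3.3.1 and 3.3.2).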
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-06-04T16:06:22Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-06-04T16:06:22Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents:

Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Works
  2.1 Federated Learning
  2.2 Multi-Label Image Classification
    2.2.1 Centralized Learning
    2.2.2 Federated Learning
Chapter 3 Proposed Method
  3.1 Problem Formulation and Setup
  3.2 A Brief Review of C-Tran
  3.3 Federated Language-Guided Transformer
    3.3.1 Client-Aware Masked Label Embedding
    3.3.2 Universal Label Embedding
  3.4 Training of FedLGT
Chapter 4 Experiments
  4.1 Experimental Setup
  4.2 Performance Results and Analysis
  4.3 Ablation Studies
Chapter 5 Conclusion
References
Appendix A — Introduction
  A.1 Evaluation Metrics
  A.2 More Ablation Studies and Experiments
    A.2.1 Parameters Analysis for CA-MLE
    A.2.2 Choices of Label Embedding
    A.2.3 Discussion of Input Label Embedding
    A.2.4 Per-class Results on FLAIR
    A.2.5 Different Client Sampling Strategies
    A.2.6 Ablation Study for Fine-grained FLAIR
    A.2.7 Larger Rounds Results on FLAIR
dc.language.iso | en | -
dc.subject | 視覺與語言 | zh_TW
dc.subject | 多標籤分類 | zh_TW
dc.subject | 聯邦學習 | zh_TW
dc.subject | Vision and Language | en
dc.subject | Federated Learning | en
dc.subject | Multi-label Classification | en
dc.title | 語言引導之變換器於聯邦式多標籤分類 | zh_TW
dc.title | Language-Guided Transformer for Federated Multi-Label Classification | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 陳祝嵩;楊福恩 | zh_TW
dc.contributor.oralexamcommittee | Chu-Song Chen;Fu-En Yang | en
dc.subject.keyword | 聯邦學習,多標籤分類,視覺與語言 | zh_TW
dc.subject.keyword | Federated Learning,Multi-label Classification,Vision and Language | en
dc.relation.page | 36 | -
dc.identifier.doi | 10.6342/NTU202401012 | -
dc.rights.note | 同意授權(全球公開) (Authorized; open access worldwide) | -
dc.date.accepted | 2024-05-30 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電信工程學研究所 | -
Appears in Collections: 電信工程學研究所

Files in This Item:
File | Size | Format
ntu-112-2.pdf | 3.61 MB | Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.

Community Links
Contact Information
No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan (R.O.C.)
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
Feedback
Related Links
Library Catalog
MetaCat (integrated domestic library search)
NTU Scholars
NTU Library Digital Archives
Site Disclaimer
© NTU Library All Rights Reserved