Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92733

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳縕儂 | zh_TW |
| dc.contributor.advisor | Yun-Nung Chen | en |
| dc.contributor.author | 陳柏衡 | zh_TW |
| dc.contributor.author | Po-Heng Chen | en |
| dc.date.accessioned | 2024-06-17T16:08:18Z | - |
| dc.date.available | 2024-06-18 | - |
| dc.date.copyright | 2024-06-17 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-06-16 | - |
| dc.identifier.citation | [1] D. I. Adelani, M. Masiak, I. A. Azime, J. Alabi, A. L. Tonja, C. Mwase, O. Ogundepo, B. F. P. Dossou, A. Oladipo, D. Nixdorf, C. C. Emezue, sana al azzawi, B. Sibanda, D. David, L. Ndolela, J. Mukiibi, T. Ajayi, T. Moteu, B. Odhiambo, A. Owodunni, N. Obiefuna, M. Mohamed, S. H. Muhammad, T. M. Ababu, S. A. Salahudeen, M. G. Yigezu, T. Gwadabe, I. Abdulmumin, M. Taye, O. Awoyomi, I. Shode, T. Adelani, H. Abdulganiyu, A.-H. Omotayo, A. Adeeko, A. Afolabi, A. Aremu, O. Samuel, C. Siro, W. Kimotho, O. Ogbu, C. Mbonu, C. Chukwuneke, S. Fanijo, J. Ojo, O. Awosan, T. Kebede, T. S. Sakayo, P. Nyatsine, F. Sidume, O. Yousuf, M. Oduwole, T. Tshinu, U. Kimanuka, T. Diko, S. Nxakama, S. Nigusse, A. Johar, S. Mohamed, F. M. Hassan, M. A. Mehamed, E. Ngabire, J. Jules, I. Ssenkungu, and P. Stenetorp. MasakhaNEWS: News topic classification for African languages, 2023.
[2] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., 2020.
[3] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online, July 2020. Association for Computational Linguistics.
[4] A. Conneau, R. Rinott, G. Lample, A. Williams, S. Bowman, H. Schwenk, and V. Stoyanov. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium, Oct.-Nov. 2018. Association for Computational Linguistics.
[5] A. Deshpande, P. Talukdar, and K. Narasimhan. When is BERT multilingual? Isolating crucial ingredients for cross-lingual transfer. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3610–3623, Seattle, United States, July 2022. Association for Computational Linguistics.
[6] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
[7] A. Ebrahimi, M. Mager, A. Oncevay, V. Chaudhary, L. Chiruzzo, A. Fan, J. Ortega, R. Ramos, A. Rios, I. V. Meza Ruiz, G. Giménez-Lugo, E. Mager, G. Neubig, A. Palmer, R. Coto-Solano, T. Vu, and K. Kann. AmericasNLI: Evaluating zero-shot natural language understanding of pretrained multilingual models in truly low-resource languages. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6279–6299, Dublin, Ireland, May 2022. Association for Computational Linguistics.
[8] N. Foroutan, M. Banaei, R. Lebret, A. Bosselut, and K. Aberer. Discovering language-neutral sub-networks in multilingual language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7560–7575, Abu Dhabi, United Arab Emirates, Dec. 2022. Association for Computational Linguistics.
[9] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for NLP, 2019.
[10] J. Howard and S. Ruder. Universal language model fine-tuning for text classification, 2018.
[11] B. Lester, R. Al-Rfou, and N. Constant. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, Online and Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
[12] X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online, Aug. 2021. Association for Computational Linguistics.
[13] J. Libovický, R. Rosa, and A. Fraser. On the language neutrality of pre-trained multilingual representations. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1663–1674, Online, Nov. 2020. Association for Computational Linguistics.
[14] X. Liu, K. Ji, Y. Fu, W. Tam, Z. Du, Z. Yang, and J. Tang. P-tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 61–68, Dublin, Ireland, May 2022. Association for Computational Linguistics.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach, 2019.
[16] M. McCloskey and N. J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Academic Press, 1989.
[17] J. Pfeiffer, I. Vulić, I. Gurevych, and S. Ruder. MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 7654–7673, Online, Nov. 2020. Association for Computational Linguistics.
[18] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer, 2020.
[19] S.-A. Rebuffi, H. Bilen, and A. Vedaldi. Learning multiple visual domains with residual adapters, 2017.
[20] T. Schick and H. Schütze. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 255–269, Online, Apr. 2021. Association for Computational Linguistics.
[21] L. Tu, C. Xiong, and Y. Zhou. Prompt-tuning can be much better than fine-tuning on cross-lingual understanding with multilingual language models, 2022.
[22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need, 2023.
[23] T. Vu, B. Lester, N. Constant, R. Al-Rfou’, and D. Cer. SPoT: Better frozen model adaptation through soft prompt transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5039–5059, Dublin, Ireland, May 2022. Association for Computational Linguistics.
[24] A. Williams, N. Nangia, and S. Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122. Association for Computational Linguistics, 2018.
[25] S. Wu and M. Dredze. Beto, bentz, becas: The surprising cross-lingual effectiveness of BERT. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 833–844, Hong Kong, China, Nov. 2019. Association for Computational Linguistics.
[26] Q. Zhong, L. Ding, J. Liu, B. Du, and D. Tao. PANDA: Prompt transfer meets knowledge distillation for efficient model adaptation, 2022. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92733 | - |
| dc.description.abstract | 多語言預訓練語言模型(mPLMs)已在零樣本跨語言轉移任務中展示了顯著的能力。具體來說,它們可以僅在來源語言的任務上進行微調,然後應用於目標語言的任務。然而,對於預訓練過程中未見的低資源語言,僅依賴零樣本跨語言轉移通常會產生較差的結果。一種常見的策略是在目標語言上使用遮罩語言建模目標繼續訓練模型。但是,由於需要調整所有參數以進行語言適應,這樣的方法效率不彰。
在本篇論文中,我們提出了一種更有效的解決方案:用於語言適應的軟提示微調。我們的實驗發現,透過特別設計的軟提示來調整多語言模型,能夠使模型對先前未見過的語言實現有效的下游任務零樣本跨語言轉移。值得注意的是,相對於傳統的微調方法,提示微調在兩個文本分類資料集上都展現了更佳的零樣本跨語言轉移表現,同時僅使用了百分之0.28的可調參數。這些結果顯示,相較於傳統微調方法,軟提示微調能為預訓練模型提供更有效且高效的新增語言適應。 | zh_TW |
| dc.description.abstract | Multilingual pre-trained language models (mPLMs) have demonstrated notable effectiveness in zero-shot cross-lingual transfer tasks. Specifically, they can be fine-tuned solely on tasks in the source language and subsequently applied to tasks in the target language. However, for low-resource languages unseen during pre-training, relying solely on zero-shot language transfer often yields sub-optimal results. One common strategy is to continue training mPLMs with a masked language modeling objective on the target language. Nonetheless, this approach can be inefficient due to the need to adjust all parameters for language adaptation.
In this paper, we propose a more efficient solution: soft-prompt tuning for language adaptation. Our experiments demonstrate that with carefully designed prompts, soft-prompt tuning enables mPLMs to achieve effective zero-shot cross-lingual transfer to downstream tasks in previously unseen languages. Notably, we found that prompt tuning outperforms continually trained baselines on two text classification benchmarks, encompassing 18 low-resource languages, while utilizing a mere 0.28% of the tuned parameters. These results underscore the superior adaptability of mPLMs to previously unseen languages afforded by soft-prompt tuning compared to traditional fine-tuning methods. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-06-17T16:08:18Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-06-17T16:08:18Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 致謝 i
摘要 iii
Abstract v
Contents vii
List of Figures xi
List of Tables xiii
1 Introduction 1
1.1 Motivation 1
1.2 Main Contribution 3
1.3 Thesis Structure 4
2 Background 7
2.1 Transformer 7
2.1.1 Self-Attention Mechanism 7
2.1.2 Multi-Head Attention 8
2.2 Pre-trained Language Model 8
2.2.1 BERT 9
2.2.2 RoBERTa 10
2.2.3 XLM-R 10
2.3 Zero-shot and Few-shot Learning 11
2.4 Prompt Tuning 12
3 Related Work 15
3.1 Fine-tuning-based Method 15
3.2 Adapter-based Method 16
4 Proposed Approach 17
4.1 Overview 17
4.2 MLM on Unlabeled Data 17
4.3 Tuning on Source-Language Labeled Data 18
4.3.1 Template and Verbalizer 20
4.3.1.1 Template 20
4.3.1.2 Verbalizer 20
4.4 Objective Function 21
5 Experiments 23
5.1 Dataset 23
5.2 Setup 24
5.3 Baselines 25
5.4 Results 26
6 Discussion 29
6.1 Parameter and Storage Efficiency 29
6.1.1 The Volume of Target Language Unlabeled Data 30
6.1.2 Few-shot Evaluation 31
6.1.3 Trainable Soft-Prompt Layers for Downstream Task 33
6.1.4 The Visualization of Representation 35
7 Conclusion 37
8 Future Work 39
References 41 | - |
| dc.language.iso | en | - |
| dc.subject | 自然語言理解 | zh_TW |
| dc.subject | 跨語言遷移 | zh_TW |
| dc.subject | 軟提示 | zh_TW |
| dc.subject | 輕量化微調 | zh_TW |
| dc.subject | 語言模型 | zh_TW |
| dc.subject | Language Model | en |
| dc.subject | Natural Language Understanding | en |
| dc.subject | Cross Lingual Transfer | en |
| dc.subject | Soft Prompt | en |
| dc.subject | Parameter-Efficient Fine-Tuning | en |
| dc.title | 預訓練模型增加語言之高效率適應方法 | zh_TW |
| dc.title | Efficient Unseen Language Adaptation for Multilingual Pre-Trained Language Models | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 陳尚澤;孫紹華 | zh_TW |
| dc.contributor.oralexamcommittee | Shang-Tse Chen;Shao-Hua Sun | en |
| dc.subject.keyword | 自然語言理解,跨語言遷移,軟提示,輕量化微調,語言模型, | zh_TW |
| dc.subject.keyword | Natural Language Understanding,Cross Lingual Transfer,Soft Prompt,Parameter-Efficient Fine-Tuning,Language Model, | en |
| dc.relation.page | 46 | - |
| dc.identifier.doi | 10.6342/NTU202401193 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2024-06-17 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf (access limited to NTU IP range) | 2.03 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
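The abstract above describes the thesis's core idea: keep the multilingual backbone frozen and train only a small soft prompt with a masked-language-modeling objective on unlabeled target-language text, then reuse that prompt for zero-shot cross-lingual transfer. As a rough illustration of that idea (not the thesis's actual code), the sketch below adapts a frozen XLM-R model with a trainable soft prompt using the Hugging Face `transformers` API; the backbone checkpoint, prompt length, initialization, masking routine, and optimizer settings are placeholder assumptions, and the downstream step of tuning on source-language labeled data with a template and verbalizer is omitted.

```python
# Minimal sketch of soft-prompt tuning for language adaptation of a frozen
# multilingual masked LM. All hyperparameters here are illustrative guesses.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "xlm-roberta-base"   # assumed backbone; the thesis may use a different checkpoint
PROMPT_LEN = 16                   # hypothetical soft-prompt length

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)

# Freeze the entire backbone; only the soft prompt will receive gradients.
for p in model.parameters():
    p.requires_grad = False

# Trainable prompt vectors, initialized from randomly chosen token embeddings.
vocab_emb = model.get_input_embeddings().weight.detach()
init_ids = torch.randint(0, vocab_emb.size(0), (PROMPT_LEN,))
prompt = nn.Parameter(vocab_emb[init_ids].clone())

optimizer = torch.optim.AdamW([prompt], lr=3e-3)

def mlm_step(sentences, mask_prob=0.15):
    """One MLM update on unlabeled target-language sentences with the prompt prepended."""
    batch = tokenizer(sentences, return_tensors="pt", padding=True,
                      truncation=True, return_special_tokens_mask=True)
    input_ids, attn = batch["input_ids"], batch["attention_mask"]
    special = batch["special_tokens_mask"].bool()

    # Simplified BERT-style masking: mask ~15% of non-special, non-padding tokens.
    labels = input_ids.clone()
    mask = (torch.rand(input_ids.shape) < mask_prob) & attn.bool() & ~special
    labels[~mask] = -100                                   # no loss on unmasked positions
    input_ids = input_ids.masked_fill(mask, tokenizer.mask_token_id)

    # Prepend the soft prompt in embedding space.
    tok_emb = model.get_input_embeddings()(input_ids)
    bsz = tok_emb.size(0)
    prompt_emb = prompt.unsqueeze(0).expand(bsz, -1, -1)
    inputs_embeds = torch.cat([prompt_emb, tok_emb], dim=1)

    # Extend the attention mask and labels over the prompt positions (-100 = ignore).
    attn = torch.cat([torch.ones(bsz, PROMPT_LEN, dtype=attn.dtype), attn], dim=1)
    labels = torch.cat([torch.full((bsz, PROMPT_LEN), -100, dtype=labels.dtype), labels], dim=1)

    loss = model(inputs_embeds=inputs_embeds, attention_mask=attn, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# Example usage on a tiny batch of (hypothetical) target-language text:
# loss = mlm_step(["sentence one in the unseen language", "sentence two"])
```

In this setup only the `prompt` tensor is trained and stored per added language, which is where the parameter savings relative to full fine-tuning come from; the 0.28% figure quoted in the abstract corresponds to the thesis's own prompt configuration, not this toy setting.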
