Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89971

Full metadata record (DC field, value, language):
dc.contributor.advisor (zh_TW): 李宏毅
dc.contributor.advisor (en): Hung-yi Lee
dc.contributor.author (zh_TW): 黃世丞
dc.contributor.author (en): Shih-Cheng Huang
dc.date.accessioned: 2023-09-22T16:53:18Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-09-22
dc.date.issued: 2023
dc.date.submitted: 2023-08-08
dc.identifier.citation:
[1] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas. Learning to learn by gradient descent by gradient descent. Advances in neural information processing systems, 29, 2016.
[2] A. Antoniou, H. Edwards, and A. Storkey. How to train your maml. arXiv preprint arXiv:1810.09502, 2018.
[3] J. L. Ba, J. R. Kiros, and G. E. Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
[4] D. Bahdanau, K. H. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, 2015.
[5] T. Bansal, R. Jha, and A. McCallum. Learning to few-shot learn across diverse natural language classification tasks. arXiv preprint arXiv:1911.03863, 2019.
[6] Y. Bengio. Learning deep architectures for AI. Now Publishers Inc, 2009.
[7] R. Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
[8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[9] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning, pages 1126–1135. PMLR, 2017.
[10] C.-L. Fu, Z.-C. Chen, Y.-R. Lee, and H.-y. Lee. Adapterbias: Parameter-efficient token-dependent representation shift for adapters in nlp tasks. arXiv preprint arXiv:2205.00305, 2022.
[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[12] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly. Parameter-efficient transfer learning for nlp. In International Conference on Machine Learning, pages 2790–2799. PMLR, 2019.
[13] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
[14] K. Kawaguchi. A multithreaded software model for backpropagation neural network applications. The University of Texas at El Paso, 2000.
[15] T. Khot, A. Sabharwal, and P. Clark. Scitail: A textual entailment dataset from science question answering. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[17] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. nature, 521(7553):436–444, 2015.
[18] B. Lester, R. Al-Rfou, and N. Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021.
[19] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019.
[20] X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190, 2021.
[21] B. McCann, N. S. Keskar, C. Xiong, and R. Socher. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730, 2018.
[22] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.
[23] J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych. Adapterfusion: Nondestructive task composition for transfer learning. arXiv preprint arXiv:2005.00247, 2020.
[24] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[25] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140):1–67, 2020.
[26] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning. 2016.
[27] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386, 1958.
[28] P. Shaw, J. Uszkoreit, and A. Vaswani. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.
[29] A. C. Stickland and I. Murray. Bert and pals: Projected attention layers for efficient adaptation in multi-task learning. In International Conference on Machine Learning, pages 5986–5995. PMLR, 2019.
[30] S. Sukhbaatar, J. Weston, R. Fergus, et al. End-to-end memory networks. Advances in neural information processing systems, 28, 2015.
[31] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. Advances in neural information processing systems, 27, 2014.
[32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[33] A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael, F. Hill, O. Levy, and S. Bowman. Superglue: A stickier benchmark for general-purpose language understanding systems. Advances in neural information processing systems, 32, 2019.
[34] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. Glue: A multitask benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
[35] Y. Wang, S. Mukherjee, X. Liu, J. Gao, A. Awadallah, and J. Gao. List: Lite prompted self-training makes parameter-efficient few-shot learners. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 2262–2281, 2022.
[36] Y. Wang, S. Mukherjee, X. Liu, J. Gao, A. H. Awadallah, and J. Gao. Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models. arXiv preprint arXiv:2205.12410, 2022.
[37] S. Wu, H. R. Zhang, and C. Ré. Understanding and improving information transfer in multi-task learning. arXiv preprint arXiv:2005.00944, 2020.
[38] Q. Ye, B. Y. Lin, and X. Ren. Crossfit: A few-shot learning challenge for cross-task generalization in nlp. arXiv preprint arXiv:2104.08835, 2021.
[39] E. B. Zaken, S. Ravfogel, and Y. Goldberg. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. arXiv preprint arXiv:2106.10199, 2021.
[40] B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89971
dc.description.abstract (zh_TW):
隨著預訓練語言模型(Pre-trained Language Model)的參數量變得越來越大,輕量化微調(Parameter-Efficient Fine-tuning)顯得更為重要,但在少樣本學習(Few-Shot Learning)的情境下進行輕量化微調的效果卻遠遠不及微調整個預訓練模型。

為了解決這個問題,本研究提出在進行輕量化微調前加入一個稱為「引導」(Priming)的訓練過程來強化預訓練語言模型的輕量化微調效果,並且在一個包含 160 個不同自然語言處理任務的少樣本資料集上驗證了本方法的有效性。相較於直接進行輕量化微調,經過引導的模型在 ARG(Average Relative Gain)分數上達到了近 30% 的進步量,其表現也超越了其他的輕量化微調基準模型。

除此之外,我們針對引導模型的方法進行了系統性的實驗,分析了在引導階段使用不同訓練演算法和訓練不同參數對於引導效果的影響,並找出最有效的引導方法。本研究的結果將能有效增強輕量化微調在少樣本學習上的表現,並使得大型預訓練語言模型的微調和使用更加有效率。
dc.description.abstract (en):
As the parameter size of pre-trained language models (PLMs) continues to grow, parameter-efficient fine-tuning becomes increasingly important. However, in the few-shot learning setting, parameter-efficient fine-tuning performs far worse than fine-tuning the entire pre-trained model.

To address this issue, this study proposes a training process called "priming" that strengthens the pre-trained language model before downstream parameter-efficient fine-tuning. The effectiveness of this method was verified on a few-shot dataset consisting of 160 different NLP tasks. Compared to performing parameter-efficient fine-tuning directly, the primed model achieved an improvement of nearly 30% in ARG (Average Relative Gain) and outperformed other parameter-efficient fine-tuning baselines.

In addition, we conducted systematic experiments to analyze the impact of different upstream training algorithms and different sets of upstream trainable parameters, and to identify the most effective priming method. The results of this study enhance the performance of parameter-efficient fine-tuning in few-shot learning and make the fine-tuning and use of large-scale pre-trained language models more efficient.
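The abstract reports an improvement of nearly 30% in ARG (Average Relative Gain) over direct parameter-efficient fine-tuning. As a point of reference only, a commonly used definition of this metric (for example, in the CrossFit benchmark of reference [38]) averages each task's relative improvement over a reference method; the notation below is a hedged sketch of that idea, not necessarily the exact formula used in the thesis:

$$ \mathrm{ARG} \;=\; \frac{100\%}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \frac{s_t^{\mathrm{method}} - s_t^{\mathrm{ref}}}{s_t^{\mathrm{ref}}} $$

where $\mathcal{T}$ is the set of few-shot evaluation tasks, $s_t^{\mathrm{method}}$ is the score of the evaluated method on task $t$ (here, priming followed by parameter-efficient fine-tuning), and $s_t^{\mathrm{ref}}$ is the score of the reference method on the same task (here, parameter-efficient fine-tuning without priming).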
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T16:53:18Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2023-09-22T16:53:18Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Table of Contents
Acknowledgements  iii
Abstract (Chinese)  v
Abstract (English)  vii
Table of Contents  ix
List of Figures  xiii
List of Tables  xv
Chapter 1  Introduction  1
  1.1 Research Motivation  1
  1.2 Research Direction  3
  1.3 Main Contributions  3
  1.4 Thesis Organization  3
Chapter 2  Background  5
  2.1 Deep Neural Networks  5
    2.1.1 Fundamentals  5
    2.1.2 Transformer Networks  6
    2.1.3 Adapter Modules  11
  2.2 Multi-Task Learning  14
  2.3 Meta-Learning  14
    2.3.1 Fundamentals  14
    2.3.2 Model-Agnostic Meta-Learning  15
Chapter 3  Training Framework for Transformer Models with Adapters  17
  3.1 Task Overview  17
    3.1.1 Datasets  17
  3.2 Model Architectures  19
    3.2.1 BERT  19
    3.2.2 BART  20
  3.3 Model Training Framework  22
    3.3.1 Overview  22
    3.3.2 Upstream Training Stage  24
    3.3.3 Downstream Fine-Tuning Stage  27
    3.3.4 Specific Combinations of Trainable Parameters  28
  3.4 Experimental Setup  28
    3.4.1 Hyperparameters  28
    3.4.2 Adapters  29
    3.4.3 Evaluation Metrics  29
Chapter 4  Preliminary Experiments on GLUE  31
  4.1 Model Training Methods  31
    4.1.1 Multi-Task Learning  31
    4.1.2 Model-Agnostic Meta-Learning  32
  4.2 Experimental Results  33
    4.2.1 Performance with MNLI as the Test Task  33
    4.2.2 Effect of Removing Training Tasks  34
    4.2.3 Performance with Other Tasks as Downstream Test Tasks  34
  4.3 Summary  35
Chapter 5  Analysis and Discussion of the Main Experimental Results  37
  5.1 Overview  37
  5.2 Experimental Results  37
  5.3 Upstream Training Methods  38
  5.4 Combinations of Upstream Trainable Parameters  40
  5.5 Tasks  41
  5.6 Summary  42
Chapter 6  Conclusion and Future Work  43
  6.1 Contributions and Discussion  43
  6.2 Future Work  43
References  45
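The table of contents covers adapter modules (Section 2.1.3) and a training framework in which only a small set of parameters is updated on top of a BERT or BART backbone (Chapter 3). As a rough, hypothetical illustration of this kind of parameter-efficient module (the exact adapter design and placement used in the thesis are not specified on this page), a Houlsby-style bottleneck adapter in the spirit of reference [12] can be sketched in PyTorch as follows; the names BottleneckAdapter, bottleneck_size, and mark_adapters_trainable are mine, and the hidden size of 768 simply matches BERT-base/BART-base.

# Minimal sketch of a bottleneck adapter (cf. reference [12]); illustrative only, not the thesis code.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, and add a residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual path keeps the adapter close to an identity map at initialization.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def mark_adapters_trainable(model: nn.Module) -> None:
    """Freeze the backbone; leave only adapter (and, typically, task-head) parameters trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = ("adapter" in name) or ("classifier" in name)

if __name__ == "__main__":
    adapter = BottleneckAdapter(hidden_size=768)  # hidden size of BERT-base / BART-base
    x = torch.randn(2, 16, 768)                   # (batch, sequence length, hidden size)
    print(adapter(x).shape)                       # torch.Size([2, 16, 768])

During downstream few-shot fine-tuning, only such inserted modules (and the task head) would be updated, which is what makes the approach parameter-efficient; the priming step described in the abstract would additionally train some of these parameters upstream (for example with multi-task learning or MAML) before the downstream stage.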
dc.language.iso: zh_TW
dc.subject (zh_TW): 少樣本學習
dc.subject (zh_TW): 自然語言處理
dc.subject (zh_TW): 附加器
dc.subject (zh_TW): 輕量化微調
dc.subject (zh_TW): 元學習
dc.subject (zh_TW): 多任務學習
dc.subject (en): meta-learning
dc.subject (en): natural language processing
dc.subject (en): few-shot learning
dc.subject (en): multi-task learning
dc.subject (en): adapter
dc.subject (en): parameter-efficient fine-tuning
dc.title (zh_TW): 輕量化微調的預訓練語言模型引導之系統性分析
dc.title (en): Systematic Analysis of Pre-trained Language Model Priming for Parameter-efficient Fine-tuning
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士 (Master)
dc.contributor.oralexamcommittee (zh_TW): 李琳山;曹昱;蔡宗翰
dc.contributor.oralexamcommittee (en): Lin-shan Lee; Yu Tsao; Tzong-Han Tsai
dc.subject.keyword (zh_TW): 自然語言處理, 附加器, 輕量化微調, 元學習, 多任務學習, 少樣本學習
dc.subject.keyword (en): natural language processing, adapter, parameter-efficient fine-tuning, meta-learning, multi-task learning, few-shot learning
dc.relation.page: 48
dc.identifier.doi: 10.6342/NTU202302478
dc.rights.note: 同意授權 (open access worldwide)
dc.date.accepted: 2023-08-09
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 電信工程學研究所 (Graduate Institute of Communication Engineering)
Appears in Collections: 電信工程學研究所 (Graduate Institute of Communication Engineering)

Files in This Item:
File            Size      Format
ntu-111-2.pdf   1.48 MB   Adobe PDF