NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96041
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 陳祝嵩 | zh_TW
dc.contributor.advisor | Chu-Song Chen | en
dc.contributor.author | 林昱辰 | zh_TW
dc.contributor.author | Yu-Chen Lin | en
dc.date.accessioned | 2024-09-25T16:45:17Z | -
dc.date.available | 2025-09-01 | -
dc.date.copyright | 2024-09-25 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-08-08 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96041 | -
dc.description.abstract | 提示詞微調是一種熱門的輕量化微調架構,其特點是在預訓練模型的基礎上僅更新少量參數,即可達到亮眼的表現。在過去的方法中,每個提示詞通常被視為一個整體,且各自獨立更新,導致隨著提示詞的增多,更新的參數量也線性增長。為了解決此問題,我們提出了高效率提示詞微調的可適應性編碼簿。我們利用了乘積量化的概念,使提示詞在每個分割後的子空間中共享一組可學習的編碼向量。每個提示詞透過一組自適應權重而有所變化。我們在 17 個自然語言任務中,僅更新預訓練模型 0.3% 的參數,就達到了優異的表現,包括自然語言理解及問答任務。此外,我們的方法在少樣本學習情境以及大型語言模型骨幹下也有良好表現,凸顯了其適應性及可發展性。 | zh_TW
dc.description.abstract | Prompt Tuning has emerged as a popular Parameter-Efficient Fine-Tuning method, owing to its excellent performance with few updated parameters across various large-scale Pretrained Language Models (PLMs). In previous approaches, each prompt is treated as a whole and updated independently, so the number of trainable parameters grows linearly with prompt length. To alleviate this problem, we introduce Adaptive Codebook for Composite and Efficient Prompt Tuning (ACCEPT). Our approach uses the concept of product quantization (PQ), enabling all soft prompts to share a common set of learnable codebook vectors within each subspace; each prompt is then distinguished by its own set of adaptive weights. We achieve strong performance on 17 diverse natural language tasks, including natural language understanding (NLU) and question answering (QA), by training only 0.3% of the parameters of the PLMs. Our method also excels in few-shot and large-model scenarios, highlighting its adaptability and potential. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-25T16:45:17Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-09-25T16:45:17Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | (thesis outline below) | -
Acknowledgements  i
摘要  ii
Abstract  iii
Contents  v
List of Figures  viii
List of Tables  ix
Chapter 1  Introduction  1
Chapter 2  Related work  4
  2.1  Parameter-Efficient Fine-tuning  4
  2.2  Prompt Tuning Methods  5
  2.3  Quantization  6
    2.3.1  Quantization in Model Compression  6
    2.3.2  Quantization in Representation Learning of NLP Tasks  6
Chapter 3  Method  8
  3.1  Prompt Tuning for Downstream Tasks  8
  3.2  Review of PQ and Method Motivation  8
  3.3  Proposed Method - ACCEPT  10
  3.4  Discussion  12
    3.4.1  Number of Parameters  12
    3.4.2  Comparison between LoRA and ACCEPT  13
Chapter 4  Experiments  15
  4.1  Experimental Settings  15
  4.2  Results on NLU and QA Tasks  17
  4.3  Results on Few-shot Adaptation  20
  4.4  Ablation Study  21
    4.4.1  Learnable Codebook and Subdivision  21
    4.4.2  Different Granularity of Subdivision  21
    4.4.3  Ablation on SCPP and SCAP  22
    4.4.4  Model Scaling  23
    4.4.5  Prompt Initialization  25
Chapter 5  Conclusion  26
References  27
Appendix A  Details  35
  A.1  Experimental Setting  35
  A.2  More Details of Experiments  36
    A.2.1  Details of Different Granularity of Subdivision  36
    A.2.2  Details of Ablation on SCPP and SCAP  38
    A.2.3  Details of Prompt Initialization  39
  A.3  More Studies on Prompt Length  39
  A.4  Effectiveness of Soft Weight Mechanism  40
  A.5  Task and Dataset Details  41
  A.6  Limitations  41
dc.language.iso | en | -
dc.title | 基於可學習編碼簿之大型語言模型提示詞微調 | zh_TW
dc.title | Prompt Tuning of Large Language Models Based on Learnable Codebooks | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 林守德;陳銘憲;陳駿丞 | zh_TW
dc.contributor.oralexamcommittee | Shou-De Lin;Ming-Syan Chen;Jun-Cheng Chen | en
dc.subject.keyword | 輕量化微調,提示詞微調,乘積量化,深度學習,遷移式學習 | zh_TW
dc.subject.keyword | Parameter Efficient Fine-tuning, Prompt Tuning, Product Quantization, Deep Learning, Transfer Learning | en
dc.relation.page | 42 | -
dc.identifier.doi | 10.6342/NTU202402283 | -
dc.rights.note | 同意授權(限校園內公開) (authorized for release, campus access only) | -
dc.date.accepted | 2024-08-10 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 資訊工程學系 | -
dc.date.embargo-lift | 2025-09-01 | -
Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)
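The abstract above describes ACCEPT as composing each soft prompt from product-quantized subspaces: all prompts share a small learnable codebook per subspace, and each prompt token is distinguished by its own adaptive mixing weights. Below is a minimal sketch of that construction; it is an illustration under assumptions, not the thesis implementation, and the class name, default sizes, and the softmax weighting are all illustrative choices.

# Minimal sketch (assumed PyTorch; illustrative names and sizes) of the idea in the
# abstract: per-subspace codebooks shared by all prompt tokens, mixed by per-token
# adaptive weights. Not the thesis's actual implementation.
import torch
import torch.nn as nn

class PQPromptSketch(nn.Module):
    def __init__(self, num_tokens=100, embed_dim=768, num_subspaces=8, codebook_size=16):
        super().__init__()
        assert embed_dim % num_subspaces == 0
        sub_dim = embed_dim // num_subspaces
        # One small codebook per subspace, shared across every prompt token.
        self.codebooks = nn.Parameter(0.02 * torch.randn(num_subspaces, codebook_size, sub_dim))
        # Per-token, per-subspace logits that become the adaptive mixing weights.
        self.weight_logits = nn.Parameter(torch.zeros(num_tokens, num_subspaces, codebook_size))

    def forward(self) -> torch.Tensor:
        # Soft weights over codebook entries (softmax weighting is an assumption here).
        w = self.weight_logits.softmax(dim=-1)                      # (T, S, K)
        # Each subspace segment is a weighted sum of that subspace's codewords.
        segments = torch.einsum("tsk,skd->tsd", w, self.codebooks)  # (T, S, sub_dim)
        # Concatenating the segments yields full prompt embeddings to prepend to the input.
        return segments.reshape(segments.size(0), -1)               # (T, embed_dim)

if __name__ == "__main__":
    prompts = PQPromptSketch()()   # (100, 768) prompt embeddings
    print(prompts.shape)
    # Trainable parameters in this sketch: 8*16*96 + 100*8*16 = 25,088,
    # versus 100*768 = 76,800 for plain prompt tuning at the same prompt length,
    # which is the kind of saving the abstract's 0.3% figure points to.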

Files in this item:
File | Size | Format
ntu-112-2.pdf | 604.8 kB | Adobe PDF
Access restricted to NTU campus IP addresses (off campus, please use the library VPN service).