NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 許永真 | zh_TW
dc.contributor.advisor | Yung-Jen Hsu | en
dc.contributor.author | 楊敦捷 | zh_TW
dc.contributor.author | Dun-Jie Yang | en
dc.date.accessioned | 2023-09-22T17:23:53Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-09-22 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-08-05 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096 | -
dc.description.abstract | 大型語言模型近年來已展示出其驚人的能力,特別是在不需要任何參數更新的情況下,能使用少量的範例進行學習來完成各種任務。然而,學習範例的選擇大大影響了模型的表現,進而影響了穩定性。受到相關研究利用大型語言模型的推理能力和選擇多元化範例提升表現的啟發,我們提出了新的學習範例選擇策略,名為步步精心(Let's Select Step by Step)。我們利用大型語言模型進行客製於下游任務的資料分群並生成解釋,接著透過高效率的篩選機制選擇最佳的學習範例。實驗結果表明,我們的方法能在大型語言模型擅長的任務上更進一步地提高了模型表現和穩定性,而此方法的成效更是隨著利用進階模型而增強,在需要進階能力的任務表現上也有所提升。 | zh_TW
dc.description.abstract | Large Language Models (LLMs) have shown a remarkable ability to perform various tasks using few-shot in-context learning from a limited number of demonstration examples, without requiring parameter updates. However, performance is notably inconsistent across different example sets. Inspired by related work that leverages the reasoning ability of LLMs and the benefits of diverse examples, we propose Let's Select Step by Step (LSSS), a method for demonstration example selection. Using LLMs, we carry out task-specific clustering and explanation generation, followed by an efficient evaluation to select better demonstration examples. Experimental results indicate that our method enhances both performance and stability on tasks where LLMs typically excel. Moreover, for tasks demanding specific linguistic capabilities, employing more advanced LLMs can further boost the effectiveness of our approach. (A schematic sketch of this selection pipeline follows the metadata record below.) | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:23:53Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-09-22T17:23:53Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objective 2
1.3 Thesis Organization 3
2 Related Work 4
2.1 Large Language Models 4
2.2 Prompting 5
2.2.1 Few-shot In-Context Learning 5
2.2.2 Chain-of-thought Prompting 7
3 Problem Definition 9
3.1 Few-shot In-Context Learning 9
3.2 Notations 10
4 Methodology 11
4.1 Demonstration Selection by Clustering 12
4.1.1 Sampling and Batch Formulation 13
4.1.2 Task-Specific Clustering 13
4.1.3 Selection of Representative Examples 14
4.2 Demonstration Explanation 16
4.2.1 CoT Prompting for Explanation Inference 16
4.3 Demonstration Selection by Score 17
4.3.1 Multi-Stage Evaluation Strategy 18
5 Experiments and Analysis 20
5.1 Dataset and Metrics 20
5.1.1 Dataset 21
5.1.2 Metrics 26
5.2 Experiment Setting 26
5.3 Compared Methods 29
5.3.1 Compared With Baselines 29
5.3.2 Compared With Adaptations Of Related Work And State-of-the-Art Methods 29
5.4 Main Experiment Results 31
5.4.1 SuperGLUE 31
5.4.2 CLEF-2022 CheckThat! Lab Task 1 37
5.5 Quantitative Analysis 44
5.5.1 Does the proposed prompt-based task-specific clustering method work? 44
5.5.2 Does providing explanations work? To what extent does the permutation within demonstrations impact overall performance? 49
5.5.3 What are the trade-offs between our multi-stage selection strategy and the alternative approaches? 51
5.5.4 How does the size of the final demonstration set S_opt impact the overall performance? 51
6 Conclusion 54
6.1 Contribution 54
6.2 Limitation and Future Work 55
Bibliography 56
A Prompting Details 64
A.1 Prompting Details of Demonstration Selection by Clustering 64
A.2 Prompting Details of Demonstration Explanation 64
A.3 Prompting Details of Demonstration Selection by Score and Prediction 64
A.4 Task Description 69
B Hand-crafted Examples 72
B.1 CLEF-2022 CheckThat! Lab Task 1 72
B.2 SuperGLUE 72
dc.language.iso | en | -
dc.subject | 提示工程 | zh_TW
dc.subject | 少樣本情境學習 | zh_TW
dc.subject | 大型語言模型 | zh_TW
dc.subject | 自然語言處理 | zh_TW
dc.subject | Natural Language Processing | en
dc.subject | Few-shot In-context Learning | en
dc.subject | Prompt Engineering | en
dc.subject | Large Language Models | en
dc.title | 步步精心:用於少樣本情境學習的階段性學習範例選擇策略 | zh_TW
dc.title | Let’s Select Step by Step (LSSS): A Demonstration Selection Strategy for Few-Shot In-Context Learning | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | 碩士 (Master's) | -
dc.contributor.oralexamcommittee | 蔡宗翰;古倫維;李育杰;郭彥伶 | zh_TW
dc.contributor.oralexamcommittee | Tzong-Han Tsai;Lun-Wei Ku;Yuh-Jye Lee;Yen-Ling Kuo | en
dc.subject.keyword | 少樣本情境學習,大型語言模型,提示工程,自然語言處理 | zh_TW
dc.subject.keyword | Few-shot In-context Learning,Large Language Models,Prompt Engineering,Natural Language Processing | en
dc.relation.page | 74 | -
dc.identifier.doi | 10.6342/NTU202303119 | -
dc.rights.note | 同意授權(限校園內公開) (authorized for campus-only access) | -
dc.date.accepted | 2023-08-08 | -
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | -
dc.contributor.author-dept | 資訊工程學系 (Department of Computer Science and Information Engineering) | -
dc.date.embargo-lift | 2027-06-07 | -
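
The abstract above outlines a three-stage pipeline: LLM-driven task-specific clustering, chain-of-thought explanation generation, and score-based selection of the final demonstration set. The following minimal Python sketch illustrates such a pipeline in its simplest form. It is not the thesis's implementation: every name in it (select_demonstrations, parse_clusters, llm_complete, dev_set, n_clusters, k), the prompt wording, and the scoring rule are hypothetical stand-ins.

# Hypothetical sketch of an LSSS-style demonstration-selection pipeline.
# llm_complete stands in for any text-completion call; the prompts, the
# cluster parser, and the scoring rule are illustrative assumptions only.
from typing import Callable, List, Tuple

def parse_clusters(text: str, pool: List[str], n: int) -> List[List[str]]:
    # Naively read LLM output lines of the form "cluster_index: example_index".
    clusters: List[List[str]] = [[] for _ in range(n)]
    for line in text.splitlines():
        head, _, tail = line.partition(":")
        if head.strip().isdigit() and tail.strip().isdigit():
            clusters[int(head) % n].append(pool[int(tail) % len(pool)])
    return [c if c else pool[:1] for c in clusters]  # avoid empty clusters

def select_demonstrations(
    pool: List[str],                     # candidate demonstration examples
    dev_set: List[Tuple[str, str]],      # (input, label) pairs for scoring
    llm_complete: Callable[[str], str],  # prompt -> completion
    n_clusters: int = 4,
    k: int = 4,
) -> List[str]:
    # Stage 1: task-specific clustering, delegated to the LLM itself.
    listing = "\n".join(f"{i}: {x}" for i, x in enumerate(pool))
    reply = llm_complete(
        f"Group the examples below into {n_clusters} clusters by the "
        f"properties relevant to this task. Answer with one "
        f"'cluster_index: example_index' pair per line.\n{listing}"
    )
    clusters = parse_clusters(reply, pool, n_clusters)

    # Stage 2: generate a chain-of-thought explanation for one
    # representative example per cluster.
    candidates = []
    for cluster in clusters:
        rep = cluster[0]
        explanation = llm_complete(f"{rep}\nLet's think step by step.")
        candidates.append(f"{rep}\nExplanation: {explanation.strip()}")

    # Stage 3: keep the k candidates with the best one-shot accuracy on a
    # small dev set (a cheap proxy for a staged evaluation).
    def one_shot_accuracy(demo: str) -> float:
        hits = sum(
            llm_complete(f"{demo}\n\n{x}").strip() == y for x, y in dev_set
        )
        return hits / max(len(dev_set), 1)

    return sorted(candidates, key=one_shot_accuracy, reverse=True)[:k]

Per the table of contents, the thesis evaluates demonstration sets with a multi-stage strategy rather than the single-example proxy above; the sketch only fixes the shape of the pipeline, not its details.
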
Appears in Collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in This Item:
File | Size | Format
ntu-111-2.pdf (restricted: not authorized for public access) | 4.47 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.

Contact Information
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
© NTU Library All Rights Reserved