Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 | zh_TW |
| dc.contributor.advisor | Yung-Jen Hsu | en |
| dc.contributor.author | 楊敦捷 | zh_TW |
| dc.contributor.author | Dun-Jie Yang | en |
| dc.date.accessioned | 2023-09-22T17:23:53Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-09-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-05 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096 | - |
| dc.description.abstract | 大型語言模型近年來已展示出其驚人的能力,特別是在不需要任何參數更新的情況下,能使用少量的範例進行學習來完成各種任務。然而,學習範例的選擇大大影響了模型的表現,進而影響了穩定性。受到相關研究利用大型語言模型的推理能力和選擇多元化範例提升表現的啟發,我們提出了新的學習範例選擇策略,名為步步精心(Let's Select Step by Step)。我們利用大型語言模型進行客製於下游任務的資料分群並生成解釋,接著透過高效率的篩選機制選擇最佳的學習範例。實驗結果表明,我們的方法能在大型語言模型擅長的任務上更進一步地提高了模型表現和穩定性,而此方法的成效更是隨著利用進階模型而增強,在需要進階能力的任務表現上也有所提升。 | zh_TW |
| dc.description.abstract | Large Language Models (LLMs) have shown a remarkable ability to perform various tasks using few-shot in-context learning from a limited number of demonstration examples, without requiring parameter updates. However, the performance of such learning is notably inconsistent across different example sets. Inspired by related work highlighting the benefits of utilizing the reasoning ability of LLMs and diverse examples, we propose a method, Let's Select Step by Step (LSSS), for demonstration example selection. By leveraging LLMs, we carry out task-specific clustering and explanation generation, followed by an efficient evaluation to select better demonstration examples. Experimental results indicate that our method enhances both performance and stability on tasks where LLMs typically excel. Moreover, for tasks demanding specific linguistic capabilities, employing more advanced LLMs could further boost the effectiveness of our approach. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:23:53Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-09-22T17:23:53Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objective 2
1.3 Thesis Organization 3
2 Related Work 4
2.1 Large Language Models 4
2.2 Prompting 5
2.2.1 Few-shot In-Context Learning 5
2.2.2 Chain-of-thought Prompting 7
3 Problem Definition 9
3.1 Few-shot In-Context Learning 9
3.2 Notations 10
4 Methodology 11
4.1 Demonstration Selection by Clustering 12
4.1.1 Sampling and Batch Formulation 13
4.1.2 Task-Specific Clustering 13
4.1.3 Selection of Representative Examples 14
4.2 Demonstration Explanation 16
4.2.1 CoT Prompting for Explanation Inference 16
4.3 Demonstration Selection by Score 17
4.3.1 Multi-Stage Evaluation Strategy 18
5 Experiments and Analysis 20
5.1 Dataset and Metrics 20
5.1.1 Dataset 21
5.1.2 Metrics 26
5.2 Experiment Setting 26
5.3 Compared Methods 29
5.3.1 Compared With Baselines 29
5.3.2 Compared With Adaptation Of Related Work And State-of-the-Arts 29
5.4 Main Experiment Results 31
5.4.1 SuperGLUE 31
5.4.2 CLEF-2022 CheckThat! Lab Task 1 37
5.5 Quantitative Analysis 44
5.5.1 Does the proposed prompt-based task-specific clustering method work? 44
5.5.2 Does providing explanations work? To what extent does the permutation within demonstrations impact overall performance? 49
5.5.3 What are the trade-offs between our multi-stage selection strategy and the alternative approaches? 51
5.5.4 How does the number of the final demonstration set S_opt impact the overall performance? 51
6 Conclusion 54
6.1 Contribution 54
6.2 Limitation and Future Work 55
Bibliography 56
A Prompting Details 64
A.1 Prompting Details of Demonstration Selection by Clustering 64
A.2 Prompting Details of Demonstration Explanation 64
A.3 Prompting Details of Demonstration Selection by Score and Prediction 64
A.4 Task Description 69
B Hand-crafted Examples 72
B.1 CLEF-2022 CheckThat! Lab Task 1 72
B.2 SuperGLUE 72 | - |
| dc.language.iso | en | - |
| dc.subject | 提示工程 | zh_TW |
| dc.subject | 少樣本情境學習 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | Natural Language Processing | en |
| dc.subject | Few-shot In-context Learning | en |
| dc.subject | Prompt Engineering | en |
| dc.subject | Large Language Models | en |
| dc.title | 步步精心:用於少樣本情境學習的階段性學習範例選擇策略 | zh_TW |
| dc.title | Let’s Select Step by Step (LSSS): A Demonstration Selection Strategy For Few-Shot In-Context Learning | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 蔡宗翰;古倫維;李育杰;郭彥伶 | zh_TW |
| dc.contributor.oralexamcommittee | Tzong-Han Tsai;Lun-Wei Ku;Yuh-Jye Lee;Yen-Ling Kuo | en |
| dc.subject.keyword | 少樣本情境學習,大型語言模型,提示工程,自然語言處理 | zh_TW |
| dc.subject.keyword | Few-shot In-context Learning,Large Language Models,Prompt Engineering,Natural Language Processing | en |
| dc.relation.page | 74 | - |
| dc.identifier.doi | 10.6342/NTU202303119 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2023-08-08 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| dc.date.embargo-lift | 2027-06-07 | - |
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf (Restricted Access) | 4.47 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.