Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 許永真 | zh_TW |
| dc.contributor.advisor | Yung-Jen Hsu | en |
| dc.contributor.author | 楊敦捷 | zh_TW |
| dc.contributor.author | Dun-Jie Yang | en |
| dc.date.accessioned | 2023-09-22T17:23:53Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-09-22 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-08-05 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90096 | - |
| dc.description.abstract | 大型語言模型近年來已展示出其驚人的能力,特別是在不需要任何參數更新的情況下,能使用少量的範例進行學習來完成各種任務。然而,學習範例的選擇大大影響了模型的表現,進而影響了穩定性。受到相關研究利用大型語言模型的推理能力和選擇多元化範例提升表現的啟發,我們提出了新的學習範例選擇策略,名為步步精心(Let's Select Step by Step)。我們利用大型語言模型進行客製於下游任務的資料分群並生成解釋,接著透過高效率的篩選機制選擇最佳的學習範例。實驗結果表明,我們的方法能在大型語言模型擅長的任務上更進一步地提高了模型表現和穩定性,而此方法的成效更是隨著利用進階模型而增強,在需要進階能力的任務表現上也有所提升。 | zh_TW |
| dc.description.abstract | Large Language Models (LLMs) have shown a remarkable ability to perform various tasks using few-shot in-context learning from a limited number of demonstration examples, without requiring parameter updates. However, the performance of such learning is notably inconsistent across different example sets. Inspired by related work highlighting the benefits of utilizing the reasoning ability of LLMs and diverse examples, we propose a method, Let's Select Step by Step (LSSS), for demonstration example selection. By leveraging LLMs, we carry out task-specific clustering and explanation generation, followed by an efficient evaluation to select better demonstration examples. Experimental results indicate that our method enhances both performance and stability on tasks where LLMs typically excel. Moreover, for tasks demanding specific linguistic capabilities, employing more advanced LLMs could further boost the effectiveness of our approach. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T17:23:53Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-09-22T17:23:53Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objective 2
1.3 Thesis Organization 3
2 Related Work 4
2.1 Large Language Models 4
2.2 Prompting 5
2.2.1 Few-shot In-Context Learning 5
2.2.2 Chain-of-thought Prompting 7
3 Problem Definition 9
3.1 Few-shot In-Context Learning 9
3.2 Notations 10
4 Methodology 11
4.1 Demonstration Selection by Clustering 12
4.1.1 Sampling and Batch Formulation 13
4.1.2 Task-Specific Clustering 13
4.1.3 Selection of Representative Examples 14
4.2 Demonstration Explanation 16
4.2.1 CoT Prompting for Explanation Inference 16
4.3 Demonstration Selection by Score 17
4.3.1 Multi-Stage Evaluation Strategy 18
5 Experiments and Analysis 20
5.1 Dataset and Metrics 20
5.1.1 Dataset 21
5.1.2 Metrics 26
5.2 Experiment Setting 26
5.3 Compared Methods 29
5.3.1 Compared With Baselines 29
5.3.2 Compared With Adaptation Of Related Work And State-of-the-Arts 29
5.4 Main Experiment Results 31
5.4.1 SuperGLUE 31
5.4.2 CLEF-2022 CheckThat! Lab Task 1 37
5.5 Quantitative Analysis 44
5.5.1 Does the proposed prompt-based task-specific clustering method work? 44
5.5.2 Does providing explanations work? To what extent does the permutation within demonstrations impact overall performance? 49
5.5.3 What are the trade-offs between our multi-stage selection strategy and the alternative approaches? 51
5.5.4 How does the number of the final demonstration set S_opt impact the overall performance? 51
6 Conclusion 54
6.1 Contribution 54
6.2 Limitation and Future Work 55
Bibliography 56
A Prompting Details 64
A.1 Prompting Details of Demonstration Selection by Clustering 64
A.2 Prompting Details of Demonstration Explanation 64
A.3 Prompting Details of Demonstration Selection by Score and Prediction 64
A.4 Task Description 69
B Hand-crafted Examples 72
B.1 CLEF-2022 CheckThat! Lab Task 1 72
B.2 SuperGLUE 72 | - |
| dc.language.iso | en | - |
| dc.subject | 提示工程 | zh_TW |
| dc.subject | 少樣本情境學習 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | Natural Language Processing | en |
| dc.subject | Few-shot In-context Learning | en |
| dc.subject | Prompt Engineering | en |
| dc.subject | Large Language Models | en |
| dc.title | 步步精心:用於少樣本情境學習的階段性學習範例選擇策略 | zh_TW |
| dc.title | Let’s Select Step by Step (LSSS): A Demonstration Selection Strategy For Few-Shot In-Context Learning | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 蔡宗翰;古倫維;李育杰;郭彥伶 | zh_TW |
| dc.contributor.oralexamcommittee | Tzong-Han Tsai;Lun-Wei Ku;Yuh-Jye Lee;Yen-Ling Kuo | en |
| dc.subject.keyword | 少樣本情境學習,大型語言模型,提示工程,自然語言處理 | zh_TW |
| dc.subject.keyword | Few-shot In-context Learning,Large Language Models,Prompt Engineering,Natural Language Processing | en |
| dc.relation.page | 74 | - |
| dc.identifier.doi | 10.6342/NTU202303119 | - |
| dc.rights.note | 同意授權(限校園內公開) | - |
| dc.date.accepted | 2023-08-08 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| dc.date.embargo-lift | 2027-06-07 | - |
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-111-2.pdf (Restricted Access) | 4.47 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.