Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98578

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德 | zh_TW |
| dc.contributor.advisor | Shou-De Lin | en |
| dc.contributor.author | 黃竑鈞 | zh_TW |
| dc.contributor.author | Hung-Chun Huang | en |
| dc.date.accessioned | 2025-08-18T00:56:57Z | - |
| dc.date.available | 2025-08-18 | - |
| dc.date.copyright | 2025-08-15 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-04 | - |
| dc.identifier.citation | [1] Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? Investigations with linear models. In The Eleventh International Conference on Learning Representations, 2023. [2] Anonymous. Induction heads as a primary mechanism for pattern matching in in-context learning. In Submitted to ACL Rolling Review - June 2024, 2024. Under review. [3] Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as statisticians: Provable in-context learning with in-context algorithm selection. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. [4] Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, and Junjie Hu. How does multi-task training affect transformer in-context capabilities? Investigations with function classes, 2024. [5] Satwik Bhattamishra, Arkil Patel, Phil Blunsom, and Varun Kanade. Understanding in-context learning in transformers and LLMs by learning to learn discrete functions. In The Twelfth International Conference on Learning Representations, 2024. [6] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020. [7] Xingwu Chen, Lei Zhao, and Difan Zou. How transformers utilize multi-head attention in in-context learning? A case study on sparse linear regression. In ICML 2024 Workshop on Theoretical Foundations of Foundation Models, 2024. [8] Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant. What can transformers learn in-context? A case study of simple function classes. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. [9] Noah Hollmann, Samuel Müller, Katharina Eggensperger, and Frank Hutter. TabPFN: A transformer that solves small tabular classification problems in a second. In NeurIPS 2022 First Table Representation Workshop, 2022. [10] Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, and Pin-Yu Chen. How do nonlinear transformers learn and generalize in in-context learning? In ICML, 2024. [11] Hongkang Li, Meng Wang, Songtao Lu, Hui Wan, Xiaodong Cui, and Pin-Yu Chen. Transformers as multi-task feature selectors: Generalization analysis of in-context learning. In NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning, 2023. [12] Yingcong Li, M. Emrullah Ildiz, Dimitris Papailiopoulos, and Samet Oymak. Transformers as algorithms: Generalization and stability in in-context learning, 2023. [13] Yingcong Li, Xupeng Wei, Haonan Zhao, and Taigao Ma. Can mamba in-context learn task mixtures? In ICML 2024 Workshop on In-Context Learning, 2024. [14] Samuel Müller, Noah Hollmann, Sebastian Pineda Arango, Josif Grabocka, and Frank Hutter. Transformers can do Bayesian inference. In International Conference on Learning Representations, 2022. [15] Allan Raventos, Mansheej Paul, Feng Chen, and Surya Ganguli. Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. [16] Lingfeng Shen, Aayush Mishra, and Daniel Khashabi. Do pre-trained transformers really learn in-context by gradient descent?, 2024. [17] Jiajun Song, Zhuoyan Xu, and Yiqiao Zhong. Out-of-distribution generalization via composition: A lens through induction heads in transformers. In The Second Conference on Parsimony and Learning (Recent Spotlight Track), 2025. [18] Nilesh Tripuraneni, Lyric Doshi, and Steve Yadlowsky. Can transformers in-context learn task mixtures? In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models, 2024. [19] Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning (ICML), pages 35151–35174, 2023. [20] Qixun Wang, Yifei Wang, Xianghua Ying, and Yisen Wang. Can in-context learning really generalize to out-of-distribution tasks? In The Thirteenth International Conference on Learning Representations, 2025. [21] Zhijie Wang, Bo Jiang, and Shuai Li. In-context learning on function classes unveiled for transformers. In Forty-first International Conference on Machine Learning, 2024. [22] Zhijie Wang, Bo Jiang, and Shuai Li. Transformers perform in-context learning through neural networks, 2024. [23] Noam Wies, Yoav Levine, and Amnon Shashua. The learnability of in-context learning. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. [24] Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, and Peter Bartlett. How many pretraining tasks are needed for in-context learning of linear regression? In The Twelfth International Conference on Learning Representations, 2024. [25] Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit Bayesian inference. In International Conference on Learning Representations, 2022. [26] Steve Yadlowsky, Lyric Doshi, and Nilesh Tripuraneni. Pretraining data mixtures enable narrow model selection capabilities in transformer models, 2023. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98578 | - |
| dc.description.abstract | Contemporary Transformer-based language models deliver remarkable performance across a wide range of real-world tasks, yet their internal mechanisms are still not fully understood. Recent research has increasingly focused on the phenomenon of in-context learning (ICL) and on models' ability to generalize beyond the training distribution. However, most studies are conducted under simplified conditions in which both training and evaluation prompts are generated by a single, clearly defined function, leaving it unclear how models behave in more structurally diverse or ambiguous settings. This work investigates ICL behavior after blended training, in which each training prompt is produced by randomly sampling from several different function classes, without any explicit task labels or structural cues. Building on standard ICL tasks such as linear and quadratic classification, we design experiments to test our hypotheses and to evaluate how this training scheme affects model behavior, noise robustness, and generalization. The results show that under blended training the model does not perform function selection centered on a single function; instead, it exhibits more flexible pattern recognition, stronger tolerance to input noise, and better generalization to out-of-distribution settings. These findings indicate that introducing structurally diverse prompts during training improves a model's adaptability in unseen environments. | zh_TW |
| dc.description.abstract | Transformer-based language models have achieved remarkable success across a wide range of real-world tasks, yet the internal mechanisms that govern their behavior remain only partially understood. Recent research has increasingly focused on the phenomenon of in-context learning (ICL) and its ability to generalize beyond the training distribution. However, many of these studies are conducted under simplified conditions, where both training and evaluation use prompts derived from a single, clearly defined function. As a result, it remains unclear how models behave in more structurally diverse or ambiguous settings. In this study, we examine ICL under a blended training paradigm, in which each training prompt contains examples sampled from multiple function classes, without any explicit task identifiers or structural signals. Using standard ICL benchmarks such as linear and quadratic classification, we assess how this training approach influences model behavior, robustness, and generalization. Our findings indicate that under blended training, the commonly observed function selection behavior, where the model implicitly identifies and applies a single underlying function, plays a less central role. Instead, the model demonstrates more flexible pattern recognition, improved resilience to input noise, and stronger generalization to out-of-distribution tasks. These results suggest that training on structurally mixed prompts can enhance a model's adaptability in unfamiliar scenarios. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T00:56:57Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-18T00:56:57Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements ii; Abstract (Chinese) iii; Abstract iv; Table of Contents vi; List of Tables viii; List of Figures ix; Chapter 1. Introduction 1; Chapter 2. Background 4; 2.1 In-Context Learning as Function Learning 4; 2.2 Multi-Function Contexts and Task Mixture 5; 2.3 Attention Analysis in ICL 6; 2.4 Generalization to Out-of-Distribution Functions 7; Chapter 3. Task Design and Generalization Settings 8; 3.1 Category 1: LC vs. CC Binary Task Mixture 8; 3.2 Category 2: QC vs. LC vs. R Multiple Task Mixture 10; 3.3 Category 3: Functions Used to Test Generalization 10; Chapter 4. Training and Evaluation Setup 12; 4.1 Training Detail 12; 4.2 Evaluation Method 14; Chapter 5. Experimental Results 15; 5.1 Performance Validation 15; 5.2 Mechanism Analysis 17; 5.2.1 (1) Function Mixture Test 17; 5.2.2 (2) Out-Of-Distribution Function Test 19; 5.2.3 (3) Model Bias Test 21; 5.2.4 (4) Attention Head Analysis 22; 5.3 Generalization and Robustness 25; 5.3.1 OOD Generalization Comparison with Noise-Augmented Model 25; 5.3.2 Robustness Under Noisy Inference 27; Chapter 6. Conclusion 29; Bibliography 30 | - |
| dc.language.iso | en | - |
| dc.subject | In-Context Learning | zh_TW |
| dc.subject | Blended Training | zh_TW |
| dc.subject | Function Mixture | zh_TW |
| dc.subject | Function Selection | zh_TW |
| dc.subject | Out-of-Distribution Generalization | zh_TW |
| dc.subject | OOD Generalization | en |
| dc.subject | Blended Training | en |
| dc.subject | Function Mixture | en |
| dc.subject | Function Selection | en |
| dc.subject | In-Context Learning | en |
| dc.title | Unpacking In-Context Learning: Investigating Its Underlying Mechanism and Out-of-Distribution Generalization via Blended Training on Function Mixtures | zh_TW |
| dc.title | Unpacking In-Context Learning: Underlying Mechanism and Out-of-Distribution Generalization via Blended Training on Function Mixture | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 林軒田;廖耿德;李宏毅 | zh_TW |
| dc.contributor.oralexamcommittee | Hsuan-Tian Lin;Keng-Te Liao;Hung-Yi Lee | en |
| dc.subject.keyword | In-Context Learning, Blended Training, Function Mixture, Function Selection, Out-of-Distribution Generalization | zh_TW |
| dc.subject.keyword | In-Context Learning, Blended Training, Function Mixture, Function Selection, OOD Generalization | en |
| dc.relation.page | 33 | - |
| dc.identifier.doi | 10.6342/NTU202503867 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2025-08-08 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
| dc.date.embargo-lift | 2025-08-18 | - |
Appears in Collections: Department of Computer Science and Information Engineering
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-113-2.pdf | 2.19 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
