NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97381
Title: 賦能大型語言模型多領域資源挑戰
Generalizing Large Language Model Usability Across Resource-Constrained
Author: 蔡昀達 (Yun-Da Tsai)
Advisor: 林守德 (Shou-De Lin)
Keywords: Large Language Model, Multimodal Alignment, Code Generation, Reasoning Modeling, Prompt Optimization, Inference Scaling, Language Uncertainty, Hardware Description Language (RTL), Verilog
Publication Year: 2025
Degree: Doctoral
Abstract:
Large Language Models (LLMs) have achieved remarkable success across a wide range of natural language tasks, and recent efforts have sought to extend their capabilities to multimodal domains and resource-constrained environments. However, existing approaches often rely on costly supervised fine-tuning or assume identical conditions at training and inference, which limits their generalization to unseen modalities, limited data, or restricted compute resources.

This dissertation presents a systematic study of generalizing LLM usability under real-world constraints. First, it introduces a robust text-centric alignment framework that enables LLMs to integrate diverse modalities (text, images, tables, and waveforms) through natural language interfaces. This approach supports in-context adaptation to unseen or dynamically changing modality combinations without retraining. To strengthen robustness against noisy or missing modalities, an adversarial prompting technique generates semantically challenging perturbations at the prompt level to stress-test model reliability.
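
To make the text-centric alignment concrete, the following minimal Python sketch serializes heterogeneous inputs into natural language and composes an in-context prompt. All function names, formats, and the drop_modality stress test are illustrative assumptions, not the dissertation's actual implementation.

# Minimal sketch of text-centric multimodal alignment (illustrative only).
# Each modality is serialized into natural language so a text-only LLM can
# consume it via in-context learning, with no retraining involved.

def serialize_table(rows):
    # e.g. [{"signal": "clk", "freq_mhz": 100}] -> "signal=clk, freq_mhz=100"
    return "; ".join(", ".join(f"{k}={v}" for k, v in row.items()) for row in rows)

def serialize_waveform(samples):
    # Coarse textual summary of a numeric waveform.
    return (f"waveform of {len(samples)} samples, "
            f"range [{min(samples)}, {max(samples)}], head={samples[:5]}")

def drop_modality(modalities, name):
    # Adversarial-style stress test: remove one modality at the prompt
    # level and check whether the model's answer stays stable.
    return {k: v for k, v in modalities.items() if k != name}

def build_prompt(question, modalities, few_shot=()):
    parts = [f"Example:\n{ex}" for ex in few_shot]  # in-context examples
    parts += [f"[{name}] {text}" for name, text in modalities.items()]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Is the clock signal stable?",
    {"table": serialize_table([{"signal": "clk", "freq_mhz": 100}]),
     "waveform": serialize_waveform([0, 1, 0, 1, 0, 1])},
)
# `prompt` is sent to any text-only LLM; supporting an unseen modality
# only requires a new serializer, not a new model.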

Beyond the multimodal setting, the dissertation investigates inference-time optimization strategies for LLMs, leveraging prompt search and uncertainty quantification to improve performance without additional model training, offering an efficient alternative to scaling model parameters or retraining from scratch. It also addresses low-resource domains such as Verilog code generation, designing correct-by-construction synthetic data pipelines and logic-enhanced reasoning models that achieve state-of-the-art performance with minimal data.
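
Below is a minimal sketch of such a training-free, inference-time loop, assuming a generic llm(prompt, temperature) text-generation call. The prompt templates, scoring rule, and self-consistency-style uncertainty estimate are placeholders rather than the dissertation's specific method.

import collections
import random

def llm(prompt, temperature=1.0):
    # Stand-in for a real model call (hypothetical; replace with any API).
    return random.choice(["yes", "no"])

def uncertainty(prompt, n=8):
    # Self-consistency-style estimate: disagreement across n samples.
    votes = collections.Counter(llm(prompt, temperature=1.0) for _ in range(n))
    top = votes.most_common(1)[0][1]
    return 1.0 - top / n  # 0.0 = all samples agree; larger = more uncertain

def search_prompts(candidates, val_set):
    # Pick the template that is most accurate and most confident on a small
    # validation set -- no gradient updates, no fine-tuning.
    def score(template):
        total = 0.0
        for x, y in val_set:
            p = template.format(x=x)
            total += float(llm(p, 0.0) == y) - uncertainty(p)
        return total
    return max(candidates, key=score)

best = search_prompts(
    ["Q: {x}\nA:", "Answer yes or no. {x}"],
    [("Is 2 even?", "yes"), ("Is 3 even?", "no")],
)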

Together, these contributions form a unified effort to enhance the adaptability, scalability, and efficiency of large language models under practical constraints. The results demonstrate that principled methods for alignment, optimization, and synthetic data generation can significantly broaden the usability of LLMs across modalities, resource regimes, and application domains.
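
The correct-by-construction idea behind the synthetic Verilog data can be illustrated with a toy generator: because the specification and the RTL are rendered from the same template, every (prompt, completion) pair is consistent by construction and needs no model-generated labels. Everything below is a hypothetical sketch, not the pipeline from the dissertation.

import itertools
import random

OPS = {"and": "&", "or": "|", "xor": "^"}

def make_pair(op, width):
    # Spec and RTL come from one template, so they cannot disagree.
    spec = (f"Write a Verilog module `gate` computing the bitwise {op} "
            f"of two {width}-bit inputs a and b on output y.")
    rtl = (f"module gate(input [{width-1}:0] a, input [{width-1}:0] b, "
           f"output [{width-1}:0] y);\n"
           f"  assign y = a {OPS[op]} b;\n"
           f"endmodule")
    return {"prompt": spec, "completion": rtl}

dataset = [make_pair(op, w) for op, w in itertools.product(OPS, (1, 4, 8))]
random.shuffle(dataset)  # ready to fine-tune or few-shot a code model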
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97381
DOI: 10.6342/NTU202500894
Full-Text License: Authorized (open access worldwide)
Electronic Full-Text Release Date: 2025-05-23
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-113-2.pdf | 13.17 MB | Adobe PDF


Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
