Plug-in Language Model: Controlling Text Generation with a Simple Regression Model

楊奈其; Nai-Chi Yang

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744

Title:	Plug-in Language Model: Controlling Text Generation with a Simple Regression Model (外掛式語言模型：利用一個簡單的迴歸模型控制文本生成)
Author:	楊奈其 Nai-Chi Yang
Advisor:	馬偉雲 Wei-Yun Ma
Co-advisor:	鄭卜壬 Pu-Jen Cheng
Keywords:	Controlled Generation, Natural Language Generation, Pre-trained Language Model, Reinforcement Learning, Machine Learning, Deep Learning
Year of publication:	2023
Degree:	Master's
Abstract:	Large pre-trained language models (LLMs), trained on massive datasets, have displayed unrivaled capacity for generating text that closely resembles human writing. Nevertheless, generating text that adheres to specific conditions without fine-tuning or adding new parameters remains a challenging task. Current strategies that avoid modifying the language model typically use either prompts or an auxiliary attribute classifier or predictor. These classifiers are developed to determine or predict whether a generated token helps achieve the desired attribute. Such methods use the prediction score of the required attribute to compute gradients, and thereby alter the output distribution of the next token during inference. However, these classifier models usually require the language model's latent states as input, which obstructs the use of the many existing black-box attribute models and tools. To address these limitations, we present the Plug-in Language Model (PiLM). PiLM leverages reinforcement learning to use black-box tools directly, helping adjust the latent state for controlled text generation. Furthermore, by replacing the slow process of backpropagating gradients with a simple regression model, PiLM achieves inference time comparable to that of the original LLM. Through validation on three controlled generation tasks, our approach demonstrated superior performance compared to existing state-of-the-art methods based on gradient updates, weighted decoding, or prompts.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744
DOI:	10.6342/NTU202301380
Full-text license:	Authorized (open access worldwide)
Full-text release date:	2025-01-01
Appears in collections:	Data Science Degree Program

Files in this item:

File	Size	Format
ntu-111-2.pdf	687.71 kB	Adobe PDF


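Unless their copyright terms are specifically indicated otherwise, all items in this system are protected by copyright, with all rights reserved.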