NTU Theses and Dissertations Repository › College of Electrical Engineering and Computer Science › Data Science Degree Program
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744
Title: 外掛式語言模型:利用一個簡單的迴歸模型控制文本生成
Plug-in Language Model: Controlling Text Generation with a Simple Regression Model
Authors: 楊奈其
Nai-Chi Yang
Advisor: 馬偉雲
Wei-Yun Ma
Co-Advisor: 鄭卜壬
Pu-Jen Cheng
Keyword: Controlled Generation, Natural Language Generation, Pre-trained Language Model, Reinforcement Learning, Machine Learning, Deep Learning
Publication Year : 2023
Degree: Master's
Abstract: Large language models (LLMs), pre-trained on massive datasets, have displayed an unrivaled capacity to generate text that closely resembles human writing. Nevertheless, generating text that adheres to specific conditions, without fine-tuning or adding new parameters, remains a challenging task.

Current strategies that avoid modifying the language model typically use either prompts or an auxiliary attribute classifier/predictor. These classifiers are trained to determine or predict whether a generated token helps satisfy the desired attribute. At inference time, such methods use the prediction score of the required attribute to compute gradients that shift the output distribution over the next token. However, these classifiers usually take the language model's latent states as inputs, which precludes the use of many off-the-shelf black-box attribute models or tools.
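The gradient-guided steering described above can be sketched with a toy linear model. This is a minimal illustration under assumed names and shapes (`W_out`, `w_attr`, `step`), not the thesis implementation: a linear "LM head" maps a latent state to vocabulary logits, a linear attribute classifier scores the latent, and the latent is shifted along the gradient of the attribute score before decoding.

```python
import numpy as np

# Toy sketch of classifier-guided latent steering. All names and shapes
# here are illustrative assumptions, not the thesis code.
rng = np.random.default_rng(0)
V, d = 8, 4                      # toy vocab size and hidden dimension
W_out = rng.normal(size=(V, d))  # LM output projection: latent -> logits
w_attr = rng.normal(size=d)      # linear attribute classifier on the latent

h = rng.normal(size=d)           # current latent state
step = 0.5                       # steering strength

# For a linear classifier, the gradient of the attribute score
# (w_attr @ h) with respect to h is simply w_attr.
h_steered = h + step * w_attr

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

p_before = softmax(W_out @ h)         # next-token distribution, unsteered
p_after = softmax(W_out @ h_steered)  # distribution after the latent shift

# The shift strictly increases the attribute score.
assert w_attr @ h_steered > w_attr @ h
```

In a real system the classifier is nonlinear and the gradient is obtained by backpropagation through it at every decoding step, which is exactly the cost the thesis targets.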

To address these limitations, we present the Plug-in Language Model (PiLM). PiLM leverages reinforcement learning to use black-box tools directly, guiding the adjustment of the latent state for controlled text generation. Furthermore, by replacing the slow process of backpropagation with a simple regression model, PiLM achieves inference time comparable to the original LLM. Validated on three controlled generation tasks, our approach demonstrates performance superior to existing state-of-the-art methods based on gradient updates, weighted decoding, or prompting.
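The idea of replacing per-step backpropagation with a learned regressor can be sketched as follows. This is an assumed toy setup, not the thesis code: offline, we collect pairs of latent states and the shifts that gradient steering would produce (here, against a linear attribute classifier `w_attr`), then fit a least-squares regression so that inference needs only one matrix multiply.

```python
import numpy as np

# Illustrative sketch of learning a regression model that predicts the
# latent-state shift, so no backpropagation is needed at inference time.
# The linear classifier and shapes are assumptions for the toy example.
rng = np.random.default_rng(1)
d, n = 4, 200
w_attr = rng.normal(size=d)   # linear attribute classifier (assumption)
step = 0.5

# Offline data: (latent, gradient-shift) pairs. For a linear classifier
# the gradient-based shift is step * w_attr for every latent state.
H = rng.normal(size=(n, d))
D = np.tile(step * w_attr, (n, 1))

# Fit a linear regression h -> delta by least squares (bias column added).
X = np.hstack([H, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X, D, rcond=None)

# Inference: a single matrix multiply replaces iterative backpropagation.
h = rng.normal(size=d)
delta = np.hstack([h, 1.0]) @ coef
h_steered = h + delta
```

In this linear toy case the regressor recovers the gradient shift exactly; the point of the design is that the same one-pass prediction works even when the attribute scorer is a black box that gradients cannot flow through.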
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744
DOI: 10.6342/NTU202301380
Fulltext Rights: Authorization granted (worldwide open access)
metadata.dc.date.embargo-lift: 2025-01-01
Appears in Collections: Data Science Degree Program (資料科學學位學程)

Files in This Item:
File: ntu-111-2.pdf | Size: 687.71 kB | Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
