Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744

Full metadata record
DC field: value (language)
dc.contributor.advisor: 馬偉雲 (zh_TW)
dc.contributor.advisor: Wei-Yun Ma (en)
dc.contributor.author: 楊奈其 (zh_TW)
dc.contributor.author: Nai-Chi Yang (en)
dc.date.accessioned: 2023-09-20T16:11:42Z
dc.date.available: 2025-01-01
dc.date.copyright: 2023-09-20
dc.date.issued: 2023
dc.date.submitted: 2023-08-02
dc.identifier.citationZ. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. Le, and R. Salakhutdinov. TransformerXL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2978–2988, Florence, Italy, July 2019. Association for Computational Linguistics.
S. Dathathri, A. Madotto, J. Lan, J. Hung, E. Frank, P. Molino, J. Yosinski, and R. Liu. Plug and play language models: A simple approach to controlled text generation. In International Conference on Learning Representations, 2020.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
M. Ghazvininejad, X. Shi, J. Priyadarshi, and K. Knight. Hafez: an interactive poetry generation system. In Proceedings of ACL 2017, System Demonstrations, pages 43– 48, Vancouver, Canada, July 2017. Association for Computational Linguistics.
S. Gururangan, A. Marasović, S. Swayamdipta, K. Lo, I. Beltagy, D. Downey, and N. A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online, July 2020. Association for Computational Linguistics.
A. Holtzman, J. Buys, M. Forbes, A. Bosselut, D. Golub, and Y. Choi. Learning to write with cooperative discriminators, 2018.
E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models, 2021.
M. Khalifa, H. Elsahar, and M. Dymetman. A distributional approach to controlled text generation, 2021.
B. Krause, A. D. Gotmare, B. McCann, N. S. Keskar, S. Joty, R. Socher, and N. F. Rajani. GeDi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 4929–4952, Punta Cana, Dominican Republic, Nov. 2021. Association for Computational Linguistics.
X. L. Li and P. Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online, Aug. 2021. Association for Computational Linguistics.
A. Liu, M. Sap, X. Lu, S. Swayamdipta, C. Bhagavatula, N. A. Smith, and Y. Choi. DExperts: Decoding-time controlled text generation with experts and anti-experts. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6691–6706, Online, Aug. 2021. Association for Computational Linguistics.
R. Liu, G. Xu, C. Jia, W. Ma, L. Wang, and S. Vosoughi. Data boost: Text data augmentation through reinforcement learning guided conditional generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9031–9041, Online, Nov. 2020. Association for Computational Linguistics.
X. Liu, L. Mou, F. Meng, H. Zhou, J. Zhou, and S. Song. Unsupervised paraphrasing by simulated annealing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 302–312, Online, July 2020. Association for Computational Linguistics.
F. Mireshghallah, K. Goyal, and T. Berg-Kirkpatrick. Mix and match: Learning-free controllable text generationusing energy language models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 401–415, Dublin, Ireland, May 2022. Association for Computational Linguistics.
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. Christiano, J. Leike, and R. Lowe. Training language models to follow instructions with human feedback, 2022.
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al. Improving language understanding by generative pre-training. 2018.
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
K. Yang and D. Klein. FUDGE: Controlled text generation with future discriminators. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3511–3535, Online, June 2021. Association for Computational Linguistics.
L. Yu, W. Zhang, J. Wang, and Y. Yu. Seqgan: Sequence generative adversarial nets with policy gradient, 2017.
H. Zhang and D. Song. DisCup: Discriminator cooperative unlikelihood prompttuning for controllable text generation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3392–3406, Abu Dhabi, United Arab Emirates, Dec. 2022. Association for Computational Linguistics.
D. M. Ziegler, N. Stiennon, J. Wu, T. B. Brown, A. Radford, D. Amodei, P. Christiano, and G. Irving. Fine-tuning language models from human preferences, 2020.
-
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89744
dc.description.abstract: 大型預訓練語言模型(LLMs)在海量數據的訓練中展示出無與倫比的能力,已經能夠生成與人類極為相似的文本。然而,在不進行微調或增加額外參數的條件下生成符合特定條件的文本,仍然是一個具有挑戰性的任務。

目前避免修改語言模型的策略,主要使用 prompts 或外加的分類器。這些分類器被開發用於決定或預測生成的 token 是否有助於達成所需目標。這些方法通過利用所需屬性的預測分數計算梯度,從而在推理階段改變下一個 token 的輸出分佈。然而,這些分類器模型通常需要使用語言模型的潛在狀態為輸入,這阻礙了使用許多現成的黑盒模型或工具。

為了克服這些限制,我們提出了外掛式語言模型(PiLM)作為解決方案。PiLM 利用強化學習直接使用黑盒工具協助調整潛在狀態來達成控制文本生成。同時我們訓練一個簡單的回歸模型取代反向傳播梯度這一緩慢的過程,使 PiLM 幾乎不會增加生成文本所需時間成本。通過在三種控制生成任務上的驗證,我們的方法展示出優於現有的基於梯度更新、加權解碼或使用 prompts 的方法的成果。
(zh_TW)
dc.description.abstract: Large-scale pre-trained language models (LLMs), trained on massive datasets, display an unrivaled capacity to generate text that closely resembles human writing. Nevertheless, generating text that adheres to specific conditions, without fine-tuning or adding new parameters, remains a challenging task.

Current strategies that avoid modifying the language model typically use either prompts or an auxiliary attribute classifier/predictor. Such classifiers are trained to determine or predict whether a generated token helps satisfy the desired attribute. These methods manipulate the token output distribution at inference time by using the predicted attribute score to compute gradients. However, these classifiers usually require the LLM's latent states as inputs, which precludes the use of many off-the-shelf black-box attribute models or tools.

To address these limitations, we present the Plug-in Language Model (PiLM). PiLM leverages reinforcement learning to directly use black-box tools to help adjust the latent state for controlled text generation. Furthermore, by replacing the slow process of backpropagation with a simple regression model, PiLM achieves inference time comparable to the original LLM. Validated on three controlled generation tasks, our approach outperforms existing state-of-the-art methods based on gradient updates, weighted decoding, or prompts.
(en)
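The idea summarized in the abstract, shifting the LM's latent state with a learned regressor instead of backpropagating through an attribute classifier, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the thesis's actual implementation: `attribute_score` plays the role of a black-box attribute tool, and a linear regressor is fit offline to imitate a score-raising gradient step so that no backpropagation is needed at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16

# Hypothetical black-box attribute scorer (stands in for an external tool).
# It is secretly linear here so that a score-raising shift is well defined.
attr_dir = rng.normal(size=HIDDEN)

def attribute_score(h):
    """Higher means the state is closer to the desired attribute."""
    return float(attr_dir @ h)

# Offline: fit a simple linear regressor (with a bias column) that maps a
# hidden state to a shift imitating a small gradient step on the score.
X = rng.normal(size=(200, HIDDEN))
X_aug = np.hstack([X, np.ones((200, 1))])      # add bias column
Y = np.tile(0.1 * attr_dir, (200, 1))          # "gradient step" targets
W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)

def steer(h):
    """Shift a latent state using the regressor; no backprop at inference."""
    return h + np.append(h, 1.0) @ W

h = rng.normal(size=HIDDEN)
print(attribute_score(steer(h)) > attribute_score(h))  # True: score rises
```

In the paper's setting the regression targets would come from real gradient updates or reinforcement-learning rollouts against the black-box tool; the sketch only shows why a cheap regressor can replace the per-step backward pass.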
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-20T16:11:42Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2023-09-20T16:11:42Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Contents
口試委員審定書 i
摘要 iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
1 Introduction 1
2 Related Work 5
3 Model Frameworks 7
3.1 Reinforcement Learning 8
3.2 Controller 10
4 Experiment 13
4.1 Sentiment Control 14
4.2 Topic Control 17
4.3 Language Detoxification 20
4.4 Comparison of three tasks 22
4.5 Inference speed 22
5 Analysis 23
5.1 Longer future effect 23
5.2 Trends in control attribute 24
5.3 Reset hidden 24
5.4 Dynamic M 25
6 Conclusion 27
References 29
Appendix A — Hyperparameters 33
A.1 PiLM-RL 33
A.2 Controller 33
Appendix B — Prefix set 35
B.1 Sentiment prefixes 35
B.2 Topic prefixes 35
Appendix C — Wordlists 37
C.1 topic wordlists 37
C.2 heldout wordlists 40
Appendix D — Generated outputs 45
D.1 Sentiment Control 45
D.2 Topic Control 47
D.3 Language Detoxification 50

List of Figures
3.1 Overview of PiLM-RL. 8
3.2 Overview of PiLM-Controller. 10
3.3 Schematic diagram of hidden space. 11
5.1 Controller loss. 24

List of Tables
4.1 The result of Sentiment Control. 15
4.2 The result of Topic Control. 18
4.3 The result of Topic Control with prompts. 18
4.4 The result of Language Detoxification. 21
4.5 Comparison of inference speed. 22
5.1 Comparison of different n. 23
5.2 Result of reset hidden. 25
5.3 Result of Dynamic M. 25
A.1 Hyperparameter for PiLM-RL. 33
A.2 Hyperparameter for Controller. 33
D.3 Outputs of sentiment control. 45
D.4 Outputs of sentiment control. 46
D.5 Outputs of topic control. 47
D.6 Outputs of topic control with prompts. 48
D.7 Failure outputs of topic control with prompts. 49
D.8 Outputs of Language detoxification. 50
dc.language.iso: en
dc.subject: 控制生成 (zh_TW)
dc.subject: 深度學習 (zh_TW)
dc.subject: 機器學習 (zh_TW)
dc.subject: 強化學習 (zh_TW)
dc.subject: 預訓練語言模型 (zh_TW)
dc.subject: 自然語言生成 (zh_TW)
dc.subject: Controlled Generation (en)
dc.subject: Natural Language Generation (en)
dc.subject: Pre-trained Language Model (en)
dc.subject: Reinforcement Learning (en)
dc.subject: Machine Learning (en)
dc.subject: Deep Learning (en)
dc.title: 外掛式語言模型:利用一個簡單的迴歸模型控制文本生成 (zh_TW)
dc.title: Plug-in Language Model: Controlling Text Generation with a Simple Regression Model (en)
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: 碩士
dc.contributor.coadvisor: 鄭卜壬 (zh_TW)
dc.contributor.coadvisor: Pu-Jen Cheng (en)
dc.contributor.oralexamcommittee: 張嘉惠;陳縕儂 (zh_TW)
dc.contributor.oralexamcommittee: Chia-Hui Chang;Yun-Nung Chen (en)
dc.subject.keyword: 控制生成,自然語言生成,預訓練語言模型,強化學習,機器學習,深度學習 (zh_TW)
dc.subject.keyword: Controlled Generation,Natural Language Generation,Pre-trained Language Model,Reinforcement Learning,Machine Learning,Deep Learning (en)
dc.relation.page: 50
dc.identifier.doi: 10.6342/NTU202301380
dc.rights.note: 同意授權(全球公開)
dc.date.accepted: 2023-08-04
dc.contributor.author-college: 電機資訊學院
dc.contributor.author-dept: 資料科學學位學程
dc.date.embargo-lift: 2025-01-01
Appears in collections: 資料科學學位學程

Files in this item:
File | Size | Format
ntu-111-2.pdf | 687.71 kB | Adobe PDF

