Jailbreaking with Universal Multi-Prompts (利用通用多重提示下的越獄攻擊)

許郁翎; Yu-Ling Hsu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97161

Title:	Jailbreaking with Universal Multi-Prompts (利用通用多重提示下的越獄攻擊)
Author:	Yu-Ling Hsu (許郁翎)
Advisor:	Shang-Tse Chen (陳尚澤)
Keywords:	Jailbreak, Beam Search, Large Language Model, Natural Language Processing, Deep Learning
Year of publication:	2025
Degree:	Master's
Abstract:	Large language models (LLMs) have developed rapidly in recent years, revolutionizing various applications and significantly enhancing convenience and productivity. However, alongside their impressive capabilities, ethical concerns and new types of attacks, such as jailbreaking, have emerged. While many existing studies focus on individual attack strategies because of their simplicity and flexibility, there is limited research on universal approaches, which seek to find generalizable checkpoints to optimize across datasets and improve transferability to unseen data. In this paper, we introduce JUMP, a method designed to discover adversarial multi-prompts in a universal setting. We also adapt our approach for defense, which we term DUMP. Experimental results show that our method for optimizing universal multi-prompts surpasses existing techniques and achieves a high attack success rate while keeping readability under control.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97161
DOI:	10.6342/NTU202500547
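Full-text license:	Authorized (open to the public worldwide)
Full-text release date:	2025-02-28
Appears in department:	Department of Computer Science and Information Engineering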

Files in this item:

File	Size	Format
ntu-113-1.pdf	1.46 MB	Adobe PDF

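Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.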
