Jailbreaking with Universal Multi-Prompts

許郁翎; Yu-Ling Hsu

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97161

Title:	利用通用多重提示下的越獄攻擊 Jailbreaking with Universal Multi-Prompts
Authors:	許郁翎 Yu-Ling Hsu
Advisor:	陳尚澤 Shang-Tse Chen
Keyword:	Jailbreak, Beam Search, Large Language Model, Natural Language Processing, Deep Learning
Publication Year :	2025
Degree:	Master's
Abstract:	Large language models (LLMs) have developed rapidly in recent years, transforming a wide range of applications and significantly improving convenience and productivity. Alongside these capabilities, however, ethical concerns and new types of attacks, such as jailbreaking, have emerged. While many existing studies focus on individual attack strategies because of their simplicity and flexibility, there is limited research on universal approaches, which optimize across a dataset to improve transferability to unseen data. In this paper, we introduce JUMP, a method for optimizing adversarial multi-prompts for jailbreaking in a universal setting. We also adapt our approach for defense, which we term DUMP. Experimental results show that our method for optimizing universal multi-prompts surpasses existing techniques and achieves a high attack success rate while keeping readability under control.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97161
DOI:	10.6342/NTU202500547
Fulltext Rights:	Authorized (open access worldwide)
Embargo Lift Date:	2025-02-28
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
ntu-113-1.pdf	1.46 MB	Adobe PDF
