Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98825

| Title: | SEAL: Secure and Efficient Adaptive Layering for On-Device Language Models with TEE |
| Author: | Wen-Tzu Chang (張紋慈) |
| Advisor: | Ming-Syan Chen (陳銘憲) |
| Keywords: | Edge Computing, On-Device AI, Small Language Models, Trusted Execution Environment (TEE), Secure Inference, Model Stealing |
| Publication Year: | 2025 |
| Degree: | Master's |
| Abstract: | We introduce SEAL (Secure and Efficient Adaptive Layering), a framework designed to enhance the security and efficiency of on-device Small Language Model (SLM) inference by strategically integrating Trusted Execution Environments (TEEs). While SLMs offer great promise for edge deployment, they face significant challenges from limited device resources and serious security threats such as model intellectual property (IP) theft and data breaches. Existing solutions often force a choice between security and efficiency rather than enabling a flexible, systematic trade-off. To address this, SEAL enables informed, quantitative trade-offs between security and efficiency for on-device inference through two key components: Confidential Layer Analysis (CLA), which quantitatively assesses the confidentiality importance of each model layer, and the Layer Importance-Guided Adaptive Partition (LIAP) algorithm, which, subject to device constraints, maps the most sensitive, low-overhead layers into the TEE while retaining the others in the Rich Execution Environment (REE) to preserve performance. In this way, sensitive layers with small memory footprints are assigned to the TEE for maximum security, while non-sensitive layers remain in the REE for efficiency. Our experimental results demonstrate that SEAL significantly reduces inference latency, memory usage, and power consumption compared to full TEE execution. Experiments with the INT4-quantized Qwen3-0.6B model on WikiQA show that by protecting a single critical layer, SEAL reduces model theft risk by 65.8% at the cost of only about a 22% increase in latency and memory. When securing the top five layers selected by CLA, SEAL reduces the model parameter reconstruction success rate (MPRSR) to just 7.9% while cutting inference time and energy consumption by roughly 50% compared to full-TEE deployment, and it remains competitive with the unprotected REE baseline. SEAL thus reframes security as an optimization problem, enabling practical and trustworthy secure inference for edge AI. (A minimal, assumption-based sketch of this layer-partitioning idea appears after the record metadata below.) |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98825 |
| DOI: | 10.6342/NTU202504203 |
| Full-Text License: | Authorized for release (access restricted to campus) |
| Electronic Full-Text Release Date: | 2025-08-20 |
| Appears in Collections: | Department of Electrical Engineering |
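
The abstract above describes LIAP as selecting the most sensitive, low-overhead layers for TEE placement under device constraints. The thesis record includes no code, so the following is only a minimal Python sketch of that kind of sensitivity-versus-cost partitioning under a TEE memory budget; the `Layer` fields, the greedy ratio heuristic, and all scores, sizes, and budget values are illustrative assumptions, not the thesis's actual CLA metric or LIAP algorithm.

```python
# Illustrative sketch only: greedily place the most sensitive, cheapest layers
# inside a TEE memory budget; everything else stays in the REE. All numbers
# below are made up for illustration and are not taken from the thesis.
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    confidentiality: float  # hypothetical CLA-style sensitivity score (higher = more sensitive)
    mem_mb: float           # memory footprint if the layer is placed inside the TEE


def partition_layers(layers, tee_budget_mb):
    """Return (tee_layers, ree_layers) under a simple greedy sensitivity-per-MB rule."""
    tee, ree = [], []
    used = 0.0
    # Protect cheap, highly sensitive layers first.
    for layer in sorted(layers, key=lambda l: l.confidentiality / l.mem_mb, reverse=True):
        if used + layer.mem_mb <= tee_budget_mb:
            tee.append(layer)
            used += layer.mem_mb
        else:
            ree.append(layer)
    return tee, ree


if __name__ == "__main__":
    # Toy model with invented per-layer scores and sizes.
    model = [
        Layer("embedding", 0.9, 60.0),
        Layer("block_0", 0.8, 25.0),
        Layer("block_1", 0.3, 25.0),
        Layer("block_2", 0.2, 25.0),
        Layer("lm_head", 0.7, 40.0),
    ]
    tee, ree = partition_layers(model, tee_budget_mb=70.0)
    print("TEE:", [l.name for l in tee])
    print("REE:", [l.name for l in ree])
```

Framed this way, the placement problem resembles a knapsack instance; the actual LIAP algorithm may additionally weigh latency and power, or solve the selection exactly rather than greedily.
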
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (access restricted to NTU campus IP addresses; use the VPN service for off-campus access) | 1.37 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
