Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98825

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳銘憲 | zh_TW |
| dc.contributor.advisor | Ming-Syan Chen | en |
| dc.contributor.author | 張紋慈 | zh_TW |
| dc.contributor.author | Wen-Tzu Chang | en |
| dc.date.accessioned | 2025-08-19T16:20:49Z | - |
| dc.date.available | 2025-08-20 | - |
| dc.date.copyright | 2025-08-19 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-12 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98825 | - |
| dc.description.abstract | 本研究提出 SEAL (安全高效自適應分層技術),一個創新的框架,旨在透過策略性地整合可信執行環境 (Trusted Execution Environments, TEEs),以提升終端小型語言模型 (Small Language Models, SLMs) 推論的安全性與效率。儘管 SLMs 在邊緣部署方面展現巨大潛力,但它們面臨著設備資源有限以及模型智慧財產 (Intellectual Property, IP) 竊取和資料外洩等嚴峻的安全挑戰。現有解決方案往往迫使開發者在安全性或效率上做出選擇,缺乏一個能夠彈性權衡的系統性方法。
為了解決這個問題,我們提出了 SEAL,一個能為終端推論提供基於量化數據,在安全性與效率之間進行權衡的框架。SEAL 引入了兩個關鍵組件:機密層分析 (Confidential Layer Analysis, CLA),它能定量評估每個模型層的機密重要性;以及層重要性引導的自適應分區 (Layer Importance-Guided Adaptive Partition, LIAP)演算法,它根據設備限制,將最敏感且開銷低的層映射到 TEE 中,同時將其他層保留在富執行環境 (Rich Execution Environment, REE) 中以維持效能。此方法將敏感且記憶體佔用量低的層分配至 TEE 中以實現最大安全性,而將非敏感層保留在 REE 中以提升效率。 我們的實驗結果有力地證明,相較於完全在 TEE 中執行,SEAL 能顯著降低推論延遲、記憶體使用量與功耗。以 INT4 量化 Qwen3-0.6B 模型在 WikiQA 上的實驗為例,SEAL 僅需保護一個關鍵層,即可將模型竊取風險降低 65.8%,同時只增加約 22% 的延遲和記憶體開銷。進一步地,當保護經 CLA 選出的前五個最敏感層時,SEAL 可將模型參數重建成功率 (MPRSR) 降低至僅 7.9%,同時相較於完整的 TEE 部署,推論時間與能耗約可減少 50%。在此同時,SEAL 在面對未受保護的 REE 基線時,仍能保持具競爭力的效能。 最終,SEAL 將安全性重新定義為一個優化問題,為邊緣 AI 實現了實用且值得信賴的安全推論。 | zh_TW |
| dc.description.abstract | We introduce SEAL (Secure and Efficient Adaptive Layering), a novel framework designed to enhance the security and efficiency of on-device Small Language Model (SLM) inference by strategically integrating Trusted Execution Environments (TEEs). While SLMs offer great promise for edge deployment, they face significant challenges regarding limited device resources and paramount security concerns, such as model intellectual property (IP) theft and data breaches. Existing solutions often force a difficult trade-off, compelling a choice between security and efficiency rather than enabling a flexible balance.
To address this, we propose SEAL, a framework that enables informed, quantitative trade-offs between security and efficiency for on-device inference. SEAL introduces two key components: Confidential Layer Analysis (CLA), which quantitatively assesses the confidentiality of each model layer, and the Layer Importance-Guided Adaptive Partition (LIAP) algorithm, which maps the most sensitive, low-overhead layers into the TEE subject to device constraints while retaining the others in the Rich Execution Environment (REE) to preserve performance. Sensitive layers with small memory footprints are thus assigned to the TEE for maximum security, while non-sensitive layers remain in the REE for efficiency. Our experimental results demonstrate that SEAL significantly reduces inference latency, memory usage, and power consumption compared to executing the entire model in the TEE. In experiments with the INT4-quantized Qwen3-0.6B model on WikiQA, SEAL reduces model-theft risk by 65.8% while adding only about 22% latency and memory overhead, by protecting a single critical layer. When securing the top five layers selected by CLA, SEAL reduces the Model Parameter Reconstruction Success Rate (MPRSR) to 7.9% while cutting inference time and energy consumption by roughly 50% relative to full-TEE deployment. SEAL reframes security as an optimization problem, enabling practical and trustworthy secure inference for edge AI. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-19T16:20:49Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-19T16:20:49Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii
摘要 iv
Abstract vi
Contents viii
List of Figures xii
List of Tables xiii
Chapter 1 Introduction 1: 1.1 Motivation 1; 1.2 Research Goal 2; 1.3 Potential Challenges and Solutions 2; 1.4 Core Concepts 3; 1.5 Contributions 3
Chapter 2 Related Works 5: 2.1 Small Language Models (SLM) 5; 2.2 Trusted Execution Environment (TEE) 5; 2.3 On-Device AI Optimization 6; 2.4 TEE-Based Deep Learning Security 7; 2.5 Model Stealing 9
Chapter 3 Problem Statement 12: 3.1 Scenario 12; 3.2 Attack Scenario 13 (3.2.1 Model Stealing in SLMs 13; 3.2.1.1 White-box/Physical Attack 14; 3.2.1.2 Example Scenario 14); 3.3 Attack Methods 15 (3.3.1 Attack Goal 15; 3.3.2 Gradient-based Reconstruction 16; 3.3.3 Prediction-based Reconstruction 17); 3.4 Solution Requirements 17
Chapter 4 Methodology 19: 4.1 SEAL: Secure and Efficient Adaptive Layering 19 (4.1.1 Secure Data Flow in SEAL 20); 4.2 Confidential Layer Analysis (CLA) 22 (4.2.1 Model Confidentiality Score (MCS) 22; 4.2.2 Data Confidentiality Score (DCS) 25; 4.2.3 Confidentiality Priority Score (CPS) 27; 4.2.4 Summary 28); 4.3 Layer Importance-Guided Adaptive Partition (LIAP) 28; 4.4 Security Quantification 29
Chapter 5 Experiments 31: 5.1 Scope and Limitations 31; 5.2 Evaluation Methodology 32 (5.2.1 Evaluation Metrics 32); 5.3 Experimental Setup and Workflow 34 (5.3.1 Stage 1: Environment Setup 35; 5.3.2 Stage 2: Model Preparation and Partitioning (@ Model Provider) 35; 5.3.3 Stage 3: Secure Deployment and Inference (@ End User) 36; 5.3.4 Stage 4: Attack Simulation and Security Evaluation 36); 5.4 Experiment Results 37; 5.5 Discussion 39 (5.5.1 Quantified Security Efficacy 39; 5.5.2 Performance Overhead and Mitigation Strategy 41; 5.5.3 Preserving Model Utility 44; 5.5.4 Summary: Balancing Security and Efficiency 45)
Chapter 6 Conclusion 47
Chapter 7 Future Work 49
References 50
Appendix A — Layer Importance-Guided Adaptive Partition Algorithm 57: A.1 Explanation of the LIAP Algorithm 57 (A.1.1 Algorithm Exposition 58; A.1.2 Complexity Analysis 59); A.2 Rationale for Prioritizing Memory and Computational Constraints 60 (A.2.1 Memory and Computational Capabilities as Hard Constraints 60; A.2.2 Runtime and Switch Costs as Estimated/Soft Constraints 61)
Appendix B — Model Parameter Reconstruction Success Rate 64: B.1 MPRSR Formula 64; B.2 Explanation of Components 65 (B.2.1 Output Behavior Similarity (S_output) 65; B.2.2 Parameter Space Similarity (S_param) 66; B.2.3 Normalized Performance of the Reconstructed Model (S_perf) 67) | - |
| dc.language.iso | en | - |
| dc.subject | 邊緣運算 | zh_TW |
| dc.subject | 裝置上人工智慧 | zh_TW |
| dc.subject | 小語言模型 | zh_TW |
| dc.subject | 可信任執行環境 | zh_TW |
| dc.subject | 安全推論 | zh_TW |
| dc.subject | 模型竊取 | zh_TW |
| dc.subject | Small Language Models | en |
| dc.subject | Edge Computing | en |
| dc.subject | Model Stealing | en |
| dc.subject | Secure Inference | en |
| dc.subject | Trusted Execution Environment (TEE) | en |
| dc.subject | On-Device AI | en |
| dc.title | SEAL:基於可信執行環境的終端語言模型之安全高效自適應分層 | zh_TW |
| dc.title | SEAL: Secure and Efficient Adaptive Layering for On-Device Language Models with TEE | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 曹昱;吳齊人;楊得年 | zh_TW |
| dc.contributor.oralexamcommittee | Yu Tsao;Chi-Jen Wu;De-Nian Yang | en |
| dc.subject.keyword | 邊緣運算,裝置上人工智慧,小語言模型,可信任執行環境,安全推論,模型竊取 | zh_TW |
| dc.subject.keyword | Edge Computing,On-Device AI,Small Language Models,Trusted Execution Environment (TEE),Secure Inference,Model Stealing | en |
| dc.relation.page | 68 | - |
| dc.identifier.doi | 10.6342/NTU202504203 | - |
| dc.rights.note | Authorized (access restricted to campus) | - |
| dc.date.accepted | 2025-08-14 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Electrical Engineering | - |
| dc.date.embargo-lift | 2025-08-20 | - |
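The partitioning idea described in the abstract above (score each layer's confidentiality, then place the most sensitive, low-overhead layers into the limited TEE memory while the rest stay in the REE) can be sketched as a greedy, knapsack-style heuristic. This is an illustrative sketch only: the `Layer` fields, the `cps` score, and the sensitivity-per-megabyte ordering are assumptions for exposition, not the thesis's actual LIAP algorithm.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    cps: float      # assumed Confidentiality Priority Score (higher = more sensitive)
    mem_mb: float   # memory footprint if the layer is placed inside the TEE

def partition(layers: list[Layer], tee_budget_mb: float) -> tuple[list[str], list[str]]:
    """Greedily place the most sensitive layers into the TEE until the
    secure-memory budget is exhausted; everything else runs in the REE."""
    tee, ree = [], []
    budget = tee_budget_mb
    # Favor high confidentiality per MB of scarce TEE memory.
    for layer in sorted(layers, key=lambda l: l.cps / l.mem_mb, reverse=True):
        if layer.mem_mb <= budget:
            tee.append(layer.name)
            budget -= layer.mem_mb
        else:
            ree.append(layer.name)
    return tee, ree
```

Under this heuristic, a small attention layer with a high score is protected before a large but equally sensitive embedding table, which mirrors the abstract's "sensitive, low-memory-footprint layers to the TEE" trade-off.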
| Appears in Collections: | Department of Electrical Engineering | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (access restricted to NTU campus IPs; use the VPN service from off campus) | 1.37 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless their licensing terms state otherwise.
