Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93095
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳尚澤 | zh_TW |
dc.contributor.advisor | Shang-Tse Chen | en |
dc.contributor.author | 劉鎮霆 | zh_TW |
dc.contributor.author | Zhen-Ting Liu | en |
dc.date.accessioned | 2024-07-17T16:23:31Z | - |
dc.date.available | 2024-07-18 | - |
dc.date.copyright | 2024-07-17 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-07-08 | - |
dc.identifier.citation | References
[1] Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, and Ngai-Man Cheung. Re-thinking model inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16384–16393, 2023.
[2] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 17–32, 2014.
[3] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
[4] Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, and Dawn Song. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 253–261, 2020.
[5] Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, and Alireza Makhzani. Variational model inversion attacks. Advances in Neural Information Processing Systems, 34:9706–9719, 2021.
[6] Si Chen, Mostafa Kahla, Ruoxi Jia, and Guo-Jun Qi. Knowledge-enriched distributional model inversion attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16178–16187, 2021.
[7] Lukas Struppek, Dominik Hintersdorf, Antonio De Almeida Correia, Antonia Adler, and Kristian Kersting. Plug & play attacks: Towards robust and flexible model inversion attacks. In International Conference on Machine Learning, pages 20522–20545. PMLR, 2022.
[8] Xiaojian Yuan, Kejiang Chen, Jie Zhang, Weiming Zhang, Nenghai Yu, and Yang Zhang. Pseudo label-guided model inversion attack via conditional generative adversarial network. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[9] Shengwei An, Guanhong Tao, Qiuling Xu, Yingqi Liu, Guangyu Shen, Yuan Yao, Jingwei Xu, and Xiangyu Zhang. MIRROR: Model inversion for deep learning network with high fidelity. In Proceedings of the 29th Network and Distributed System Security Symposium, 2022.
[10] Gyojin Han, Jaehyun Choi, Haeil Lee, and Junmo Kim. Reinforcement learning-based black-box model inversion attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20504–20513, 2023.
[11] Mostafa Kahla, Si Chen, Hoang Anh Just, and Ruoxi Jia. Label-only model inversion attacks via boundary repulsion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15045–15053, 2022.
[12] Ziqi Yang, Jiyi Zhang, Ee-Chien Chang, and Zhenkai Liang. Neural network inversion in adversarial setting via background knowledge alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 225–240, 2019.
[13] Rongke Liu, Dong Wang, Yizhi Ren, Zhen Wang, Kaitian Guo, Qianqian Qin, and Xiaolei Liu. Unstoppable attack: Label-only model inversion via conditional diffusion model. IEEE Transactions on Information Forensics and Security, 2024.
[14] Tianhao Wang, Yuheng Zhang, and Ruoxi Jia. Improving robustness to model inversion attacks via mutual information regularization. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13):11666–11673, 2021.
[15] Xiong Peng, Feng Liu, Jingfeng Zhang, Long Lan, Junjie Ye, Tongliang Liu, and Bo Han. Bilateral dependency optimization: Defending against model inversion attacks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1358–1367, 2022.
[16] Lukas Struppek, Dominik Hintersdorf, and Kristian Kersting. Be careful what you smooth for: Label smoothing can be a privacy shield but also a catalyst for model inversion attacks. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=1SbkubNdbW.
[17] Xueluan Gong, Ziyao Wang, Yanjiao Chen, Qian Wang, Cong Wang, and Chao Shen. NetGuard: Protecting commercial web APIs from model inversion attacks using GAN-generated fake samples. In Proceedings of the ACM Web Conference 2023, pages 2045–2053, 2023.
[18] Xueluan Gong, Ziyao Wang, Shuaike Li, Yanjiao Chen, and Qian Wang. A GAN-based defense framework against model inversion attacks. IEEE Transactions on Information Forensics and Security, 2023.
[19] Shawn Shan, Emily Wenger, Bolun Wang, Bo Li, Haitao Zheng, and Ben Y. Zhao. Gotta catch 'em all: Using honeypots to catch adversarial attacks on neural networks. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 67–83, 2020.
[20] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
[21] Ziqi Yang, Bin Shao, Bohan Xuan, Ee-Chien Chang, and Fan Zhang. Defending model inversion and membership inference attacks via prediction purification. arXiv preprint arXiv:2005.03915, 2020.
[22] Jing Wen, Siu-Ming Yiu, and Lucas C. K. Hui. Defending against model inversion attack by adversarial examples. In 2021 IEEE International Conference on Cyber Security and Resilience (CSR), pages 551–556. IEEE, 2021.
[23] Yiming Li, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[24] Yiming Li, Tongqing Zhai, Baoyuan Wu, Yong Jiang, Zhifeng Li, and Shutao Xia. Rethinking the trigger of backdoor attack. arXiv preprint arXiv:2004.04692, 2020.
[25] Han Qiu, Yi Zeng, Shangwei Guo, Tianwei Zhang, Meikang Qiu, and Bhavani Thuraisingham. DeepSweep: An evaluation framework for mitigating DNN backdoor attacks using data augmentation. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pages 363–377, 2021.
[26] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
[27] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738, 2015.
[28] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[29] Yu Cheng, Jian Zhao, Zhecan Wang, Yan Xu, Karlekar Jayashree, Shengmei Shen, and Jiashi Feng. Know you at one glance: A compact vector representation for low-shot learning. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 1924–1932, 2017.
[30] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[31] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.
[32] Kota Yoshida and Takeshi Fujino. Countermeasure against backdoor attack on neural networks utilizing knowledge distillation. Journal of Signal Processing, 24(4):141–144, 2020.
[33] Kota Yoshida and Takeshi Fujino. Disabling backdoor and identifying poison data by using knowledge distillation in backdoor attacks on deep neural networks. In Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security, pages 117–127, 2020.
[34] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216. PMLR, 2020.
[35] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
[36] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4690–4699, 2019.
[37] Brianna Maze, Jocelyn Adams, James A. Duncan, Nathan Kalka, Tim Miller, Charles Otto, Anil K. Jain, W. Tyler Niggel, Janet Anderson, Jordan Cheney, et al. IARPA Janus Benchmark-C: Face dataset and protocol. In 2018 International Conference on Biometrics (ICB), pages 158–165. IEEE, 2018.
[38] Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III 14, pages 87–102. Springer, 2016.
[39] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93095 | - |
dc.description.abstract | 模型逆向攻擊 (Model Inversion attacks,MI attacks) 能夠利用神經網路模型還原出訓練資料集的分佈,進而對資料集的隱私造成重大威脅。雖然現有的防禦措施通常利用正則化技術減少資訊洩漏,但仍然無法抵擋較新的攻擊方法。本文提出了一種基於陷阱後門的防禦策略 (Trapdoor-based Model Inversion Defense,Trap-MID) 以誤導模型逆向攻擊。該方法將陷阱後門集成到模型中,當輸入資料被注入特定觸發器時,模型便會預測出相對應的分類。因此,這種陷阱後門將會成為模型逆向攻擊的「捷徑」,誘導其專注於提取陷阱後門的觸發器而非隱私資訊。我們透過理論分析展示陷阱後門的有效性和隱蔽性對於誘騙模型逆向攻擊的影響。此外,實驗結果佐證了 Trap-MID 能在無需額外資料或大量計算資源的情況下,針對模型逆向攻擊提供超越以往的防禦表現。 | zh_TW |
dc.description.abstract | Model Inversion (MI) attacks pose a significant threat to the privacy of deep neural networks by recovering the training data distribution from well-trained models. While existing defenses often rely on regularization techniques to reduce information leakage, they remain vulnerable to recent attacks. In this paper, we propose the Trapdoor-based Model Inversion Defense (Trap-MID) to mislead MI attacks. A trapdoor is integrated into the model so that it predicts a specific label whenever the input is injected with the corresponding trigger. Consequently, this trapdoor information serves as a "shortcut" for MI attacks, leading them to extract trapdoor triggers rather than private data. We provide theoretical insights into how the trapdoor's effectiveness and invisibility affect its ability to deceive MI attacks. In addition, empirical experiments demonstrate the state-of-the-art defense performance of Trap-MID against various MI attacks, without requiring extra data or significant computational overhead. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-17T16:23:31Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-07-17T16:23:31Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Contents
Verification Letter from the Oral Examination Committee
Acknowledgements
摘要
Abstract
Contents
List of Figures
List of Tables
Denotation
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Model Inversion Attacks
2.2 Defenses against Model Inversion Attacks
2.3 Backdoor Attacks
Chapter 3 Methodology
3.1 Problem Setup
3.1.1 Target Classifier
3.1.2 Adversary
3.2 Motivation and Overview
3.3 Model Training
3.4 Theoretical Analysis
Chapter 4 Experiments
4.1 Experimental Setups
4.1.1 Datasets
4.1.2 Target Models
4.1.3 Attack Methods
4.1.4 Baseline Defenses
4.1.5 Evaluation Metrics
4.2 Experimental Results
4.2.1 Comparison with Baselines
4.2.2 Synthetic Distribution Analysis
4.2.3 Trapdoor Recovery Analysis
4.2.4 Adversarial Detection
4.2.5 Adaptive Attacks
Chapter 5 Conclusion
5.1 Broader Impacts
5.2 Limitations and Future Works
5.2.1 Experimental Limitations
5.2.2 Exploring Different Scenarios
5.3 Conclusion
References
Appendix A — Proof of Theorem 1
Appendix B — Experimental Details
B.1 Hardware and Software Details
B.2 Datasets
B.2.1 CelebA
B.2.2 FFHQ
B.3 Target Models
B.4 Trap-MID
B.5 Attack Methods
B.6 Baseline Defenses
B.6.1 MID
B.6.2 BiDO
B.6.3 NegLS
B.7 Evaluation Metrics
B.7.1 Attack Accuracy (AA)
B.7.2 K-Nearest Neighbor Distance (KNN Dist)
B.7.3 Fréchet Inception Distance (FID)
Appendix C — Ablation Studies
C.1 TeD's Trapdoor Injection Strategy
C.2 Trigger Optimization
C.3 Blend Ratio
C.4 Loss Weight
C.5 Augmentation
Appendix D — Additional Experiments
D.1 Defense Performance on Different Architectures
D.2 Distributional Shifts in Auxiliary Dataset
D.3 Trapdoor Recovery Analysis against Different MI Attacks
D.4 Adaptive Attacks without Trapdoor Signatures
Appendix E — Additional Visualization | - |
dc.language.iso | en | - |
dc.title | Trap-MID: 基於陷阱後門之隱私性防禦用以對抗模型逆向攻擊 | zh_TW |
dc.title | Trap-MID: Trapdoor-based Defense against Model Inversion Attacks | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-2 | - |
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 林守德;游家牧;陳駿丞 | zh_TW |
dc.contributor.oralexamcommittee | Shou-De Lin;Chia-Mu Yu;Jun-Cheng Chen | en |
dc.subject.keyword | 模型逆向攻擊,隱私,防禦,陷阱後門,後門 | zh_TW |
dc.subject.keyword | model inversion attacks, privacy, defense, trapdoor, backdoor | en |
dc.relation.page | 68 | - |
dc.identifier.doi | 10.6342/NTU202401537 | - |
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2024-07-08 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 資訊工程學系 | - |
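As a companion to the abstract above, the following is a minimal, hypothetical sketch of trapdoor-style training in PyTorch. It is not the thesis implementation: the blend-based trigger injection, the function and parameter names (`inject_trigger`, `blend_ratio`, `trapdoor_weight`), and the simple joint loss are assumptions made only for illustration; the actual Trap-MID recipe (e.g., trigger optimization and augmentation, per Appendices B.4 and C of the table of contents) may differ.

```python
# Illustrative sketch only (assumption, not the thesis implementation): a
# trapdoor-style training step that blends a class-specific trigger into a
# copy of each batch and trains the classifier to predict the trigger's class.
import torch
import torch.nn as nn
import torch.nn.functional as F


def inject_trigger(x: torch.Tensor, trigger: torch.Tensor, blend_ratio: float) -> torch.Tensor:
    """Blend a trigger pattern into a batch of images, keeping values in [0, 1]."""
    return ((1.0 - blend_ratio) * x + blend_ratio * trigger).clamp(0.0, 1.0)


def trapdoor_training_step(model: nn.Module,
                           x: torch.Tensor,            # (B, C, H, W) clean images in [0, 1]
                           y: torch.Tensor,            # (B,) ground-truth labels
                           triggers: torch.Tensor,     # (num_classes, C, H, W) one trigger per class
                           blend_ratio: float = 0.1,
                           trapdoor_weight: float = 1.0) -> torch.Tensor:
    """Combine the ordinary classification loss with a trapdoor loss on triggered copies."""
    # Standard supervised loss on the clean batch.
    clean_loss = F.cross_entropy(model(x), y)

    # Pick a random trapdoor class per sample, blend in its trigger, and
    # train the model to predict that class for the triggered input.
    target = torch.randint(0, triggers.size(0), (x.size(0),), device=x.device)
    x_trap = inject_trigger(x, triggers[target], blend_ratio)
    trap_loss = F.cross_entropy(model(x_trap), target)

    return clean_loss + trapdoor_weight * trap_loss
```

The intended effect, as described in the abstract, is that each class's decision region becomes dominated by its trigger, so generative MI attacks that optimize inputs toward a target class tend to recover trigger-like patterns instead of private training data.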
Appears in Collections: | 資訊工程學系
Files in This Item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf (publicly available online after 2029-07-05) | 16.91 MB | Adobe PDF |
Unless their copyright terms are otherwise specified, all items in this repository are protected by copyright, with all rights reserved.