Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93476

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德 | zh_TW |
| dc.contributor.advisor | Shou-De Lin | en |
| dc.contributor.author | 黃昱翔 | zh_TW |
| dc.contributor.author | Yu-Hsiang Huang | en |
| dc.date.accessioned | 2024-08-01T16:19:32Z | - |
| dc.date.available | 2024-08-02 | - |
| dc.date.copyright | 2024-08-01 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-07-29 | - |
| dc.identifier.citation | 1. A. Alishahi, G. Chrupała, and T. Linzen. Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP workshop. Natural Language Engineering, 25(4):543–557, 2019.
2. F. Bordes, R. Balestriero, and P. Vincent. High fidelity visualization of what your self-supervised representation knows about. Transactions on Machine Learning Research, 2022.
3. T. Cong, X. He, and Y. Zhang. SSLGuard: A watermarking scheme for self-supervised learning pre-trained encoders. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 579–593, 2022.
4. A. Dosovitskiy and T. Brox. Inverting visual representations with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4829–4837, 2016.
5. Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(59):1–35, 2016.
6. X. He, L. Lyu, L. Sun, and Q. Xu. Model extraction and adversarial transferability, your BERT is vulnerable! In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2006–2012, 2021.
7. A. E. W. Johnson, D. J. Stone, L. A. Celi, and T. J. Pollard. The MIMIC Code Repository: enabling reproducibility in critical care research. Journal of the American Medical Informatics Association, 25(1):32–39, 2018.
8. J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
9. K. Krishna, G. S. Tomar, A. P. Parikh, N. Papernot, and M. Iyyer. Thieves on Sesame Street! Model extraction of BERT-based APIs. In International Conference on Learning Representations, 2020.
10. K. Kugler, S. Münker, J. Höhmann, and A. Rettinger. InvBERT: Reconstructing text from contextualized word embeddings by inverting the BERT pipeline. arXiv preprint arXiv:2109.10104, 2021.
11. P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
12. H. Li, M. Xu, and Y. Song. Sentence embedding leaks more information than you expect: Generative embedding inversion attack to recover the whole sentence. In Findings of the Association for Computational Linguistics: ACL 2023, pages 14022–14040, 2023.
13. C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, 2004.
14. Y.-T. Lin and Y.-N. Chen. LLM-Eval: Unified multi-dimensional automatic evaluation for open-domain conversations with large language models. In Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023), pages 47–58, Toronto, Canada, July 2023. Association for Computational Linguistics.
15. Y. Liu, J. Jia, H. Liu, and N. Z. Gong. StolenEncoder: Stealing pre-trained encoders in self-supervised learning. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 2115–2128, 2022.
16. I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
17. A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
18. A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5188–5196, 2015.
19. J. Morris, V. Kuleshov, V. Shmatikov, and A. M. Rush. Text embeddings reveal (almost) as much as text. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12448–12460, 2023.
20. A. Naseh, K. Krishna, M. Iyyer, and A. Houmansadr. Stealing the decoding algorithms of language models. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 1835–1849, 2023.
21. J. Ni, G. H. Abrego, N. Constant, J. Ma, K. Hall, D. Cer, and Y. Yang. Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1864–1874, 2022.
22. J. Ni, C. Qu, J. Lu, Z. Dai, G. H. Abrego, J. Ma, V. Zhao, Y. Luan, K. Hall, M.-W. Chang, et al. Large dual encoders are generalizable retrievers. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9844–9855, 2022.
23. X. Pan, M. Zhang, S. Ji, and M. Yang. Privacy risks of general-purpose language models. In 2020 IEEE Symposium on Security and Privacy (SP), pages 1314–1331. IEEE, 2020.
24. A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
25. S. Raza, D. J. Reji, F. Shajan, and S. R. Bashir. Large-scale application of named entity recognition to biomedicine and epidemiology. PLOS Digital Health, 1(12):e0000152, 2022.
26. N. Reimers and I. Gurevych. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, 2019.
27. C. Song and A. Raghunathan. Information leakage in embedding models. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 377–390, 2020.
28. P. Teterwak, C. Zhang, D. Krishnan, and M. C. Mozer. Understanding invariance via feedforward inversion of discriminatively trained classifiers. In International Conference on Machine Learning, pages 10225–10235. PMLR, 2021.
29. R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2):270–280, 1989.
30. S. Zanella-Béguelin, S. Tople, A. Paverd, and B. Köpf. Grey-box extraction of natural language models. In International Conference on Machine Learning, pages 12278–12286. PMLR, 2021.
31. X. Zhang, J. J. Zhao, and Y. LeCun. Character-level convolutional networks for text classification. In NIPS, 2015.
32. Y. Zhang, S. Sun, M. Galley, Y.-C. Chen, C. Brockett, X. Gao, J. Gao, J. Liu, and W. B. Dolan. DialoGPT: Large-scale generative pre-training for conversational response generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 270–278, 2020. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93476 | - |
| dc.description.abstract | 本研究調查了與文本嵌入相關的隱私風險,重點關注於攻擊者無法訪問原始嵌入模型的情境。我們的方法與過去需要直接訪問模型的研究不同,我們通過開發一種轉移攻擊方法,探索了更為現實的威脅模型。此方法使用一個代理模型來模仿目標嵌入模型的行為,使攻擊者在不需要直接訪問目標嵌入模型的情況下從文本嵌入中推斷出敏感信息。我們在各種嵌入模型和一個臨床數據集上的實驗表明,我們的轉移攻擊方法顯著優於傳統方法,揭示了嵌入技術潛在的隱私漏洞,並強調了加強安全措施的必要性。 | zh_TW |
| dc.description.abstract | This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model’s behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing the potential privacy vulnerabilities in embedding technologies and emphasizing the need for enhanced security measures. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-01T16:19:32Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-01T16:19:32Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
Abstract (in Chinese) iv
Abstract v
Contents vi
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 4
Chapter 3 Problem Definition 6
3.1 Embedding inversion attack 6
3.2 Transferable embedding inversion attack 7
Chapter 4 Methodology 9
4.1 Encoder Stealing with a Surrogate Model 9
4.2 Adversarial Threat Model Transferability 11
4.3 Training Pipeline 12
Chapter 5 Experiments 13
5.1 Experiment Setup 13
5.2 Attack Result 14
5.2.1 In-domain Text Reconstruction 14
5.2.2 Out-of-domain Text Reconstruction 16
5.3 Discussion 16
5.3.1 Ablation Study 16
5.3.2 Size of the Leaked Dataset 17
5.3.3 Size of the External Dataset 19
5.3.4 Choice of a Surrogate Embedding Model 20
5.4 Case Study 21
5.4.1 Embedding Inversion on MIMIC Dataset 21
5.4.2 Recovery Rate on Named Entities 22
5.5 Defense 23
Chapter 6 Conclusion 25
References 26
Appendix A — Detailed Dataset Statistics 31
Appendix B — Hyperparameters 32
Appendix C — Full Out-of-Domain Experiment 33
Appendix D — Comparison of Augmentation Strategies 34
Appendix E — Prompts for LLM Data Augmentation 36
Appendix F — Details of LLM Evaluation 38
Appendix G — More Case Study 39 | - |
| dc.language.iso | en | - |
| dc.subject | 生成式嵌入逆推攻擊 | zh_TW |
| dc.subject | 文本嵌入 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 代理模型 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | Deep Learning | en |
| dc.subject | Generative Embedding Inversion Attack | en |
| dc.subject | Sentence Embedding | en |
| dc.subject | Large language model | en |
| dc.subject | Surrogate Model | en |
| dc.subject | Natural Language Processing | en |
| dc.title | 有限查詢存取下的文本嵌入逆推攻擊 | zh_TW |
| dc.title | Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳縕儂;陳尚澤;李政德 | zh_TW |
| dc.contributor.oralexamcommittee | Yun-Nung Chen;Shang-Tse Chen;Cheng-Te Li | en |
| dc.subject.keyword | 生成式嵌入逆推攻擊,文本嵌入,大型語言模型,代理模型,自然語言處理,深度學習 | zh_TW |
| dc.subject.keyword | Generative Embedding Inversion Attack, Sentence Embedding, Large language model, Surrogate Model, Natural Language Processing, Deep Learning | en |
| dc.relation.page | 40 | - |
| dc.identifier.doi | 10.6342/NTU202402606 | - |
| dc.rights.note | Authorized (worldwide open access) | - |
| dc.date.accepted | 2024-08-01 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
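The transfer attack summarized in the abstract above can be made concrete with a short sketch. The code below is a minimal, self-contained PyTorch toy, not the thesis implementation: every name, dimension, and loss choice here is an assumption, tiny mean-pooling encoders stand in for real sentence encoders, and a linear layer stands in for the generative inversion decoder.

```python
# Self-contained toy sketch of the three-step transfer attack summarized in
# the dc.description.abstract field above. Everything here is an illustrative
# assumption: tiny mean-pooling encoders stand in for real sentence encoders,
# and a linear layer stands in for the generative inversion decoder.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
VOCAB, SEQ_LEN, VICTIM_DIM, SURR_DIM = 100, 8, 64, 48

# "Victim" encoder: frozen, and never queried during the attack itself.
victim_emb = nn.Embedding(VOCAB, VICTIM_DIM)
def victim_encode(ids):
    return victim_emb(ids).mean(dim=1)

# Leaked data: a small set of (token-ids, victim-embedding) pairs.
leaked_ids = torch.randint(0, VOCAB, (32, SEQ_LEN))
with torch.no_grad():
    leaked_emb = victim_encode(leaked_ids)

# Surrogate encoder the attacker controls, projected to the victim's dimension.
surr_emb = nn.Embedding(VOCAB, SURR_DIM)
proj = nn.Linear(SURR_DIM, VICTIM_DIM)
def surrogate(ids):
    return proj(surr_emb(ids).mean(dim=1))

# Step 1 (encoder stealing): align the surrogate with the victim on the
# leaked pairs using a cosine-similarity loss.
opt = torch.optim.AdamW([*surr_emb.parameters(), *proj.parameters()], lr=1e-3)
for _ in range(300):
    loss = 1.0 - F.cosine_similarity(surrogate(leaked_ids), leaked_emb, dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Step 2 (inversion training): train a decoder on unlimited (external text,
# surrogate embedding) pairs, so no further victim access is required.
public_ids = torch.randint(0, VOCAB, (512, SEQ_LEN))
decoder = nn.Linear(VICTIM_DIM, SEQ_LEN * VOCAB)
dec_opt = torch.optim.AdamW(decoder.parameters(), lr=1e-3)
for _ in range(300):
    with torch.no_grad():
        emb = surrogate(public_ids)
    logits = decoder(emb).view(-1, VOCAB)
    loss = F.cross_entropy(logits, public_ids.reshape(-1))
    dec_opt.zero_grad(); loss.backward(); dec_opt.step()

# Step 3 (transfer): apply the decoder directly to *victim* embeddings of
# unseen text. Mean pooling discards word order, so this toy recovers token
# content rather than exact sequences; it only demonstrates the pipeline.
secret_ids = torch.randint(0, VOCAB, (4, SEQ_LEN))
with torch.no_grad():
    recovered = decoder(victim_encode(secret_ids)).view(4, SEQ_LEN, VOCAB).argmax(-1)
print("per-position token recovery rate:", (recovered == secret_ids).float().mean().item())
```

The three steps mirror the pipeline named in the table of contents (4.1 Encoder Stealing with a Surrogate Model, then inversion training, then transfer to the victim's embeddings); the decoder succeeds on victim embeddings only to the extent that step 1 aligned the surrogate with the victim, which is the sense in which the attack needs no queries to the victim model.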
Appears in Collections: Department of Computer Science and Information Engineering
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf | 1.52 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
