NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 林守德 | zh_TW
dc.contributor.advisor | Shou-De Lin | en
dc.contributor.author | 陳韋恩 | zh_TW
dc.contributor.author | Wei-En Chen | en
dc.date.accessioned | 2023-08-09T16:42:17Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-09 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-07-26 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357 | -
dc.description.abstract | 本文提出了一個針對文本嵌入逆推攻擊的解決方法,適用於當查詢數量有限的情境。為了應對這個挑戰,我們提出了一個名為「輔助文檔嵌入攻擊與查詢選擇(SADE)」的模型,旨在適應實際應用的場景。具體而言,我們引入了一種利用外部文檔的方法來克服當私有文檔非常有限的情況,這在實際場景中是一個常見的問題。此外,我們提出了一種新穎的查詢策略,解決了當私有編碼器的查詢數量非常有限的情況。與之前的方法不同的是,我們的方法利用少量的查詢即可實現高度的準確性,而不需要大量的私有文檔和無限制的查詢私有編碼器。整體來說,我們提出的模型和查詢策略證明了我們的方法在文本嵌入逆推攻擊中的有效性和實用性。 | zh_TW
dc.description.abstract | This paper proposes a solution for embedding inversion attacks on textual embeddings in scenarios where the number of queries is limited. To address this challenge, we propose a model called Surrogate-Assisted Document Embedding Attack with Query Selection (SADE), designed for practical scenarios. Specifically, we introduce a means of exploiting external documents to overcome the scarcity of private documents, a common issue in practice. Additionally, we propose a novel query strategy that addresses the setting in which queries to the private encoder are severely limited. Unlike previous works, which require a large number of private documents and unlimited query access, our approach uses a small number of queries to achieve high retrieval accuracy. Overall, our proposed model and query strategy demonstrate the effectiveness and practicality of our approach to embedding inversion attacks on textual embeddings. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:42:17Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-09T16:42:17Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Oral Examination Committee Approval Certificate i
Acknowledgements iii
Abstract (Chinese) v
Abstract vi
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 6
Chapter 3 Problem Definition 8
Chapter 4 Methodology 10
4.0.1 Query Selection Strategy 11
4.0.2 Surrogate Model Training 15
4.0.3 Adversarial Representation Training 17
Chapter 5 Experiments 23
5.0.1 Experimental Settings 24
5.0.2 Attack Performance Analysis 27
5.0.2.1 Comparison on Different Threat Models (RQ1) 27
5.0.2.2 Comparison on Different Retrieval Targets (RQ2) 29
5.0.2.3 Comparison on Different Embedding Algorithms (RQ3) 29
5.0.3 Attack Performance Varying the Number of Queries 30
5.0.4 Attack Performance Varying the Number of Private Documents 30
5.0.5 Effectiveness of the Surrogate Model Structure with Different Embedding Algorithms (RQ4) 32
5.0.6 Ablation Study (RQ5) 34
5.0.6.1 Analysis of Different Query Strategies 34
5.0.6.2 Effectiveness of the Ranking Objective 35
5.0.6.3 Effectiveness of Training the Surrogate Model 37
5.0.6.4 Impact of Adversarial Training in the Second Stage 37
5.0.6.5 Comprehensive Analysis 38
Chapter 6 Defense Approach 40
6.0.1 Laplace Noise 40
6.0.2 Random Projection 41
6.0.3 PCA 42
6.0.4 Autoencoder 42
Chapter 7 Conclusion 44
References 45
Appendix A - Dimension Reduction and Transformation 50
Appendix B - Query Strategy Comparison 52
Appendix C - Impact of Diversity in Selected Documents 54
Appendix D - Case Study 56
| -
dc.language.iso | en | -
dc.subject | 嵌入逆推攻擊 | zh_TW
dc.subject | 有限查詢 | zh_TW
dc.subject | 文本嵌入 | zh_TW
dc.subject | 代理模型 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | Document Embedding | en
dc.subject | Limited Query | en
dc.subject | Surrogate Model | en
dc.subject | Embedding Inversion Attack | en
dc.subject | Deep Learning | en
dc.title | 有限查詢存取下的文本嵌入逆推攻擊 | zh_TW
dc.title | Open-World Document Embedding Inversion Attack under Limited Query Access | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 陳縕儂;陳尚澤;李政德;彭文志 | zh_TW
dc.contributor.oralexamcommittee | Yun-Nung Chen;Shang-Tse Chen;Cheng-Te Li;Wen-Chih Peng | en
dc.subject.keyword | 嵌入逆推攻擊,文本嵌入,有限查詢,代理模型,深度學習 | zh_TW
dc.subject.keyword | Embedding Inversion Attack,Document Embedding,Limited Query,Surrogate Model,Deep Learning | en
dc.relation.page | 58 | -
dc.identifier.doi | 10.6342/NTU202302014 | -
dc.rights.note | Authorized for release (open access worldwide) | -
dc.date.accepted | 2023-07-28 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
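To make the abstract's threat model concrete, here is a minimal sketch of the limited-query surrogate idea it describes: spend a small query budget on a diverse subset of external documents, record the private encoder's embeddings for them, and fit a surrogate that mimics the encoder without further access. Everything in this sketch (the linear stand-in encoder, the clustering-based query selection, the ridge surrogate) is a hypothetical illustration under stated assumptions, not the thesis's actual SADE implementation.

```python
# Hypothetical illustration of a limited-query surrogate attack; not SADE itself.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Attacker-visible feature vectors for a pool of external (public) documents.
external = rng.normal(size=(5000, 64))

# Black-box stand-in for the private encoder; each call spends one query.
_hidden = rng.normal(size=(64, 32))  # unknown to the attacker
def private_encoder(docs):
    return docs @ _hidden

# Query selection under a fixed budget: cluster the external pool and query one
# representative document per cluster, so the budget covers diverse inputs.
budget = 100
km = KMeans(n_clusters=budget, n_init=4, random_state=0).fit(external)
reps = [int(np.argmin(np.linalg.norm(external - c, axis=1)))
        for c in km.cluster_centers_]
queries = external[reps]
answers = private_encoder(queries)  # the only black-box access used

# Surrogate: learn to map attacker features to the observed private embeddings.
surrogate = Ridge(alpha=1.0).fit(queries, answers)

# The surrogate now embeds any document without further queries, enabling
# downstream inversion/retrieval analysis against private embeddings.
print("surrogate fidelity (R^2):", surrogate.score(external, private_encoder(external)))
```

In this toy linear setting the surrogate reaches near-perfect fidelity from only 100 queries, which illustrates why limiting query access alone may not protect an embedding service; the thesis studies the much harder case of neural document encoders.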
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 5.03 MB | Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
