NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357
Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 林守德 | zh_TW
dc.contributor.advisor | Shou-De Lin | en
dc.contributor.author | 陳韋恩 | zh_TW
dc.contributor.author | Wei-En Chen | en
dc.date.accessioned | 2023-08-09T16:42:17Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-08-09 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-07-26 | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357 | -
dc.description.abstract | 本文提出了一個針對文本嵌入逆推攻擊的解決方法,適用於當查詢數量有限的情境。為了應對這個挑戰,我們提出了一個名為「輔助文檔嵌入攻擊與查詢選擇(SADE)」的模型,旨在適應實際應用的場景。具體而言,我們引入了一種利用外部文檔的方法來克服當私有文檔非常有限的情況,這在實際場景中是一個常見的問題。此外,我們提出了一種新穎的查詢策略,解決了當私有編碼器的查詢數量非常有限的情況。與之前的方法不同的是,我們的方法利用少量的查詢即可實現高度的準確性,而不需要大量的私有文檔和無限制的查詢私有編碼器。整體來說,我們提出的模型和查詢策略證明了我們的方法在文本嵌入逆推攻擊中的有效性和實用性。 | zh_TW
dc.description.abstract | This paper proposes a solution for embedding inversion attacks on textual embeddings in scenarios where the number of queries is limited. To address this challenge, we propose a model called Surrogate-Assisted Document Embedding Attack with Query Selection (SADE), designed for practical scenarios. Specifically, we introduce a means of exploiting external documents to overcome the scarcity of private documents, a common issue in practice. Additionally, we propose a novel query strategy that addresses the setting in which queries to the private encoder are severely limited. Unlike previous works, which require a large number of private documents and unlimited query access, our approach uses a small number of queries to achieve high retrieval accuracy. Overall, our proposed model and query strategy demonstrate the effectiveness and practicality of our approach to embedding inversion attacks on textual embeddings. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:42:17Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-08-09T16:42:17Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Oral Examination Committee Approval Certificate i
Acknowledgements iii
Abstract (Chinese) v
Abstract vi
Contents vii
List of Figures ix
List of Tables x
Chapter 1 Introduction 1
Chapter 2 Related Work 6
Chapter 3 Problem Definition 8
Chapter 4 Methodology 10
4.0.1 Query Selection Strategy 11
4.0.2 Surrogate Model Training 15
4.0.3 Adversarial Representation Training 17
Chapter 5 Experiments 23
5.0.1 Experimental Settings 24
5.0.2 Attack Performance Analysis 27
5.0.2.1 Comparison on Different Threat Models (RQ1) 27
5.0.2.2 Comparison on Different Retrieval Targets (RQ2) 29
5.0.2.3 Comparison on Different Embedding Algorithms (RQ3) 29
5.0.3 Attack Performance Varying the Number of Queries 30
5.0.4 Attack Performance Varying the Number of Private Documents 30
5.0.5 Effectiveness of the Surrogate Model Structure with Different Embedding Algorithms (RQ4) 32
5.0.6 Ablation Study (RQ5) 34
5.0.6.1 Analysis of Different Query Strategies 34
5.0.6.2 Effectiveness of the Ranking Objective 35
5.0.6.3 Effectiveness of Training the Surrogate Model 37
5.0.6.4 Impact of Adversarial Training in the Second Stage 37
5.0.6.5 Comprehensive Analysis 38
Chapter 6 Defense Approach 40
6.0.1 Laplace Noise 40
6.0.2 Random Projection 41
6.0.3 PCA 42
6.0.4 Autoencoder 42
Chapter 7 Conclusion 44
References 45
Appendix A - Dimension Reduction and Transformation 50
Appendix B - Query Strategy Comparison 52
Appendix C - Impact of Diversity in Selected Documents 54
Appendix D - Case Study 56
| -
dc.language.iso | en | -
dc.subject | 嵌入逆推攻擊 | zh_TW
dc.subject | 有限查詢 | zh_TW
dc.subject | 文本嵌入 | zh_TW
dc.subject | 代理模型 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | Document Embedding | en
dc.subject | Limited Query | en
dc.subject | Surrogate Model | en
dc.subject | Embedding Inversion Attack | en
dc.subject | Deep Learning | en
dc.title | 有限查詢存取下的文本嵌入逆推攻擊 | zh_TW
dc.title | Open-World Document Embedding Inversion Attack under Limited Query Access | en
dc.type | Thesis | -
dc.date.schoolyear | 111-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 陳縕儂;陳尚澤;李政德;彭文志 | zh_TW
dc.contributor.oralexamcommittee | Yun-Nung Chen;Shang-Tse Chen;Cheng-Te Li;Wen-Chih Peng | en
dc.subject.keyword | 嵌入逆推攻擊,文本嵌入,有限查詢,代理模型,深度學習 | zh_TW
dc.subject.keyword | Embedding Inversion Attack,Document Embedding,Limited Query,Surrogate Model,Deep Learning | en
dc.relation.page | 58 | -
dc.identifier.doi | 10.6342/NTU202302014 | -
dc.rights.note | Authorized for release (open access worldwide) | -
dc.date.accepted | 2023-07-28 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science | -
dc.contributor.author-dept | Department of Computer Science and Information Engineering | -
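To make the abstract's threat model concrete, here is a minimal sketch of the limited-query surrogate idea it describes: spend a small query budget on a diverse subset of external documents, record the private encoder's embeddings for them, and fit a surrogate that mimics the encoder without further access. Everything in this sketch (the linear stand-in encoder, the clustering-based query selection, the ridge surrogate) is a hypothetical illustration under stated assumptions, not the thesis's actual SADE implementation.

```python
# Hypothetical illustration of a limited-query surrogate attack; not SADE itself.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Attacker-visible feature vectors for a pool of external (public) documents.
external = rng.normal(size=(5000, 64))

# Black-box stand-in for the private encoder; each call spends one query.
_hidden = rng.normal(size=(64, 32))  # unknown to the attacker
def private_encoder(docs):
    return docs @ _hidden

# Query selection under a fixed budget: cluster the external pool and query one
# representative document per cluster, so the budget covers diverse inputs.
budget = 100
km = KMeans(n_clusters=budget, n_init=4, random_state=0).fit(external)
reps = [int(np.argmin(np.linalg.norm(external - c, axis=1)))
        for c in km.cluster_centers_]
queries = external[reps]
answers = private_encoder(queries)  # the only black-box access used

# Surrogate: learn to map attacker features to the observed private embeddings.
surrogate = Ridge(alpha=1.0).fit(queries, answers)

# The surrogate now embeds any document without further queries, enabling
# downstream inversion/retrieval analysis against private embeddings.
print("surrogate fidelity (R^2):", surrogate.score(external, private_encoder(external)))
```

In this toy linear setting the surrogate reaches near-perfect fidelity from only 100 queries, which illustrates why limiting query access alone may not protect an embedding service; the thesis studies the much harder case of neural document encoders.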
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File | Size | Format
ntu-111-2.pdf | 5.03 MB | Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
