Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林守德 | zh_TW |
| dc.contributor.advisor | Shou-De Lin | en |
| dc.contributor.author | 陳韋恩 | zh_TW |
| dc.contributor.author | Wei-En Chen | en |
| dc.date.accessioned | 2023-08-09T16:42:17Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-08-09 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-07-26 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88357 | - |
| dc.description.abstract | 本文針對查詢數量有限的情境,提出了一個文本嵌入逆推攻擊的解決方法。為了應對這個挑戰,我們提出了一個名為「代理輔助文檔嵌入攻擊與查詢選擇(Surrogate-Assisted Document Embedding Attack with Query Selection,SADE)」的模型,旨在適應實際應用的場景。具體而言,我們引入了一種利用外部文檔的方法,以克服私有文檔非常有限的情況,這在實際場景中是一個常見的問題。此外,我們提出了一種新穎的查詢策略,解決了對私有編碼器的查詢數量非常有限的問題。與先前需要大量私有文檔和無限制查詢權限的方法不同,我們的方法僅需少量查詢即可達到高度的檢索準確率。整體而言,我們提出的模型和查詢策略證明了本方法在文本嵌入逆推攻擊中的有效性和實用性。 | zh_TW |
| dc.description.abstract | This paper proposes a solution to embedding inversion attacks on textual embeddings in scenarios where the number of queries is limited. To address this challenge, we propose a model called Surrogate-Assisted Document Embedding Attack with Query Selection (SADE), designed to fit practical scenarios. Specifically, we introduce a method that exploits external documents to overcome the challenge of having very few private documents, a common issue in practice. Additionally, we propose a novel query strategy that addresses the problem of limited query access to the private encoder. Unlike previous works, which require a large number of private documents and unlimited query access, our approach achieves high retrieval accuracy with only a small number of queries. Overall, the proposed model and query strategy demonstrate the effectiveness and practicality of our approach to embedding inversion attacks on textual embeddings. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:42:17Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-08-09T16:42:17Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i; Acknowledgements iii; Abstract (Chinese) v; Abstract (English) vi; Contents vii; List of Figures ix; List of Tables x; Chapter 1 Introduction 1; Chapter 2 Related Work 6; Chapter 3 Problem Definition 8; Chapter 4 Methodology 10; 4.0.1 Query Selection Strategy 11; 4.0.2 Surrogate Model Training 15; 4.0.3 Adversarial Representation Training 17; Chapter 5 Experiments 23; 5.0.1 Experimental Settings 24; 5.0.2 Attack performance analysis 27; 5.0.2.1 Comparing on different threat models (RQ1) 27; 5.0.2.2 Comparing on different retrieval targets (RQ2) 29; 5.0.2.3 Comparing on different embedding algorithms (RQ3) 29; 5.0.3 Attack performance varying number of queries 30; 5.0.4 Attack performance varying number of private documents 30; 5.0.5 Effectiveness of surrogate model structure with different embedding algorithms (RQ4) 32; 5.0.6 Ablation Study (RQ5) 34; 5.0.6.1 Analysis on different query strategies 34; 5.0.6.2 Effectiveness of the ranking objective 35; 5.0.6.3 Effectiveness of training the surrogate model 37; 5.0.6.4 Impact of the adversarial training during 2nd stage 37; 5.0.6.5 Comprehensive Analysis 38; Chapter 6 Defense Approach 40; 6.0.1 Laplace noise 40; 6.0.2 Random Projection 41; 6.0.3 PCA 42; 6.0.4 Autoencoder 42; Chapter 7 Conclusion 44; References 45; Appendix A: Dimension Reduction and Transformation 50; Appendix B: Query Strategy Comparison 52; Appendix C: Impact of Diversity in Selected Documents 54; Appendix D: Case Study 56 | - |
| dc.language.iso | en | - |
| dc.subject | 嵌入逆推攻擊 | zh_TW |
| dc.subject | 有限查詢 | zh_TW |
| dc.subject | 文本嵌入 | zh_TW |
| dc.subject | 代理模型 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | Document Embedding | en |
| dc.subject | Limited Query | en |
| dc.subject | Surrogate Model | en |
| dc.subject | Embedding Inversion Attack | en |
| dc.subject | Deep Learning | en |
| dc.title | 有限查詢存取下的文本嵌入逆推攻擊 | zh_TW |
| dc.title | Open-World Document Embedding Inversion Attack under Limited Query Access | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳縕儂;陳尚澤;李政德;彭文志 | zh_TW |
| dc.contributor.oralexamcommittee | Yun-Nung Chen;Shang-Tse Chen;Cheng-Te Li;Wen-Chih Peng | en |
| dc.subject.keyword | 嵌入逆推攻擊,文本嵌入,有限查詢,代理模型,深度學習 | zh_TW |
| dc.subject.keyword | Embedding Inversion Attack, Document Embedding, Limited Query, Surrogate Model, Deep Learning | en |
| dc.relation.page | 58 | - |
| dc.identifier.doi | 10.6342/NTU202302014 | - |
| dc.rights.note | Consent to release (open access worldwide) | - |
| dc.date.accepted | 2023-07-28 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:

| File | Size | Format | |
|---|---|---|---|
| ntu-111-2.pdf | 5.03 MB | Adobe PDF | View/Open |

All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.
