Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 雷欽隆 | zh_TW
dc.contributor.advisor | Chin-Laung Lei | en
dc.contributor.author | 林俊燁 | zh_TW
dc.contributor.author | Chun-Yeh Lin | en
dc.date.accessioned | 2023-05-10T16:07:28Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-05-10 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-02-10 | -
dc.identifier.citation (-):
IMDb sentiment analysis dataset. https://www.imdb.com/interfaces/.
D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. Universal sentence encoder. arXiv preprint arXiv:1803.11175, 2018.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751, 2017.
J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE, 2018.
S. Garg and G. Ramakrishnan. BAE: BERT-based adversarial examples for text classification. arXiv preprint arXiv:2004.01970, 2020.
D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8018–8025, 2020.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.
J. Li, S. Ji, T. Du, B. Li, and T. Wang. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271, 2018.
L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984, 2020.
Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
S. Loria. TextBlob documentation. Release 0.15, 2, 2018.
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
N. Mrkšić, D. O. Séaghdha, B. Thomson, M. Gašić, L. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892, 2016.
J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
L. Sun, K. Hashimoto, W. Yin, A. Asai, J. Li, P. Yu, and C. Xiong. Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. arXiv preprint arXiv:2003.04985, 2020.
Y. Zhou, J.-Y. Jiang, K.-W. Chang, and W. Wang. Learning to discriminate perturbations for blocking adversarial attacks in text classification. arXiv preprint arXiv:1909.03084, 2019.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129 | -
dc.description.abstract (zh_TW):
近年來深度學習在自然語言處理的問題上取得了卓越的成果,然而日常生活中常見的應用,例如垃圾訊息過濾以及情緒分析等等,都很容易受到對抗性攻擊,導致安全性上的疑慮。

本文提出兩個方法,干擾偵測可以判斷文字是否受到字元修改的攻擊,並接著基於上下文將受到修改的文字恢復成可能的替代字詞。在字詞替換攻擊上,藉由將重要的字詞替換成數個可能的替代文字以增加樣本數量,且預測結果為所有增加的樣本中最多數被分到的類別。

本文提出的方法可以在不需要知道模型參數以及調整模型架構的條件下抵禦對抗式攻擊。在IMDb資料集上所完成的實驗證明,本文的方法可以有效防禦在文字分類上的字元替換及字詞替換攻擊,並展現比比較基準更好的成果。
dc.description.abstract (en):
In recent years, deep learning models have achieved prominent success on NLP tasks. However, widely used real-world applications such as spam filtering and sentiment analysis are vulnerable to adversarial attacks, raising security concerns.

This thesis proposes two methods to defend against adversarial attacks on the sentiment analysis task. For character-level attacks, a perturbation detector determines whether a token in the sample has been perturbed, and a recovery process then restores the perturbed words to possible substitutions based on the context. For word-level attacks, the input is augmented by replacing important words with their possible substitutions, and the prediction for the original sample is the majority class among all the augmented samples.

Our methods can block adversarial attacks without knowing the model parameters or modifying the model structure. Experiments on the IMDb dataset demonstrate that our methods effectively block both character-level and word-level attacks on the text classification task and outperform the baseline method.
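The two defenses described in the abstract can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis implementation: the toy vocabulary, the out-of-vocabulary detector, the edit-distance recovery, and the `classify` callback are all stand-in assumptions (the thesis uses trained, context-based models for detection and recovery).

```python
# Illustrative sketch only -- all names and heuristics here are assumptions,
# not the thesis code.
from collections import Counter

VOCAB = {"the", "movie", "was", "terrible", "awful", "great", "good", "film"}

def detect_perturbed(tokens, vocab=VOCAB):
    """Character-level defense, step 1: flag tokens that look perturbed.
    A simple out-of-vocabulary check stands in for the trained detector."""
    return [t for t in tokens if t.lower() not in vocab]

def recover(token, vocab=VOCAB):
    """Character-level defense, step 2: map a perturbed token back to the
    closest vocabulary word (minimum edit distance, ties broken
    alphabetically); the thesis instead recovers from context."""
    def edit_distance(a, b):
        # Classic dynamic-programming Levenshtein distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]
    return min(sorted(vocab), key=lambda w: edit_distance(token.lower(), w))

def majority_vote_predict(classify, text, substitutions):
    """Word-level defense: augment the input by replacing important words
    with possible substitutions, then return the majority class over the
    original and all augmented samples."""
    samples = [text]
    for word, subs in substitutions.items():
        samples += [text.replace(word, s) for s in subs]
    votes = Counter(classify(s) for s in samples)
    return votes.most_common(1)[0][0]
```

For example, `detect_perturbed("the movi3 was terrib1e".split())` flags the two character-perturbed tokens, `recover("movi3")` yields `"movie"`, and `majority_vote_predict` returns the majority label over the augmented samples, so a single adversarial substitution cannot flip the prediction on its own.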
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-10T16:07:28Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-05-10T16:07:28Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents (-):
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 TextBugger 3
2.2 TextFooler 5
2.3 Discriminate Perturbations 7
Chapter 3 Methodology 9
3.1 Character level defense 9
3.1.1 Perturbation Detector 9
3.1.2 Perturbation Recovery 11
3.2 Word level defense 12
3.2.1 Important word replacing 13
3.3 Overall attack defense 14
Chapter 4 Experiments 17
4.1 Experiment settings 17
4.1.1 Dataset 17
4.1.2 Adversarial attacks 17
4.1.3 Base model and baseline 19
4.1.4 Evaluation metric 19
4.2 Experimental results 20
4.2.1 Performance of perturbation detector 20
4.2.2 Effectiveness of augmenting input 21
4.2.3 Defend against overall attacks 22
Chapter 5 Conclusion 23
References 25
dc.language.iso | en | -
dc.subject | 防禦對抗性攻擊 | zh_TW
dc.subject | 文字分類 | zh_TW
dc.subject | text classification | en
dc.subject | adversarial attack defense | en
dc.title | 防禦針對文字分類模型的對抗性攻擊 | zh_TW
dc.title | Defend against adversarial attacks in text classification | en
dc.type | Thesis | -
dc.date.schoolyear | 111-1 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 紀博文;顏嗣鈞 | zh_TW
dc.contributor.oralexamcommittee | Po-Wen Chi;Hsu-chun Yen | en
dc.subject.keyword | 防禦對抗性攻擊,文字分類 | zh_TW
dc.subject.keyword | adversarial attack defense,text classification | en
dc.relation.page | 27 | -
dc.identifier.doi | 10.6342/NTU202300303 | -
dc.rights.note | 同意授權 (authorized, open access worldwide) | -
dc.date.accepted | 2023-02-13 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電機工程學系 | -
Appears in collections: Department of Electrical Engineering (電機工程學系)

Files in this item:
File | Size | Format
ntu-111-1.pdf | 1.81 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
