Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 雷欽隆 | zh_TW |
| dc.contributor.advisor | Chin-Laung Lei | en |
| dc.contributor.author | 林俊燁 | zh_TW |
| dc.contributor.author | Chun-Yeh Lin | en |
| dc.date.accessioned | 2023-05-10T16:07:28Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-05-10 | - |
| dc.date.issued | 2023 | - |
| dc.date.submitted | 2023-02-10 | - |
| dc.identifier.citation | IMDb sentiment analysis dataset. https://www.imdb.com/interfaces/. D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. Universal sentence encoder. arXiv preprint arXiv:1803.11175, 2018. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751, 2017. J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE, 2018. S. Garg and G. Ramakrishnan. BAE: BERT-based adversarial examples for text classification. arXiv preprint arXiv:2004.01970, 2020. D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8018–8025, 2020. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017. J. Li, S. Ji, T. Du, B. Li, and T. Wang. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271, 2018. L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984, 2020. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. S. Loria. textblob documentation. Release 0.15, 2, 2018. A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017. N. Mrkšić, D. Ó Séaghdha, B. Thomson, M. Gašić, L. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892, 2016. J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014. L. Sun, K. Hashimoto, W. Yin, A. Asai, J. Li, P. Yu, and C. Xiong. Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. arXiv preprint arXiv:2003.04985, 2020. Y. Zhou, J.-Y. Jiang, K.-W. Chang, and W. Wang. Learning to discriminate perturbations for blocking adversarial attacks in text classification. arXiv preprint arXiv:1909.03084, 2019. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129 | - |
| dc.description.abstract | In recent years, deep learning has achieved remarkable results on natural language processing tasks. However, common everyday applications such as spam filtering and sentiment analysis are vulnerable to adversarial attacks, which raises security concerns. This thesis proposes two methods: perturbation detection determines whether text has been attacked through character modification, and then recovers the modified words to likely substitute words based on context. Against word-substitution attacks, the number of samples is increased by replacing important words with several possible substitute words, and the prediction is the class assigned to the majority of the augmented samples. The proposed methods can defend against adversarial attacks without knowing the model parameters or adjusting the model architecture. Experiments on the IMDb dataset show that these methods effectively defend against character-substitution and word-substitution attacks on text classification and outperform the comparison baseline. | zh_TW |
| dc.description.abstract | In recent years, deep learning models have achieved prominent success on NLP tasks. However, widely used real-world applications such as spam filtering and sentiment analysis are vulnerable to adversarial attacks. This thesis proposes two methods to defend against adversarial attacks on the sentiment analysis task. The perturbation detector determines whether a token in a sample has been perturbed by a character-level attack, and the recovery process restores perturbed words to likely substitutions based on context. For word-level attacks, the input is augmented by replacing important words with their possible substitutions, and the prediction for the original sample is the majority class among all augmented samples. Our methods can block adversarial attacks without knowing the model parameters or modifying the model structure. Experiments on the IMDb dataset demonstrate that our methods effectively block both character-level and word-level attacks and outperform the baseline method on the text classification task. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-10T16:07:28Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-05-10T16:07:28Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i Acknowledgements iii 摘要 v Abstract vii Contents ix List of Figures xi List of Tables xiii Chapter 1 Introduction 1 Chapter 2 Related Work 3 2.1 TextBugger 3 2.2 TextFooler 5 2.3 Discriminate Perturbations 7 Chapter 3 Methodology 9 3.1 Character level defense 9 3.1.1 Perturbation Detector 9 3.1.2 Perturbation Recovery 11 3.2 Word level defense 12 3.2.1 Important word replacing 13 3.3 Overall attack defense 14 Chapter 4 Experiments 17 4.1 Experiment settings 17 4.1.1 Dataset 17 4.1.2 Adversarial attacks 17 4.1.3 Base model and baseline 19 4.1.4 Evaluation metric 19 4.2 Experimental results 20 4.2.1 Performance of perturbation detector 20 4.2.2 Effectiveness of augmenting input 21 4.2.3 Defend against overall attacks 22 Chapter 5 Conclusion 23 References 25 | - |
| dc.language.iso | en | - |
| dc.subject | adversarial attack defense | zh_TW |
| dc.subject | text classification | zh_TW |
| dc.subject | text classification | en |
| dc.subject | adversarial attack defense | en |
| dc.title | Defending against adversarial attacks on text classification models | zh_TW |
| dc.title | Defend against adversarial attacks in text classification | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-1 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 紀博文;顏嗣鈞 | zh_TW |
| dc.contributor.oralexamcommittee | Po-Wen Chi;Hsu-chun Yen | en |
| dc.subject.keyword | adversarial attack defense, text classification | zh_TW |
| dc.subject.keyword | adversarial attack defense, text classification | en |
| dc.relation.page | 27 | - |
| dc.identifier.doi | 10.6342/NTU202300303 | - |
| dc.rights.note | Authorized (open access worldwide) | - |
| dc.date.accepted | 2023-02-13 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Electrical Engineering | - |
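
The word-level defense summarized in the abstract (augment the input by replacing important words with candidate substitutions, then predict the majority class over all augmented samples) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the toy classifier, the substitution table, and all function names here are assumptions made for the example.

```python
from collections import Counter

def augment_by_substitution(tokens, substitutions):
    """Build augmented copies of the input: the original, plus one copy
    per (word, candidate substitution) pair. `substitutions` maps a word
    to its list of candidate replacements (a hypothetical table)."""
    augmented = [list(tokens)]
    for i, word in enumerate(tokens):
        for sub in substitutions.get(word, []):
            copy = list(tokens)
            copy[i] = sub
            augmented.append(copy)
    return augmented

def majority_vote_predict(classify, tokens, substitutions):
    """Predict the class of `tokens` as the majority class among all
    substitution-augmented variants, as the abstract describes."""
    samples = augment_by_substitution(tokens, substitutions)
    votes = Counter(classify(sample) for sample in samples)
    return votes.most_common(1)[0][0]

# Toy classifier standing in for the text-classification model:
# it predicts "negative" whenever the token "bad" appears.
def toy_classify(tokens):
    return "negative" if "bad" in tokens else "positive"

# If an attack swapped a word to "bad", most substituted variants
# ("poor", "subpar") flip the vote back to the clean prediction.
subs = {"bad": ["poor", "subpar"]}
print(majority_vote_predict(toy_classify, ["a", "bad", "movie"], subs))  # → positive
```

The majority vote makes the defense model-agnostic: it only needs the classifier's predictions, matching the abstract's claim that no model parameters or structural changes are required.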
Appears in Collections: Department of Electrical Engineering
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-111-1.pdf | 1.81 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their individual license terms.
