Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 雷欽隆 | zh_TW
dc.contributor.advisor | Chin-Laung Lei | en
dc.contributor.author | 林俊燁 | zh_TW
dc.contributor.author | Chun-Yeh Lin | en
dc.date.accessioned | 2023-05-10T16:07:28Z | -
dc.date.available | 2023-11-09 | -
dc.date.copyright | 2023-05-10 | -
dc.date.issued | 2023 | -
dc.date.submitted | 2023-02-10 | -
dc.identifier.citation (-):
IMDb sentiment analysis dataset. https://www.imdb.com/interfaces/.
D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. Universal sentence encoder. arXiv preprint arXiv:1803.11175, 2018.
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751, 2017.
J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE, 2018.
S. Garg and G. Ramakrishnan. BAE: BERT-based adversarial examples for text classification. arXiv preprint arXiv:2004.01970, 2020.
D. Jin, Z. Jin, J. T. Zhou, and P. Szolovits. Is BERT really robust? A strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8018–8025, 2020.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Commun. ACM, 60(6):84–90, May 2017.
J. Li, S. Ji, T. Du, B. Li, and T. Wang. TextBugger: Generating adversarial text against real-world applications. arXiv preprint arXiv:1812.05271, 2018.
L. Li, R. Ma, Q. Guo, X. Xue, and X. Qiu. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984, 2020.
Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
S. Loria. TextBlob documentation. Release 0.15, 2, 2018.
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
N. Mrkšić, D. O. Séaghdha, B. Thomson, M. Gašić, L. Rojas-Barahona, P.-H. Su, D. Vandyke, T.-H. Wen, and S. Young. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892, 2016.
J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, 2014.
L. Sun, K. Hashimoto, W. Yin, A. Asai, J. Li, P. Yu, and C. Xiong. Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. arXiv preprint arXiv:2003.04985, 2020.
Y. Zhou, J.-Y. Jiang, K.-W. Chang, and W. Wang. Learning to discriminate perturbations for blocking adversarial attacks in text classification. arXiv preprint arXiv:1909.03084, 2019.
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87129 | -
dc.description.abstract (zh_TW):
近年來深度學習在自然語言處理的問題上取得了卓越的成果,然而日常生活中常見的應用,例如垃圾訊息過濾以及情緒分析等等,都很容易受到對抗性攻擊,導致安全性上的疑慮。

本文提出兩個方法,干擾偵測可以判斷文字是否受到字元修改的攻擊,並接著基於上下文將受到修改的文字恢復成可能的替代字詞。在字詞替換攻擊上,藉由將重要的字詞替換成數個可能的替代文字以增加樣本數量,且預測結果為所有增加的樣本中最多數被分到的類別。

本文提出的方法可以在不需要知道模型參數以及調整模型架構的條件下抵禦對抗式攻擊。在IMDb資料集上所完成的實驗證明,本文的方法可以有效防禦在文字分類上的字元替換及字詞替換攻擊,並展現比比較基準更好的成果。
dc.description.abstract (en):
In recent years, deep learning models have achieved prominent success on NLP tasks. However, widely used real-world applications such as spam filtering and sentiment analysis are vulnerable to adversarial attacks, raising security concerns.

This thesis proposes two methods to defend against adversarial attacks on the sentiment analysis task. For character-level attacks, a perturbation detector determines whether a token in the sample has been perturbed, and a recovery process then restores the perturbed words to possible substitutions based on the context. For word-level attacks, the input is augmented by replacing important words with their possible substitutions, and the prediction for the original sample is the majority class among all the augmented samples.

Our methods can block adversarial attacks without knowing the model parameters or modifying the model structure. Experiments on the IMDb dataset demonstrate that our methods effectively block both character-level and word-level attacks on the text classification task and outperform the baseline method.
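The two defenses described in the abstract can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis implementation: the toy vocabulary, the out-of-vocabulary detector, the edit-distance recovery, and the `classify` callback are all stand-in assumptions (the thesis uses trained, context-based models for detection and recovery).

```python
# Illustrative sketch only -- all names and heuristics here are assumptions,
# not the thesis code.
from collections import Counter

VOCAB = {"the", "movie", "was", "terrible", "awful", "great", "good", "film"}

def detect_perturbed(tokens, vocab=VOCAB):
    """Character-level defense, step 1: flag tokens that look perturbed.
    A simple out-of-vocabulary check stands in for the trained detector."""
    return [t for t in tokens if t.lower() not in vocab]

def recover(token, vocab=VOCAB):
    """Character-level defense, step 2: map a perturbed token back to the
    closest vocabulary word (minimum edit distance, ties broken
    alphabetically); the thesis instead recovers from context."""
    def edit_distance(a, b):
        # Classic dynamic-programming Levenshtein distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                               prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]
    return min(sorted(vocab), key=lambda w: edit_distance(token.lower(), w))

def majority_vote_predict(classify, text, substitutions):
    """Word-level defense: augment the input by replacing important words
    with possible substitutions, then return the majority class over the
    original and all augmented samples."""
    samples = [text]
    for word, subs in substitutions.items():
        samples += [text.replace(word, s) for s in subs]
    votes = Counter(classify(s) for s in samples)
    return votes.most_common(1)[0][0]
```

For example, `detect_perturbed("the movi3 was terrib1e".split())` flags the two character-perturbed tokens, `recover("movi3")` yields `"movie"`, and `majority_vote_predict` returns the majority label over the augmented samples, so a single adversarial substitution cannot flip the prediction on its own.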
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-10T16:07:28Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2023-05-10T16:07:28Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents (-):
Verification Letter from the Oral Examination Committee i
Acknowledgements iii
摘要 v
Abstract vii
Contents ix
List of Figures xi
List of Tables xiii
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 TextBugger 3
2.2 TextFooler 5
2.3 Discriminate Perturbations 7
Chapter 3 Methodology 9
3.1 Character level defense 9
3.1.1 Perturbation Detector 9
3.1.2 Perturbation Recovery 11
3.2 Word level defense 12
3.2.1 Important word replacing 13
3.3 Overall attack defense 14
Chapter 4 Experiments 17
4.1 Experiment settings 17
4.1.1 Dataset 17
4.1.2 Adversarial attacks 17
4.1.3 Base model and baseline 19
4.1.4 Evaluation metric 19
4.2 Experimental results 20
4.2.1 Performance of perturbation detector 20
4.2.2 Effectiveness of augmenting input 21
4.2.3 Defend against overall attacks 22
Chapter 5 Conclusion 23
References 25
dc.language.iso | en | -
dc.subject | 防禦對抗性攻擊 | zh_TW
dc.subject | 文字分類 | zh_TW
dc.subject | text classification | en
dc.subject | adversarial attack defense | en
dc.title | 防禦針對文字分類模型的對抗性攻擊 | zh_TW
dc.title | Defend against adversarial attacks in text classification | en
dc.type | Thesis | -
dc.date.schoolyear | 111-1 | -
dc.description.degree | 碩士 (Master) | -
dc.contributor.oralexamcommittee | 紀博文;顏嗣鈞 | zh_TW
dc.contributor.oralexamcommittee | Po-Wen Chi;Hsu-chun Yen | en
dc.subject.keyword | 防禦對抗性攻擊,文字分類 | zh_TW
dc.subject.keyword | adversarial attack defense,text classification | en
dc.relation.page | 27 | -
dc.identifier.doi | 10.6342/NTU202300303 | -
dc.rights.note | 同意授權 (authorized, open access worldwide) | -
dc.date.accepted | 2023-02-13 | -
dc.contributor.author-college | 電機資訊學院 | -
dc.contributor.author-dept | 電機工程學系 | -
Appears in collections: Department of Electrical Engineering (電機工程學系)

Files in this item:
File | Size | Format
ntu-111-1.pdf | 1.81 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
