Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 王凡 | zh_TW
dc.contributor.advisor | Farn Wang | en
dc.contributor.author | 柯以恆 | zh_TW
dc.contributor.author | Yi-Heng Ko | en
dc.date.accessioned | 2024-07-23T16:32:26Z | -
dc.date.available | 2024-07-24 | -
dc.date.copyright | 2024-07-23 | -
dc.date.issued | 2024 | -
dc.date.submitted | 2024-07-17 | -
dc.identifier.citation | [1] Wikipedia contributors, Shellshock (software bug), https://en.wikipedia.org/wiki/Shellshock_(software_bug).
[2] CVE Details, CVE Details - the ultimate security vulnerability datasource, https://www.cvedetails.com/.
[3] R. Russell, L. Kim, L. Hamilton, et al., “Automated vulnerability detection in source code using deep representation learning,” Dec. 2018, pp. 757–762. DOI: 10.1109/ICMLA.2018.00120.
[4] Z. Li, D. Zou, S. Xu, et al., “Vuldeepecker: A deep learning-based system for vulnerability detection,” in Proceedings 2018 Network and Distributed System Security Symposium, ser. NDSS 2018, Internet Society, 2018. DOI: 10.14722/ndss.2018.23158. [Online]. Available: http://dx.doi.org/10.14722/ndss.2018.23158.
[5] B. Aloraini, M. Nagappan, D. M. German, S. Hayashi, and Y. Higo, “An empirical study of security warnings from static application security testing tools,” Journal of Systems and Software, vol. 158, p. 110427, 2019.
[6] D. Baca, K. Petersen, B. Carlsson, and L. Lundberg, “Static code analysis to detect software security vulnerabilities-does experience matter?” In 2009 International Conference on Availability, Reliability and Security, IEEE, 2009, pp. 804–810.
[7] OWASP Foundation, OWASP Testing Guide v4.1, 2021. [Online]. Available: https://owasp.org/www-project-web-security-testing-guide/v41/.
[8] Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” Advances in neural information processing systems, vol. 32, 2019.
[9] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and discovering vulnerabilities with code property graphs,” in 2014 IEEE symposium on security and privacy, IEEE, 2014, pp. 590–604.
[10] N. Ziems and S. Wu, Security vulnerability detection using deep learning natural language processing, 2021. arXiv:2105.02388.
[11] F. Wu, J. Wang, J. Liu, and W. Wang, “Vulnerability detection with deep learning,” in 2017 3rd IEEE international conference on computer and communications (ICCC), IEEE, 2017, pp. 1298–1302.
[12] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” CoRR, vol. abs/2009.07235, 2020. arXiv:2009.07235. [Online]. Available: https://arxiv.org/abs/2009.07235.
[13] L. Wartschinski, Y. Noller, T. Vogel, T. Kehrer, and L. Grunske, “Vudenc: Vulnerability detection with deep learning on a natural codebase for python,” Information and Software Technology, vol. 144, p. 106809, 2022.
[14] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[15] R. Wang, S. Xu, X. Ji, Y. Tian, L. Gong, and K. Wang, “An extensive study of the effects of different deep learning models on code vulnerability detection in python code,” Automated Software Engg., vol. 31, no. 1, Jan. 2024, ISSN: 0928-8910. DOI: 10.1007/s10515-024-00413-4. [Online]. Available: https://doi.org/10.1007/s10515-024-00413-4.
[16] PyCQA, Bandit. [Online]. Available: https://github.com/PyCQA/bandit.
[17] Checkmarx. [Online]. Available: https://checkmarx.com/.
[18] C. Parsing, “Speech and language processing,” Power Point Slides, 2009.
[19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[20] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., “Improving language understanding by generative pre-training,” 2018.
[21] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in neural information processing systems, vol. 26, 2013.
[22] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
[23] Y. Goldberg, “A primer on neural network models for natural language processing,” Journal of Artificial Intelligence Research, vol. 57, pp. 345–420, 2016.
[24] A. Kanade, P. Maniatis, G. Balakrishnan, and K. Shi, “Pre-trained contextual embedding of source code,” CoRR, vol. abs/2001.00059, 2020. arXiv:2001.00059. [Online]. Available: http://arxiv.org/abs/2001.00059.
[25] E. N. Akimova, A. Y. Bersenev, A. A. Deikov, et al., “A survey on software defect prediction using deep learning,” Mathematics, vol. 9, no. 11, p. 1180, 2021.
[26] Z. Feng, D. Guo, D. Tang, et al., Codebert: A pre-trained model for programming and natural languages, 2020. arXiv:2002.08155.
[27] OWASP. [Online]. Available: https://owasp.org/Top10/.
[28] National Institute of Standards and Technology (NIST). [Online]. Available: https://nvd.nist.gov/.
[29] GitHub. [Online]. Available: https://github.com/advisories.
[30] A. Hovsepyan, R. Scandariato, W. Joosen, and J. Walden, “Software vulnerability prediction using text analysis techniques,” in Proceedings of the 4th international workshop on Security measurements and metrics, 2012, pp. 7–10.
[31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] H. K. Dam, T. Tran, and T. Pham, “A deep language model for software code,” arXiv preprint arXiv:1608.02715, 2016.
[33] F. Chollet, Deep learning with Python. Simon and Schuster, 2021.
[34] Markdown library for python, https://pypi.org/project/Markdown/.
-
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257 | -
dc.description.abstract | 為了確保應用程式中不存在能夠被有心人士利用的漏洞,程式碼安全檢測在軟體開發中一直扮演一個重要角色。傳統的程式碼安全測試通常依賴手動檢查或基於規則的方法,這樣的方法可能相當耗時且容易出現人為錯誤。近年來隨著自然語言處理的發展,深度學習儼然成為程式碼安全測試的一種手段,我們將在這篇論文研究將深度學習技術應用於程式碼安全測試的可能性,目標是能夠提高軟體開發流程中安全分析的效率和效力。在本篇研究中,我們以長短期記憶模型作為模型架構對資料集進行訓練,並測試了兩種嵌入方法在生成程式碼向量表示上的效能以提高訓練效率。此外,我們還將在 GitHub 上蒐集的多個專案應用於論文中所提出的模型上,再把掃描結果與現有的靜態測試工具做比較並對其性能進行評估,結果顯示我們的研究成果比起市售的靜態安全測試軟體能達到更好的表現,最後透過分析實驗的數據,提出可能改進的方法。 | zh_TW
dc.description.abstract | To ensure that applications contain no exploitable vulnerabilities, security testing has always played a crucial role in software development. Traditional code security testing often relies on manual inspection or rule-based approaches, which can be time-consuming and prone to human error. With recent advances in natural language processing, deep learning has emerged as a viable approach to code security testing. In this thesis, we investigate the application of deep learning techniques to code security testing, with the aim of improving the efficiency and effectiveness of security analysis in the software development process. We train a Long Short-Term Memory (LSTM) model on our dataset and evaluate two embedding methods for generating vector representations of code, with the goal of increasing training efficiency. We also apply the proposed models to multiple projects collected from GitHub, compare the scan results with those of existing static testing tools, and evaluate their performance. The results show that our models perform better than commercially available static application security testing (SAST) tools. Finally, by analyzing the experimental data, we propose potential improvements and directions for future work. | en
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-23T16:32:26Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2024-07-23T16:32:26Z (GMT). No. of bitstreams: 0 | en
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee
Acknowledgements
Abstract (Chinese)
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Contribution
Chapter 2 Related Work
  2.1 Machine Learning for Vulnerability Detection
  2.2 Deep Learning for Vulnerability Detection
  2.3 Static Application Security Testing Tool
Chapter 3 Preliminaries
  3.1 Natural Language Processing
  3.2 Word Embedding
  3.3 Source Code Embedding
Chapter 4 Methodology
  4.1 Vulnerability Type Selection
  4.2 Dataset Collection
  4.3 Ground-truth Labeling
  4.4 Embedding Layer
  4.5 Hyperparameter Tuning of LSTM Model Using Simulated Annealing
    4.5.1 Simulated Annealing Overview
    4.5.2 Implementation of Simulated Annealing for LSTM Hyperparameter Tuning
    4.5.3 Conclusion
  4.6 LSTM Model
Chapter 5 Experiment
  5.1 Evaluation Metric
  5.2 Training
    5.2.1 Dataset
    5.2.2 Hyperparameters
    5.2.3 Result
  5.3 Evaluation
  5.4 Comparison with SAST tools
    5.4.1 Analysis
    5.4.2 Result
Chapter 6 Conclusion
References
-
dc.language.iso | en | -
dc.subject | 弱點偵測 | zh_TW
dc.subject | 靜態分析 | zh_TW
dc.subject | 自然語言處理 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | 程式碼嵌入 | zh_TW
dc.subject | Code embedding | en
dc.subject | Deep learning | en
dc.subject | Natural language processing | en
dc.subject | Vulnerability detection | en
dc.subject | Static analysis | en
dc.title | 使用深度學習技術進行自動化的程式碼弱點偵測 | zh_TW
dc.title | Automated Vulnerable Code Detection Using Deep Learning Technique | en
dc.type | Thesis | -
dc.date.schoolyear | 112-2 | -
dc.description.degree | Master's (碩士) | -
dc.contributor.oralexamcommittee | 陳銘憲;林宗男;葉國暉 | zh_TW
dc.contributor.oralexamcommittee | Ming-Syan Chen; Tsung-Nan Lin; Kuo-Hui Yeh | en
dc.subject.keyword | 靜態分析, 弱點偵測, 程式碼嵌入, 深度學習, 自然語言處理 | zh_TW
dc.subject.keyword | Static analysis, Vulnerability detection, Code embedding, Deep learning, Natural language processing | en
dc.relation.page | 37 | -
dc.identifier.doi | 10.6342/NTU202401783 | -
dc.rights.note | Authorization granted (open access worldwide) | -
dc.date.accepted | 2024-07-18 | -
dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | -
dc.contributor.author-dept | Department of Electrical Engineering (電機工程學系) | -
Appears in collections: Department of Electrical Engineering (電機工程學系)

Files in this item:
File | Size | Format
ntu-112-2.pdf | 1.97 MB | Adobe PDF


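Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.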
