Automated Vulnerable Code Detection Using Deep Learning Technique

柯以恆; Yi-Heng Ko

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	王凡	zh_TW
dc.contributor.advisor	Farn Wang	en
dc.contributor.author	柯以恆	zh_TW
dc.contributor.author	Yi-Heng Ko	en
dc.date.accessioned	2024-07-23T16:32:26Z	-
dc.date.available	2024-07-24	-
dc.date.copyright	2024-07-23	-
dc.date.issued	2024	-
dc.date.submitted	2024-07-17	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257	-
dc.description.abstract	為了確保應用程式中不存在能夠被有心人士利用的漏洞，程式碼安全檢測在軟體開發中一直扮演一個重要角色。傳統的程式碼安全測試通常依賴手動檢查或基於規則的方法，這樣的方法可能相當耗時且容易出現人為錯誤。近年來隨著自然語言處理的發展，深度學習儼然成為程式碼安全測試的一種手段，我們將在這篇論文研究將深度學習技術應用於程式碼安全測試的可能性，目標是能夠提高軟體開發流程中安全分析的效率和效力。在本篇研究中，我們以長短期記憶模型作為模型架構對資料集進行訓練，並測試了兩種嵌入方法在生成程式碼向量表示上的效能以提高訓練效率。此外，我們還將在 GitHub 上蒐集的多個專案應用於論文中所提出的模型上，再把掃描結果與現有的靜態測試工具做比較並對其性能進行評估，結果顯示我們的研究成果比起市售的靜態安全測試軟體能達到更好的表現，最後透過分析實驗的數據，提出可能改進的方法。	zh_TW
dc.description.abstract	To keep exploitable vulnerabilities out of applications, security testing has always played a crucial role in software development. Traditional code security testing often relies on manual inspection or rule-based approaches, which can be time-consuming and prone to human error. With recent advances in natural language processing, deep learning has emerged as a viable approach to code security testing. In this thesis, we investigate the application of deep learning techniques to code security testing, with the aim of improving the efficiency and effectiveness of security analysis in the software development process. We train a Long Short-Term Memory (LSTM) model on our dataset and evaluate two embedding methods for generating vector representations of code, with the goal of improving training efficiency. Additionally, we apply the proposed models to multiple projects collected from GitHub, compare the scan results with those of existing static testing tools, and evaluate their performance. The results show that our models perform better than commercially available static application security testing (SAST) tools. Finally, based on an analysis of the experimental data, we propose potential improvements and directions for future work.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-23T16:32:26Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-07-23T16:32:26Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee; Acknowledgements (誌謝); Abstract in Chinese (摘要); ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; 1.1 Background; 1.2 Motivation; 1.3 Contribution; Chapter 2 Related Work; 2.1 Machine Learning for Vulnerability Detection; 2.2 Deep Learning for Vulnerability Detection; 2.3 Static Application Security Testing Tool; Chapter 3 Preliminaries; 3.1 Natural Language Processing; 3.2 Word Embedding; 3.3 Source Code Embedding; Chapter 4 Methodology; 4.1 Vulnerability Type Selection; 4.2 Dataset Collection; 4.3 Ground-truth Labeling; 4.4 Embedding Layer; 4.5 Hyperparameter Tuning of LSTM Model Using Simulated Annealing; 4.5.1 Simulated Annealing Overview; 4.5.2 Implementation of Simulated Annealing for LSTM Hyperparameter Tuning; 4.5.3 Conclusion; 4.6 LSTM Model; Chapter 5 Experiment; 5.1 Evaluation Metric; 5.2 Training; 5.2.1 Dataset; 5.2.2 Hyperparameters; 5.2.3 Result; 5.3 Evaluation; 5.4 Comparison with SAST tools; 5.4.1 Analysis; 5.4.2 Result; Chapter 6 Conclusion; References	-
dc.language.iso	en	-
dc.subject	弱點偵測	zh_TW
dc.subject	靜態分析	zh_TW
dc.subject	自然語言處理	zh_TW
dc.subject	深度學習	zh_TW
dc.subject	程式碼嵌入	zh_TW
dc.subject	Code embedding	en
dc.subject	Deep learning	en
dc.subject	Natural language processing	en
dc.subject	Vulnerability detection	en
dc.subject	Static analysis	en
dc.title	使用深度學習技術進行自動化的程式碼弱點偵測	zh_TW
dc.title	Automated Vulnerable Code Detection Using Deep Learning Technique	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	陳銘憲;林宗男;葉國暉	zh_TW
dc.contributor.oralexamcommittee	Ming-Syan Chen;Tsung-Nan Lin;Kuo-Hui Yeh	en
dc.subject.keyword	靜態分析, 弱點偵測, 程式碼嵌入, 深度學習, 自然語言處理	zh_TW
dc.subject.keyword	Static analysis, Vulnerability detection, Code embedding, Deep learning, Natural language processing	en
dc.relation.page	37	-
dc.identifier.doi	10.6342/NTU202401783	-
dc.rights.note	Authorization granted (open access worldwide)	-
dc.date.accepted	2024-07-18	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Electrical Engineering	-
Appears in Collections:	Department of Electrical Engineering

Files in this item:

File	Size	Format
ntu-112-2.pdf	1.97 MB	Adobe PDF

Items in this repository are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.