Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257

Full metadata record

| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王凡 | zh_TW |
| dc.contributor.advisor | Farn Wang | en |
| dc.contributor.author | 柯以恆 | zh_TW |
| dc.contributor.author | Yi-Heng Ko | en |
| dc.date.accessioned | 2024-07-23T16:32:26Z | - |
| dc.date.available | 2024-07-24 | - |
| dc.date.copyright | 2024-07-23 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-07-17 | - |
| dc.identifier.citation | [1] Wikipedia contributors, Shellshock (software bug), https://en.wikipedia.org/wiki/Shellshock_(software_bug). [2] CVE Details, CVE Details - the ultimate security vulnerability datasource, https://www.cvedetails.com/. [3] R. Russell, L. Kim, L. Hamilton, et al., “Automated vulnerability detection in source code using deep representation learning,” Dec. 2018, pp. 757–762. DOI: 10.1109/ICMLA.2018.00120. [4] Z. Li, D. Zou, S. Xu, et al., “VulDeePecker: A deep learning-based system for vulnerability detection,” in Proceedings 2018 Network and Distributed System Security Symposium, ser. NDSS 2018, Internet Society, 2018. DOI: 10.14722/ndss.2018.23158. [Online]. Available: http://dx.doi.org/10.14722/ndss.2018.23158. [5] B. Aloraini, M. Nagappan, D. M. German, S. Hayashi, and Y. Higo, “An empirical study of security warnings from static application security testing tools,” Journal of Systems and Software, vol. 158, p. 110427, 2019. [6] D. Baca, K. Petersen, B. Carlsson, and L. Lundberg, “Static code analysis to detect software security vulnerabilities-does experience matter?” in 2009 International Conference on Availability, Reliability and Security, IEEE, 2009, pp. 804–810. [7] OWASP Foundation, OWASP testing guide v4.1, 2021. [Online]. Available: https://owasp.org/www-project-web-security-testing-guide/v41/. [8] Y. Zhou, S. Liu, J. Siow, X. Du, and Y. Liu, “Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks,” Advances in neural information processing systems, vol. 32, 2019. [9] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck, “Modeling and discovering vulnerabilities with code property graphs,” in 2014 IEEE symposium on security and privacy, IEEE, 2014, pp. 590–604. [10] N. Ziems and S. Wu, Security vulnerability detection using deep learning natural language processing, 2021. arXiv:2105.02388. [11] F. Wu, J. Wang, J. Liu, and W. Wang, “Vulnerability detection with deep learning,” in 2017 3rd IEEE international conference on computer and communications (ICCC), IEEE, 2017, pp. 1298–1302. [12] S. Chakraborty, R. Krishna, Y. Ding, and B. Ray, “Deep learning based vulnerability detection: Are we there yet?” CoRR, vol. abs/2009.07235, 2020. arXiv:2009.07235. [Online]. Available: https://arxiv.org/abs/2009.07235. [13] L. Wartschinski, Y. Noller, T. Vogel, T. Kehrer, and L. Grunske, “VUDENC: Vulnerability detection with deep learning on a natural codebase for Python,” Information and Software Technology, vol. 144, p. 106809, 2022. [14] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013. [15] R. Wang, S. Xu, X. Ji, Y. Tian, L. Gong, and K. Wang, “An extensive study of the effects of different deep learning models on code vulnerability detection in Python code,” Automated Software Engg., vol. 31, no. 1, Jan. 2024, ISSN: 0928-8910. DOI: 10.1007/s10515-024-00413-4. [Online]. Available: https://doi.org/10.1007/s10515-024-00413-4. [16] PyCQA, Bandit. [Online]. Available: https://github.com/PyCQA/bandit. [17] Checkmarx. [Online]. Available: https://checkmarx.com/. [18] C. Parsing, “Speech and language processing,” Power Point Slides, 2009. [19] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018. [20] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, et al., “Improving language understanding by generative pre-training,” 2018. [21] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in neural information processing systems, vol. 26, 2013. [22] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543. [23] Y. Goldberg, “A primer on neural network models for natural language processing,” Journal of Artificial Intelligence Research, vol. 57, pp. 345–420, 2016. [24] A. Kanade, P. Maniatis, G. Balakrishnan, and K. Shi, “Pre-trained contextual embedding of source code,” CoRR, vol. abs/2001.00059, 2020. arXiv:2001.00059. [Online]. Available: http://arxiv.org/abs/2001.00059. [25] E. N. Akimova, A. Y. Bersenev, A. A. Deikov, et al., “A survey on software defect prediction using deep learning,” Mathematics, vol. 9, no. 11, p. 1180, 2021. [26] Z. Feng, D. Guo, D. Tang, et al., CodeBERT: A pre-trained model for programming and natural languages, 2020. arXiv:2002.08155. [27] OWASP. [Online]. Available: https://owasp.org/Top10/. [28] National Institute of Standards and Technology (NIST). [Online]. Available: https://nvd.nist.gov/. [29] GitHub. [Online]. Available: https://github.com/advisories. [30] A. Hovsepyan, R. Scandariato, W. Joosen, and J. Walden, “Software vulnerability prediction using text analysis techniques,” in Proceedings of the 4th international workshop on Security measurements and metrics, 2012, pp. 7–10. [31] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997. [32] H. K. Dam, T. Tran, and T. Pham, “A deep language model for software code,” arXiv preprint arXiv:1608.02715, 2016. [33] F. Chollet, Deep learning with Python. Simon and Schuster, 2021. [34] Markdown library for Python, https://pypi.org/project/Markdown/. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93257 | - |
| dc.description.abstract | 為了確保應用程式中不存在能夠被有心人士利用的漏洞,程式碼安全檢測在軟體開發中一直扮演一個重要角色。傳統的程式碼安全測試通常依賴手動檢查或基於規則的方法,這樣的方法可能相當耗時且容易出現人為錯誤。近年來隨著自然語言處理的發展,深度學習儼然成為程式碼安全測試的一種手段,我們將在這篇論文研究將深度學習技術應用於程式碼安全測試的可能性,目標是能夠提高軟體開發流程中安全分析的效率和效力。在本篇研究中,我們以長短期記憶模型作為模型架構對資料集進行訓練,並測試了兩種嵌入方法在生成程式碼向量表示上的效能以提高訓練效率。此外,我們還將在 GitHub 上蒐集的多個專案應用於論文中所提出的模型上,再把掃描結果與現有的靜態測試工具做比較並對其性能進行評估,結果顯示我們的研究成果比起市售的靜態安全測試軟體能達到更好的表現,最後透過分析實驗的數據,提出可能改進的方法。 | zh_TW |
| dc.description.abstract | To keep exploitable vulnerabilities out of applications, security testing has always played a crucial role in software development. Traditional code security testing often relies on manual inspection or rule-based approaches, which can be time-consuming and prone to human error. With recent advances in natural language processing, deep learning has emerged as a viable approach to code security testing. In this thesis, we investigate the application of deep learning techniques to code security testing, with the aim of improving the efficiency and effectiveness of security analysis in the software development process. We train a Long Short-Term Memory (LSTM) model on our dataset and evaluate two embedding methods for generating vector representations of code, with the goal of improving training efficiency. Additionally, we apply the proposed models to multiple projects collected from GitHub, compare the scan results with those of existing static testing tools, and evaluate their performance. The results show that our approach performs better than commercially available static application security testing (SAST) tools. Finally, through analysis of the experimental data, we propose potential improvements and directions for future work. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-23T16:32:26Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-07-23T16:32:26Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Verification Letter from the Oral Examination Committee; Acknowledgements; Chinese Abstract; ABSTRACT; CONTENTS; LIST OF FIGURES; LIST OF TABLES; Chapter 1 Introduction; 1.1 Background; 1.2 Motivation; 1.3 Contribution; Chapter 2 Related Work; 2.1 Machine Learning for Vulnerability Detection; 2.2 Deep Learning for Vulnerability Detection; 2.3 Static Application Security Testing Tool; Chapter 3 Preliminaries; 3.1 Natural Language Processing; 3.2 Word Embedding; 3.3 Source Code Embedding; Chapter 4 Methodology; 4.1 Vulnerability Type Selection; 4.2 Dataset Collection; 4.3 Ground-truth Labeling; 4.4 Embedding Layer; 4.5 Hyperparameter Tuning of LSTM Model Using Simulated Annealing; 4.5.1 Simulated Annealing Overview; 4.5.2 Implementation of Simulated Annealing for LSTM Hyperparameter Tuning; 4.5.3 Conclusion; 4.6 LSTM Model; Chapter 5 Experiment; 5.1 Evaluation Metric; 5.2 Training; 5.2.1 Dataset; 5.2.2 Hyperparameters; 5.2.3 Result; 5.3 Evaluation; 5.4 Comparison with SAST tools; 5.4.1 Analysis; 5.4.2 Result; Chapter 6 Conclusion; References | - |
| dc.language.iso | en | - |
| dc.subject | 弱點偵測 | zh_TW |
| dc.subject | 靜態分析 | zh_TW |
| dc.subject | 自然語言處理 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | 程式碼嵌入 | zh_TW |
| dc.subject | Code embedding | en |
| dc.subject | Deep learning | en |
| dc.subject | Natural language processing | en |
| dc.subject | Vulnerability detection | en |
| dc.subject | Static analysis | en |
| dc.title | 使用深度學習技術進行自動化的程式碼弱點偵測 | zh_TW |
| dc.title | Automated Vulnerable Code Detection Using Deep Learning Technique | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳銘憲;林宗男;葉國暉 | zh_TW |
| dc.contributor.oralexamcommittee | Ming-Syan Chen;Tsung-Nan Lin;Kuo-Hui Yeh | en |
| dc.subject.keyword | 靜態分析,弱點偵測,程式碼嵌入,深度學習,自然語言處理 | zh_TW |
| dc.subject.keyword | Static analysis, Vulnerability detection, Code embedding, Deep learning, Natural language processing | en |
| dc.relation.page | 37 | - |
| dc.identifier.doi | 10.6342/NTU202401783 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2024-07-18 | - |
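| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Electrical Engineering | - |
| Appears in collections: | Department of Electrical Engineering | |

Files in this item:

| File | Size | Format | |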
|---|---|---|---|
| ntu-112-2.pdf | 1.97 MB | Adobe PDF | View/Open |

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
