Detecting malware in the time sequence of the program API call

Yao-Wen Xu; 許耀文

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15279

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	雷欽隆
dc.contributor.author	Yao-Wen Xu	en
dc.contributor.author	許耀文	zh_TW
dc.date.accessioned	2021-06-07T17:29:17Z	-
dc.date.copyright	2020-02-17
dc.date.issued	2019
dc.date.submitted	2020-02-07
dc.identifier.citation	[1] Ki, Y., Kim, E., & Kim, H. (2015, June). A Novel Approach to Detect Malware Based on API Call Sequence Analysis. In International Journal of Distributed Sensor Networks. Xx(xx), 1-9 [2] Fujino, A., Murakami, J., & Mori, T. (2015, July). Discovering Similar Malware Samples Using API Call Topics. In IEEE Consumer Communications and Networking Conference (CCNC) [3] Pektaş, A., & Acarman, T. (2017, September). Malware classification based on API calls and behavior analysis. In IET Information Security. (pp. 107–117) [4] Li, Q., & Li, X. (2015, September). Android Malware Detection Based on Static Analysis of Characteristic Tree. In Cyber-Enabled Distributed Computing and Knowledge Discovery, 2015 International Conference. [5] Kolbitsch, C., Comparetti, P. M., Kruegel, C., Kirda, E., Zhou, X.-y., & Wang, X. (2009). “Effective and efficient malware detection at the end host,” in USENIX Security Symposium, 2009, pp. 351–366. [6] Tian, R., Islam, R., Batten, L., & Versteeg, S. (2010). “Differentiating malware from cleanware using behavioural analysis,” in Malicious and Unwanted Software (MALWARE), 2010 5th International Conference on, 2010, pp. 23–30. [7] Salehi, Z., Ghiasi, M., & Sami, A., “A miner for malware detection based on api function calls and their arguments,” in Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on, 2012, pp. 563–568. [8] Trinius, P., Willems, C., Holz, T., & Rieck, K., “A malware instruction set for behavior-based analysis,” University of Mannheim, Tech. Rep., 2011. [9] Fan, C., Hsiao, H., Chou, C., & Tseng, Y. (2015, July). Malware Detection System Based on API Log Data Mining. In Annual Computer Software and Applications Conference. [10] Tran, T., & Sato, H. (2017). NLP-based Approaches for Malware Classification from API Sequences. In Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES). [11] Salehi, Z., Ghiasi, M., & Sami, A. (2012, May). A Miner for Malware Detection Based on API Function Calls and Their Arguments. In International Symposium on Artificial Intelligence and Signal Processing (AISP 2012). [12] Kim, Y. (2014, October). Convolutional Neural Networks for Sentence Classification. In Association for Computational Linguistics. [13] Zhang, Y., & Wallace, C. (2016, April). A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification. (2015, October). [14] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., … & Liu, T. Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems (pp. 3146-3154). [15] Nataraj, L., Yegneswaran, V., Porras, P., & Zhang, J., “A Comparative Assessment of Malware Classification Using Binary Texture Analysis and Dynamic Analysis,” Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 21-30, 2011.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/15279	-
dc.description.abstract	網際網路上充斥著各種病毒、木馬和惡意程式，因此極有可能在使用者毫無察覺的情況下，遭受病毒、木馬和惡意程式的攻擊。市面上有許多的防毒軟體、木馬清除程式和惡意軟體移除程式等，這些工具雖然能夠防止已知的病毒、木馬和惡意程式等，但對未知的惡意程式則無法有效的防範，不同使用者電腦上所執行的軟體也會不同，要想完全依靠防毒軟體掃描惡意程式的存在並不是那麼容易。靜態檢測的方法越來越困難，因為可以透過混淆的技術，例如加密，使用無用的代碼使其像是正常的程式，繞過等等，為了克服這個問題，使用動態檢測技術，在程式執行時監控他所呼叫的API，檢測是否為惡意程式。在這篇論文研究中，程式執行時呼叫的應用程式介面(API, application programming interface)是我們主要要分析的，每個程式在呼叫API時, 呼叫的API有時間序列關係, 我們利用這個時間序列來檢測惡意程式。	zh_TW
dc.description.abstract	The Internet is full of viruses, Trojans, and other malware, so users are highly likely to be attacked without being aware of it. Many antivirus programs, Trojan removers, and malware removal tools are on the market. Although these tools can guard against known viruses, Trojans, and malicious programs, they cannot effectively guard against unknown ones. The software running on each user's computer also differs, so relying entirely on antivirus scanning to find malicious programs is not easy. Static detection is becoming increasingly difficult because malware can evade it through obfuscation techniques such as encryption and the insertion of useless code that makes it look like a normal program. To overcome this problem, we use dynamic detection: the APIs a program calls are monitored during execution to determine whether the program is malicious. In this thesis, the application programming interface (API) calls that a program makes at run time are the main object of analysis. The APIs a program calls follow a time-series relationship, and we use this time series to detect malware.	en
dc.description.provenance	Made available in DSpace on 2021-06-07T17:29:17Z (GMT). No. of bitstreams: 1 ntu-108-R06921075-1.pdf: 1269188 bytes, checksum: 3115520f477d80a85000f50b838408e4 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Acknowledgements i Chinese Abstract ii ABSTRACT iii CONTENTS iv LIST OF FIGURES vi LIST OF TABLES vii Chapter 1 Introduction 1 Chapter 2 Related Work 4 Chapter 3 Background 6 3.1 Windows API 6 3.2 Representation of text 6 3.2.1 One-hot encoding 6 3.2.2 Word embedding 7 3.3 Keras 8 3.4 Scikit-learn 8 Chapter 4 Datasets and Feature engineering 10 4.1 Datasets 10 4.2 Feature engineering 13 4.2.1 n-gram tf-idf 13 4.2.2 Feature extend 14 4.2.3 word embedding 17 Chapter 5 Methodology 18 5.1 Overall architecture 18 5.2 Used Models 18 5.2.1 XGBoost 18 5.2.2 LightGBM 19 5.2.3 TextCNN 20 Chapter 6 Evaluation 22 6.1 Evaluation metrics 22 6.2 Evaluation dataset 23 6.3 Feature analysis 25 6.3.1 NLP feature 25 6.3.2 Statistical feature 26 6.3.3 Combined with NLP features and statistical features 27 6.4 Model analysis 28 6.4.1 Performance comparison 28 6.4.2 Ensemble 28 6.5 Compare with previous paper 29 Chapter 7 Conclusion 30 REFERENCE 31
dc.language.iso	en
dc.subject	自然語言處理	zh_TW
dc.subject	惡意程式	zh_TW
dc.subject	時間序列	zh_TW
dc.subject	應用程式介面	zh_TW
dc.subject	malware	en
dc.subject	natural language processing	en
dc.subject	application interface	en
dc.subject	time series	en
dc.title	以程式呼叫 API 之時間序列檢測惡意程式	zh_TW
dc.title	Detecting malware in the time sequence of the program API call	en
dc.type	Thesis
dc.date.schoolyear	108-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	郭斯彥,顏嗣鈞(hcyen@ntu.edu.tw),王銘宏,紀博文
dc.subject.keyword	惡意程式, 時間序列, 應用程式介面, 自然語言處理	zh_TW
dc.subject.keyword	malware, time series, application interface, natural language processing	en
dc.relation.page	32
dc.identifier.doi	10.6342/NTU202000344
dc.rights.note	Not authorized
dc.date.accepted	2020-02-07
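dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering (電機工程學研究所)	zh_TW
Appears in Collections:	Department of Electrical Engineering (電機工程學系)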

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	1.24 MB	Adobe PDF

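Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.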