An Automated De-Obfuscation System for Malicious PowerShell Scripts

Meng-Chiao Hsieh; 謝孟橋

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74584

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	雷欽隆
dc.contributor.author	Meng-Chiao Hsieh	en
dc.contributor.author	謝孟橋	zh_TW
dc.date.accessioned	2021-06-17T08:44:07Z	-
dc.date.available	2020-08-15
dc.date.copyright	2019-08-15
dc.date.issued	2019
dc.date.submitted	2019-08-07
dc.identifier.citation	[1] McAfee, 'McAfee Labs Threat Report,' 2017. [2] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, 'ZOZZLE: fast and precise in-browser JavaScript malware detection,' in SEC'11 Proceedings of the 20th USENIX conference on Security, 2011. [3] I. D. Baxter, A. Yahin, L. Moura, M. Sant’Anna, and L. Bier, 'Clone Detection Using Abstract Syntax Trees,' in International Conference on Software Maintenance, 1998. [4] lzybkr. ShowPSAst. Available: https://github.com/lzybkr/ShowPSAst [5] LeeHolmes. Revoke-Obfuscation. Available: https://github.com/danielbohannon/Revoke-Obfuscation [6] D. Hendler, S. Kels, and A. Rubin, 'Detecting Malicious PowerShell Commands using Deep Neural Networks,' Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pp. 187-197, June 4 2018. [7] D. Hendler, S. Kels, and A. Rubin, 'Detecting Malicious PowerShell Scripts Using Contextual Embeddings,' arXiv preprint arXiv:1905.09538, 2019. [8] G. Rusak, A. Al-Dujaili, and U.-M. O’Reilly. (2018) AST-Based Deep Learning for Detecting Malicious PowerShell. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security - CCS 18. 2276-2278. [9] A. Rousseau, 'Hijacking .NET to Defend PowerShell,' arXiv preprint arXiv:1709.07508, 2017. [10] C. Liu, B. Xia, M. Yu, and Y. Liu, 'PSDEM: A Feasible De-Obfuscation Method for Malicious PowerShell Detection,' in IEEE Symposium on Computers and Communications, 2018, pp. 817-824. [11] D. Ugarte, D. Maiorca, F. Cara, and G. Giacinto, 'PowerDrive: Accurate De-Obfuscation and Analysis of PowerShell Malware,' arXiv preprint arXiv:1904.10270, 2019. [12] S. Aebersold, K. Kryszczuk, S. Paganoni, B. Tellenbach, and T. Trowbridge, 'Detecting Obfuscated JavaScripts using Machine Learning,' in ICIMP 2016 : The Eleventh International Conference on Internet Monitoring and Protection, 2016. [13] S. Kim, S. Hong, J. Oh, and H. Lee, 'Obfuscated VBA Macro Detection Using Machine Learning,' in 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2018. [14] M. AbdelKhalek and A. Shosha, 'JSDES - An Automated De-Obfuscation System for Malicious JavaScript,' in Proceedings of the 12th International Conference on Availability, Reliability and Security, 2017. [15] L. Mou et al., 'Building program vector representations for deep learning,' arXiv preprint arXiv:1409.3358, 2014. [16] L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin, 'Convolutional Neural Networks over Tree Structures for Programming Language Processing,' Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 1287-1293, 2016. [17] D. Cournapeau. Scikit-Learn. Available: https://scikit-learn.org/stable/ [18] T. Chen and C. Guestrin, 'XGBoost: A Scalable Tree Boosting System,' in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 16, 2016. [19] Facebook. PyTorch. Available: https://pytorch.org/
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74584	-
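dc.description.abstract	For many years, malware has been one of the attackers' preferred means of launching large-scale attacks. Consequently, many users are especially wary of executable files (.exe) and avoid downloading or running them so that their computers are not infected. They are not equally wary of document files (.doc), spreadsheet files (.xls), and the like, so attackers have begun embedding malicious code in such files, exploiting users' lowered guard to raise the success rate of their attacks. The malicious code embedded in these documents and spreadsheets can be written in many languages, but PowerShell is powerful, easy to obfuscate, and installed by default as a shell on Windows, so many attackers choose it for this purpose. To bypass antivirus software and to keep researchers from analyzing the code easily, attackers usually apply one or more obfuscation methods to it. We propose a method for automatically de-obfuscating these obfuscated malicious PowerShell scripts. In our framework, a machine learning-based classifier first identifies which obfuscation method the obfuscated code uses, and the code is then de-obfuscated according to the classification result. Because a single piece of obfuscated code may use more than one obfuscation method, the de-obfuscated code is sent back to the classifier to be classified and de-obfuscated again, until the classifier judges that the code is no longer obfuscated. In our dataset, we divide the obfuscation methods into 8 types and develop a de-obfuscation method for each; for the classifier, we also try several machine learning methods in order to analyze and compare the accuracy of the different models.	zh_TW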
dc.description.abstract	For many years, malware has been a popular choice for attackers launching large-scale attacks. As a result, many users are particularly wary of programs with the .exe extension and avoid downloading or executing such files to protect their computers. In contrast, users are far less wary of Microsoft Word or Microsoft Excel documents, since they use such files often and the files look harmless. Attackers are therefore increasingly using Word, Excel, and similar documents as carriers for malicious scripts. These scripts can be written in many languages; however, PowerShell is favored by many attackers because it is a very powerful language and is pre-installed as a shell scripting language on Windows machines. To bypass antivirus systems and to make analysis harder for researchers, attackers usually use several methods to obfuscate their malicious scripts. We propose a system to de-obfuscate these obfuscated malicious PowerShell scripts. In our system, a machine learning-based classifier identifies which obfuscation method is used in the target script, and the script is then de-obfuscated accordingly. However, attackers may use several obfuscation methods in one file, so we send the result back to the classifier and de-obfuscate it again until the classifier judges that the file is fully de-obfuscated. In our dataset, we divide the obfuscation methods into 8 types and develop a de-obfuscation method for each type. In addition, we use a variety of machine learning methods in the classifier to analyze and compare the accuracy of the different models.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T08:44:07Z (GMT). No. of bitstreams: 1 ntu-108-R06921109-1.pdf: 1948346 bytes, checksum: fd01f5a55d69076907b08c2e4074cebb (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Motivation; 1.2 Background; 1.2.1 Abstract Syntax Tree; 1.2.2 PowerShell; 1.3 Drawback of the Existing Methods; 1.4 Our Contribution; 1.5 Thesis Organization; Chapter 2 Related Work; Chapter 3 Methodology; 3.1 The Overall Framework of Automated De-Obfuscation System; 3.2 Data Preprocessing; 3.2.1 Real-valued Representation of Abstract Syntax Tree Nodes; 3.2.2 Real-valued Representation of an Abstract Syntax Tree; 3.3 Machine Learning-based Classifier; 3.3.1 Support Vector Machine; 3.3.2 XGBoost; 3.3.3 Random Forest; 3.3.4 Deep Neural Network; 3.4 De-Obfuscating Stage; 3.4.1 Indexing Obfuscation; 3.4.2 Encoding Obfuscation; 3.4.3 String Split Obfuscation; 3.4.4 String Rearrange Obfuscation; 3.4.5 Replacing Obfuscation; 3.4.6 String Concatenate Obfuscation; 3.4.7 Reverse String Obfuscation; 3.4.8 Other Obfuscation; Chapter 4 Evaluation; 4.1 Dataset; 4.2 Performance matrices; 4.3 Performance of Different Classifiers; 4.3.1 Support Vector Machine; 4.3.2 Random Forest; 4.3.3 XGBoost; 4.3.4 Deep Neural Network; 4.3.5 Final Classifier; 4.4 Performance of the Overall Framework; 4.4.1 Discussion; Chapter 5 Conclusion; 5.1 Future Work; Bibliography
dc.language.iso	en
dc.subject	obfuscated code	zh_TW
dc.subject	malware	zh_TW
dc.subject	PowerShell	zh_TW
dc.subject	machine learning	zh_TW
dc.subject	automation	zh_TW
dc.subject	de-obfuscation	zh_TW
dc.subject	obfuscated script	en
dc.subject	de-obfuscation	en
dc.subject	automatic	en
dc.subject	machine learning	en
dc.subject	PowerShell	en
dc.subject	malware	en
dc.title	An Automated De-Obfuscation System for Malicious PowerShell Scripts	zh_TW
dc.title	An Automated De-Obfuscation System for Malicious PowerShell Scripts	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	郭斯彥,顏嗣鈞,王銘宏,紀博文
dc.subject.keyword	malware,de-obfuscation,automation,machine learning,PowerShell,obfuscated code	zh_TW
dc.subject.keyword	malware,de-obfuscation,automatic,machine learning,PowerShell,obfuscated script	en
dc.relation.page	43
dc.identifier.doi	10.6342/NTU201902702
dc.rights.note	Paid license
dc.date.accepted	2019-08-07
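dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering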
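
The abstract describes a loop: classify which obfuscation a script uses, undo it, and re-classify until no obfuscation remains. A minimal Python sketch of that control flow follows. It is not the thesis's implementation: the rule-based `classify` merely stands in for the machine learning classifier, only two of the eight obfuscation types are modelled, and every name (`classify`, `HANDLERS`, `deobfuscate`, `max_rounds`) and regular expression here is an assumption made for illustration.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical labels for two of the eight obfuscation types in the thesis.
CONCAT = "string_concatenate"
REVERSE = "reverse_string"

# Toy patterns (assumptions): 'a' + 'b'  and  ('dcba'[-1..-4] -join '')
_CONCAT_RE = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")
_REVERSE_RE = re.compile(r"\('([^']*)'\[-1\.\.-\d+\]\s*-join\s*''\)")


def classify(script: str) -> Optional[str]:
    """Stand-in for the ML classifier: return a label, or None if clean."""
    if _REVERSE_RE.search(script):
        return REVERSE
    if _CONCAT_RE.search(script):
        return CONCAT
    return None


def undo_concat(script: str) -> str:
    """Merge adjacent single-quoted literals joined by '+' (one pass)."""
    return _CONCAT_RE.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", script)


def undo_reverse(script: str) -> str:
    """Replace a reversed-and-joined literal with the plain string."""
    return _REVERSE_RE.sub(lambda m: "'" + m.group(1)[::-1] + "'", script)


# One de-obfuscation routine per obfuscation label.
HANDLERS: Dict[str, Callable[[str], str]] = {
    CONCAT: undo_concat,
    REVERSE: undo_reverse,
}


def deobfuscate(script: str, max_rounds: int = 50) -> str:
    """Classify, de-obfuscate, and repeat until the classifier reports clean.

    max_rounds is a safety cap (an assumption) so a misbehaving handler
    cannot loop forever.
    """
    for _ in range(max_rounds):
        label = classify(script)
        if label is None:
            break
        script = HANDLERS[label](script)
    return script


if __name__ == "__main__":
    sample = "& ('tsoH-etirW'[-1..-10] -join '') ('hel' + 'lo ' + 'wor' + 'ld')"
    print(deobfuscate(sample))  # prints: & 'Write-Host' ('hello world')
```

The sample needs four classification rounds (one reverse pass, two concatenation passes, one final clean check), which is why the result is fed back to the classifier rather than de-obfuscated once.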

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	1.9 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.