Deep Learning for Malicious Flow Detection (基於深度學習之惡意流量偵測)

Wei-Chieh Tseng; 曾煒傑

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21479

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	林宗男
dc.contributor.author	Wei-Chieh Tseng	en
dc.contributor.author	曾煒傑	zh_TW
dc.date.accessioned	2021-06-08T03:35:17Z	-
dc.date.copyright	2019-08-05
dc.date.issued	2019
dc.date.submitted	2019-07-31
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21479	-
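dc.description.abstract	The Internet has become a key enabler of large-scale global communication, and providing stable network services every day is essential. As Internet use keeps growing, effectively managing the underlying networks that carry it is critical. Network traffic classification is an important part of this management: it supports quality of service (QoS), prediction of future trends, and detection of potential security threats. For these reasons, accurate network traffic classification matters to Internet service providers (ISPs), large enterprises, and government agencies. In recent years, as more traffic is encrypted, whether for security or to conceal malicious intent, current traffic classification methods have become less effective, so today's networks need more effective classification algorithms. Machine learning is increasingly proposed for classifying encrypted network traffic. Although many techniques apply machine learning to traffic classification, most work relies heavily on handcrafted features or can only classify traffic offline. To overcome these weaknesses, this thesis proposes Packet2Img, a framework based on convolutional neural networks (CNNs) and ensemble learning. The framework converts network traffic into images, which can fully capture the static and dynamic behavior of different applications or malicious attacks and thereby avoids the loss of important information that handcrafted feature extraction can cause. The datasets used in this thesis include the ISCX VPN-nonVPN dataset, the CTU-13 dataset, and malware network traffic collected with the CAPE sandbox. Across all experiments, the results show that, with the best image size of 100*100 and a tuned experimental architecture, Packet2Img meets the accuracy requirements of practical applications and is highly scalable. According to the experimental results, the classification accuracy of this method is about 10% higher than that of traditional methods that use handcrafted features.	zh_TW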
dc.description.abstract	The Internet has become a key enabler of large-scale global communication, and it is important that it keep providing its immeasurable number of services every day. As use of the Internet continues to grow, it is critical to manage the underlying networks effectively. Network traffic classification plays a vital role in this management: it supports quality of service (QoS), prediction of future trends, and detection of potential security threats. For these reasons, accurate network traffic classification is important for Internet service providers (ISPs), large enterprises, and government agencies. Current network traffic classification methods have become less effective in recent years because a growing share of network traffic is encrypted, whether for security, priority, or malicious purposes. Today's networks therefore need more effective classification algorithms to handle these conditions. Machine learning is increasingly proposed for classifying encrypted network traffic. While there are many techniques for applying machine learning to IP traffic classification, most works depend heavily on handcrafted features or can only handle offline traffic classification. To overcome these weaknesses, this thesis presents Packet2Img, a traffic classification framework that combines convolutional neural networks (CNNs) with ensemble learning. The framework converts network flows into images, fully capturing the static and dynamic behavior of different applications or malicious attacks and avoiding handcrafted features, which can lead to information loss. The method is validated on a dataset that comprises the ISCX VPN-nonVPN dataset, the CTU-13 dataset, and malicious flows collected with the CAPE sandbox. Across all experiments, with the best image size and a fine-tuned model, the results show that the method satisfies the accuracy requirements of practical applications and is highly scalable. Its classification accuracy is about 10 percent higher than that of the traditional approach using handcrafted features.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T03:35:17Z (GMT). No. of bitstreams: 1 ntu-108-R06942062-1.pdf: 6090776 bytes, checksum: e4bad513ad22faaeb5cd6b9afe4b4c21 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; 1 Introduction; 2 Related Works; 2.1 Joy; 2.1.1 Packet Metadata; 2.1.2 Sequence of Packet Lengths and Times (SPLT); 2.1.3 Byte Distribution; 2.1.4 TLS Information; 2.2 Cicflowmeter; 2.3 Bro; 2.4 Deep Learning Method on Malware Detection; 3 Methodology; 3.1 Malware’s Network Behaviors; 3.2 Convolutional Neural Networks; 3.2.1 Convolutional Layer; 3.2.2 Pooling Layer; 3.2.3 Fully Connected layer; 3.3 Ensemble learning; 3.3.1 Bagging; 3.3.2 Boosting; 3.3.3 Stacking; 3.4 Packet2Img; 3.4.1 Session/Flow Generation; 3.4.2 Background Traffic Remove; 3.4.3 Image Generation; 3.4.4 10-class Ensemble Classifier; 4 Dataset Introduction; 4.1 Botnet; 4.2 Data Exfiltration; 4.3 DoS; 4.4 Exploit Kit; 4.5 Generic; 4.6 Malspam; 4.7 Miner; 4.8 Normal; 4.9 Ransomware; 4.10 Trojan; 5 Experimental results; 5.1 Botnet Detection; 5.1.1 CTU Dataset; 5.1.2 UNB Dataset; 5.2 Data Exfiltration Detection; 5.3 DoS Detection; 5.4 Malicious Benign Binary Classification; 5.5 10 class classification; 5.6 Visualization of features in CNN using t-SNE and PCA; 5.7 Saliency Map Analysis; 5.8 Zero-shot Learning; 6 Conclusion; Bibliography
dc.language.iso	en
dc.title	基於深度學習之惡意流量偵測 (Malicious Traffic Detection Based on Deep Learning)	zh_TW
dc.title	Deep Learning for Malicious Flow Detection	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	鄧惟中,陳俊良,蔡子傑
dc.subject.keyword	Deep Learning, Malicious Traffic Detection, Network Traffic Classification, Convolutional Neural Networks, Ensemble Learning, Malware	zh_TW
dc.subject.keyword	Deep Learning, Malicious Flow Detection, IP Traffic Classification, Convolutional Neural Networks, Ensemble Learning, Malware	en
dc.relation.page	85
dc.identifier.doi	10.6342/NTU201902120
dc.rights.note	Not authorized
dc.date.accepted	2019-07-31
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Communication Engineering	zh_TW
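
As an illustration of the flow-to-image idea described in the abstract, the following Python sketch maps the raw bytes of one flow onto a 100*100 grayscale grid. It is only a sketch under stated assumptions: the function name `flow_to_image`, the zero-padding and truncation rule, and the row-by-row byte layout are hypothetical choices and are not taken from the thesis, whose actual image-generation procedure is not given in this record.

```python
from typing import List

IMAGE_SIDE = 100  # the abstract reports 100*100 as the best image size


def flow_to_image(payload: bytes, side: int = IMAGE_SIDE) -> List[List[int]]:
    """Map the first side*side bytes of a flow to a side x side grayscale grid.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    n_pixels = side * side
    # Assumption: truncate long flows and zero-pad short ones to a fixed size.
    data = payload[:n_pixels].ljust(n_pixels, b"\x00")
    # Each byte (0-255) becomes one pixel intensity, filled row by row.
    return [list(data[row * side:(row + 1) * side]) for row in range(side)]


if __name__ == "__main__":
    image = flow_to_image(b"GET / HTTP/1.1\r\n")
    print(len(image), len(image[0]), image[0][:3])
```

A fixed-size grid of this kind is the usual input shape for a CNN classifier; how Packet2Img orders packets, removes background traffic, or feeds the ensemble is described only in the thesis itself.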
Appears in Collections:	Graduate Institute of Communication Engineering

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	5.95 MB	Adobe PDF


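Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.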