A Study on Improving Detection of Advanced Persistent Threat (進階持續性威脅之偵測方法改進)

范建達; Chien-Ta Fan

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87236

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	謝宏昀	zh_TW
dc.contributor.advisor	Hung-Yun Hsieh	en
dc.contributor.author	范建達	zh_TW
dc.contributor.author	Chien-Ta Fan	en
dc.date.accessioned	2023-05-18T16:32:33Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-05-11	-
dc.date.issued	2023	-
dc.date.submitted	2023-02-14	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/87236	-
dc.description.abstract	高級持續性威脅 (APT) 需要分析大量日誌以確定其攻擊步驟，這些步驟是在很長一段時間內執行的一組活動。然而，常見的入侵檢測系統（IDS）從可疑網絡流量產生大量可疑事件的威脅警報，以及異常檢測器缺乏足夠的APT負面數據進行訓練，導致大量誤報，同時他們無法提供關於APT攻擊的具體訊息。安全分析師必須花費大量時間調查這些大量警報以確定該事件是否是攻擊的一部分。在本文中，我們提出了一種基於序列學習的機器學習方法來檢測 APT 並從現有審計日誌構建攻擊故事。我們的觀察是 APT 攻擊可能共享相似的攻擊策略。我們提出了一種基於學習的模型，結合使用圖形分析、詞形還原和機器學習技術，從起源圖中提取攻擊和非攻擊行為的模式。我們使用採樣策略生成相等數量的攻擊和非攻擊序列來解決 APT 攻擊數據不平衡的問題，然後使用訓練有素的模型檢測促成攻擊的節點。基於這些惡意節點，我們重構場景圖來重現攻擊者的行為。從實驗結果來看，所提出的方法可以達到 0.91 的AUC分數。重構的場景圖捕獲了76%的惡意實體，可以捕獲所有入口點和惡意網絡連接，高於異常檢測器的結果。與相關文獻異常檢測器相比，我們的圖形大小可以小十倍，以允許安全分析師進一步調查攻擊。	zh_TW
dc.description.abstract	Detecting an advanced persistent threat (APT) requires analyzing large volumes of logs to determine its attack steps, which are a set of activities carried out over a long period of time. However, common intrusion detection systems (IDS) raise a large number of threat alerts for suspicious events in network traffic, and anomaly detectors lack sufficient APT negative data for training. Both produce many false alarms and provide no specific information about the APT attack. Security analysts must spend considerable time investigating these alerts to determine whether each event is part of an attack. In this thesis, we propose a sequence-learning-based machine learning method to detect APTs and construct an attack story from existing audit logs. Our observation is that APT attacks may share similar attack strategies. We propose a learning-based model that extracts patterns of attack and non-attack behavior from a provenance graph using a combination of graph analysis, lemmatization, and machine learning techniques. To address the imbalance in APT attack data, we use a sampling strategy to generate equal numbers of attack and non-attack sequences, and then use the trained model to detect nodes that contribute to the attack. Based on these malicious nodes, we reconstruct a scenario graph that reproduces the attacker's behavior. In our experiments, the proposed method achieves an AUC score of 0.91. The reconstructed scenario graph captures 76% of malicious entities, including all entry points and malicious network connections, which is higher than the results of anomaly detectors. Compared with the anomaly detectors in related work, our graph can be ten times smaller, which makes it easier for security analysts to investigate attacks further.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-18T16:32:33Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-05-18T16:32:33Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	ABSTRACT ii; LIST OF TABLES vi; LIST OF FIGURES vii; CHAPTER 1 INTRODUCTION 1; CHAPTER 2 BACKGROUND AND RELATED WORK 5; 2.1 APT Model 5; 2.2 APT Detection 7; 2.2.1 Signature-Based Detection 7; 2.2.2 Behavior-Based Detection 7; 2.2.3 Graph-Based Detection 8; 2.3 APT Detection With Audit Logs 9; 2.3.1 Linux Security Module 10; 2.3.2 Whole-System Provenance 10; 2.3.3 A Provenance Graph Example 11; 2.4 Related Work 12; 2.4.1 Attack Detection Technique 12; 2.4.2 Attack Reconstruction 13; 2.4.3 Thesis of Tseng 15; 2.4.4 ATLAS 18; CHAPTER 3 SYSTEM DESIGN AND IMPLEMENTATION 19; 3.1 Graph Constructor 21; 3.1.1 Parser 21; 3.1.2 Construct Graph 22; 3.1.3 Graph Reduction 26; 3.1.4 Implementation 28; 3.2 Sequence Extractor 30; 3.2.1 Sequence Construction 30; 3.2.2 Implementation 32; 3.3 Deep Learning Model 34; 3.3.1 Training Phase of Model 34; 3.3.2 Implementation 37; 3.4 Attack Constructor 39; 3.4.1 Testing Phase of Model 39; 3.4.2 Attack Scenario Recovery 41; CHAPTER 4 METHODOLOGY 43; 4.1 Convolutional Neural Network (CNN) 43; 4.2 Long Short-Term Memory (LSTM) 45; 4.3 Sequence Lemmatization 50; 4.4 Optimization 52; 4.4.1 Loss Function 52; 4.4.2 Sequence Sampling 53; 4.5 Summary 55; CHAPTER 5 PERFORMANCE EVALUATION 58; 5.1 Dataset 58; 5.1.1 DARPA TC Program 58; 5.1.2 Cadets 58; 5.1.3 Cadets Dataset Summary 61; 5.2 Metrics 63; 5.2.1 Recall, Precision, and AUC 63; 5.2.2 Coverage, Redundancy, and Reduction Ratio 64; 5.3 System and Experiment Setup 65; 5.4 APT Detection Experiment 65; 5.4.1 Non-Sampling Sequence Training Result 66; 5.4.2 Undersampling Sequence Training Result 66; 5.4.3 Oversampling Sequence Training Result 67; 5.4.4 Resampling Sequence Training Result 68; 5.4.5 Compare with Learning-Based Detector 71; 5.5 Attack Reconstruction Experiment 71; 5.5.1 Scenario Graph 72; 5.5.2 Compare with Related Work 76; 5.6 Performance 86; 5.6.1 Data Storage 86; 5.6.2 Execution time 86; 5.7 Summary 87; CHAPTER 6 CONCLUSION AND FUTURE WORK 89; REFERENCES 91	-
dc.language.iso	zh_TW	-
dc.subject	攻擊情境重建	zh_TW
dc.subject	進階持續性威脅	zh_TW
dc.subject	Attack Scenario Reconstruction	en
dc.subject	Advanced Persistent Threat	en
dc.subject	Long-term Recurrent Convolutional Network	en
dc.title	進階持續性威脅之偵測方法改進	zh_TW
dc.title	A Study on Improving Detection of Advanced Persistent Threat	en
dc.type	Thesis	-
dc.date.schoolyear	111-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	吳沛遠;馮教授	zh_TW
dc.contributor.oralexamcommittee	Pei-Yuan Wu;Huei-Wen Ferng	en
dc.subject.keyword	進階持續性威脅,攻擊情境重建	zh_TW
dc.subject.keyword	Advanced Persistent Threat,Attack Scenario Reconstruction,Long-term Recurrent Convolutional Network	en
dc.relation.page	95	-
dc.identifier.doi	10.6342/NTU202300489	-
dc.rights.note	Authorization granted (access limited to campus)	-
dc.date.accepted	2023-02-15	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Electrical Engineering	-
dc.date.embargo-lift	2028-02-14	-
Appears in Collections:	Department of Electrical Engineering

Files in This Item:

File	Size	Format
ntu-111-1.pdf (not authorized for public access)	4.52 MB	Adobe PDF

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.