Detection of Advanced Persistent Threats and Reconstruction of Attack Scenarios using Masked Graph Autoencoders

李承駿; Cheng-Chun Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97013

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	謝宏昀	zh_TW
dc.contributor.advisor	Hung-Yun Hsieh	en
dc.contributor.author	李承駿	zh_TW
dc.contributor.author	Cheng-Chun Lee	en
dc.date.accessioned	2025-02-25T16:29:02Z	-
dc.date.available	2026-02-17	-
dc.date.copyright	2025-02-25	-
dc.date.issued	2025	-
dc.date.submitted	2025-02-13	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97013	-
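dc.description.abstract	Advanced persistent threats (APTs) are long-running cyberattacks whose activity is stealthy and hard to notice. Because attackers share similar strategies but use different techniques, common intrusion detection systems are easily evaded and cannot correlate long-term dormant behavior. Anomaly detection systems, in turn, cannot effectively identify known attacks and must frequently adjust their detection strategy to follow changes in normal system behavior and avoid model drift, which raises maintenance cost. To address these problems, this thesis proposes an attack detection system based on masked graph autoencoders that detects APTs and reconstructs attack scenarios from provenance graphs generated from system audit logs. The masked graph autoencoder extracts multi-hop behavioral information for the nodes that represent system entities in the provenance graph and uses it as their features. By reconstructing the masked part of the graph, it achieves self-supervised learning and improves the efficiency of processing large numbers of nodes. The system extracts the attack patterns implicit in the graph and abstracts them into node features, which addresses the earlier inability to detect attack variants and serves as the foundation of the attack detection module. The attack detection module uses supervised learning together with resampling of attack-node data to make detection more stable; it can effectively identify malicious behavioral patterns in the node features and thereby separate attacks from normal system activity. Finally, we propose a largest-two-subgraphs and two-hop reconstruction strategy to reconstruct attack scenarios. The reconstructed scenario graph shows how different alerts relate to one another and further removes potential false positives, which makes the graph more concise. Experimental results show that the system achieves an AUC score of about 0.9, 18% higher than a rule-based detector. The attack scenario reconstruction module covers 73% of attack activity, showing that the system captures most anomalous system entities while minimizing redundant information.	zh_TW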
dc.description.abstract	Advanced Persistent Threats (APTs) are cyberattacks carried out over a long period, and their activity is subtle and hard to detect. Traditional Intrusion Detection Systems (IDSs) are not always effective against APTs because attackers vary their techniques even when they share similar strategies. To address these problems, we propose a system that combines Masked Graph Autoencoders (MGAE) with a Multilayer Perceptron (MLP) attack detector for APT detection and attack scenario reconstruction from system audit logs. The MGAE extracts multi-hop behavioral information for the nodes that represent system entities in the provenance graph and produces node representations. By reconstructing the masked part of the graph, the MGAE achieves self-supervision and reduces computation overhead. Because the learned node representations abstract attack patterns, they address the inability of traditional IDSs to detect attack variants, and they serve as the foundation for our attack detection module. Conceptually aligned with misuse detection, our MLP attack detector is trained to learn the malicious behavioral patterns encoded in the node representations and to differentiate between benign and malicious activities. Lastly, we propose a largest-two-subgraphs and two-hop reconstruction strategy to recover the attack scenario, which removes potential false positives and keeps the results concise. Experimental evaluation shows that the detector achieves an AUC score of approximately 0.9, which is competitive with other learning-based detectors and 18% higher than a policy-based detector. The reconstruction module achieves 73% coverage of attack campaigns, which is comparable to previous work and indicates that it captures the majority of compromised entities while minimizing redundant information.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-25T16:29:02Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2025-02-25T16:29:02Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	ABSTRACT; LIST OF TABLES; LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND AND RELATED WORK: 2.1 APT Life Cycle and Threat Model; 2.1.1 Threat Model; 2.2 System Audit Log and Data Provenance for APT Analysis; 2.2.1 Linux Security Module; 2.2.2 Whole-System Provenance; 2.2.3 Graph Example for Data Provenance; 2.3 APT Detection; 2.3.1 Signature-Based Detection; 2.3.2 Behavior-Based Detection; 2.4 Related Work; 2.4.1 APT Detection Techniques; 2.4.2 Attack Scenario Reconstruction; 2.4.3 SLEUTH, Morse, and Tseng; 2.4.4 ATLAS and Fan; 2.4.5 MAGIC
CHAPTER 3 SYSTEM DESIGN AND IMPLEMENTATION: 3.1 Graph Constructor; 3.1.1 Graph Parser; 3.1.2 Graph Initial Embedding; 3.1.3 Graph Reduction; 3.2 Graph Representation Learning; 3.2.1 Masked Graph Autoencoders; 3.2.2 Graph Encoder and Decoder; 3.2.3 Feature Masking; 3.2.4 Decoding Graph with Remasking and Structure Reconstruction; 3.2.5 MGAE Model Training and Inferencing; 3.3 Attack Detector; 3.3.1 MLP Model Training and Inferencing; 3.4 Attack Reconstructor; 3.4.1 Scenario Recovery; 3.5 Implementation; 3.5.1 Graph Constructor; 3.5.2 Graph Representation Learning; 3.5.3 Attack Detector; 3.5.4 Attack Reconstructor
CHAPTER 4 MASKED GRAPH AUTOENCODERS FOR GRAPH REPRESENTATION LEARNING: 4.1 Masked Graph Autoencoders; 4.2 Graph Encoder; 4.2.1 Feature Masking; 4.2.2 Encoding Phase; 4.3 Graph Decoder; 4.3.1 Remasking Node Embeddings; 4.3.2 Decoding Phase; 4.4 Optimization; 4.5 Summary
CHAPTER 5 ATTACK DETECTOR: 5.1 Data Resampling; 5.1.1 Undersampling; 5.1.2 Oversampling; 5.1.3 Intra-Scenario and Inter-Scenario Oversampling; 5.2 MLP Attack Detector Model; 5.3 Optimization; 5.3.1 Loss Function; 5.3.2 Tuning Decision Threshold; 5.4 Summary
CHAPTER 6 PERFORMANCE EVALUATION: 6.1 Dataset; 6.1.1 DARPA TC Program; 6.1.2 Cadets; 6.2 Metrics; 6.2.1 Recall, Precision, and AUC score; 6.2.2 Reduction Rate and Loss Rate; 6.2.3 Coverage and Redundancy; 6.3 Experimental Setup; 6.4 APT Detection Results; 6.4.1 Encoder Depth and Mask Rate of MGAE; 6.4.2 Intra-Scenario Oversampling; 6.4.3 Scaling λpos in Attack Detector; 6.4.4 Compare with MAGIC; 6.5 APT Scenario Reconstruction Results; 6.5.1 Reconstructed Scenario Graph; 6.5.2 Effectiveness of the Reconstruction Strategy; 6.5.3 Compare with Related Work; 6.6 Performance; 6.6.1 Execution Time; 6.6.2 Data Storage; 6.7 Summary
CHAPTER 7 CONCLUSION AND FUTURE WORK
REFERENCES	-
dc.language.iso	en	-
dc.subject	進階持續性威脅	zh_TW
dc.subject	攻擊情境重建	zh_TW
dc.subject	遮罩式圖自編碼器	zh_TW
dc.subject	attack scenario reconstruction	en
dc.subject	masked graph autoencoders	en
dc.subject	advanced persistent threats	en
dc.title	基於遮罩式圖自編碼器之進階持續性威脅偵測與攻擊情境重建	zh_TW
dc.title	Detection of Advanced Persistent Threats and Reconstruction of Attack Scenarios using Masked Graph Autoencoders	en
dc.type	Thesis	-
dc.date.schoolyear	113-1	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	高榮鴻;沈上翔;葉佳宜	zh_TW
dc.contributor.oralexamcommittee	Rung-Hung Gau;Shan-Hsiang Shen;Chia-Yi Yeh	en
dc.subject.keyword	遮罩式圖自編碼器, 進階持續性威脅, 攻擊情境重建	zh_TW
dc.subject.keyword	masked graph autoencoders, advanced persistent threats, attack scenario reconstruction	en
dc.relation.page	96	-
dc.identifier.doi	10.6342/NTU202500635	-
dc.rights.note	Not authorized	-
dc.date.accepted	2025-02-13	-
dc.contributor.author-college	College of Electrical Engineering and Computer Science	-
dc.contributor.author-dept	Department of Electrical Engineering	-
dc.date.embargo-lift	N/A	-
Appears in departments:	Department of Electrical Engineering

Files in this item:

File	Size	Format
ntu-113-1.pdf (not authorized for public access)	6.38 MB	Adobe PDF

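Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.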