系統日誌異常檢測方法的效能評估 (Performance Evaluation of Anomaly Detection Methods for System Log Data)

Chi-Shih Wang; 王啟時

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17210

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪士灝(Shih-Hao Hung)
dc.contributor.author	Chi-Shih Wang	en
dc.contributor.author	王啟時	zh_TW
dc.date.accessioned	2021-06-08T00:01:10Z	-
dc.date.copyright	2020-08-20
dc.date.issued	2020
dc.date.submitted	2020-08-12
dc.identifier.citation	[1] Amrouche, F., et al. Graph-based malicious login events investigation. In 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). IEEE, 2019.
[2] Chollet, F. Deep Learning with Python. Manning Publications Co., 2017.
[3] Du, M., et al. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.
[4] Fu, Q., et al. Execution anomaly detection in distributed systems through unstructured log analysis. In 2009 Ninth IEEE International Conference on Data Mining. IEEE, 2009.
[5] He, P., et al. Drain: An online log parsing approach with fixed depth tree. In 2017 IEEE International Conference on Web Services (ICWS). IEEE, 2017.
[6] Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 9(8): 1735-1780, 1997.
[7] Kaiafas, G., et al. Detecting malicious authentication events trustfully. In NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2018.
[8] Meng, W., et al. LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). International Joint Conferences on Artificial Intelligence Organization, 2019.
[9] Olah, C. Understanding LSTM Networks. 2015. Retrieved from https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed August 2, 2020).
[10] Siadati, H. and Memon, N. Detecting structurally anomalous logins within enterprise networks. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017.
[11] Siadati, H., Saket, B., and Memon, N. Detecting malicious logins in enterprise networks using visualization. In 2016 IEEE Symposium on Visualization for Cyber Security (VizSec). IEEE, 2016.
[12] Xu, W., et al. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. 2009.
[13] Yifan, C. PyTorch Implementation of DeepLog. 2020. Retrieved from https://github.com/wuyifan18/DeepLog (accessed March 2, 2020).
[14] Zhu, J., et al. Tools and benchmarks for automated log parsing. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17210	-
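dc.description.abstract	Almost every system produces system logs, which record rich information about the system at runtime, including startup, shutdown, login, logout, and error events. By analyzing these logs, administrators can diagnose whether a system has become abnormal; in information security, log messages can also be used to issue warnings about suspicious attack behavior. As systems become larger and more complex, however, the number of logs they produce grows sharply, so an automatic anomaly detection mechanism is needed in place of manual searching. Past research has shown that anomaly detection models built with Long Short-Term Memory (LSTM) can effectively detect abnormal behavior. These models, however, rely on a Top-g parameter that sets how many candidates appear in the prediction list. A larger value is a looser condition that improves precision but lowers recall; a smaller value is a stricter condition that improves recall but lowers precision. Users must therefore trade precision against recall and cannot achieve both. This study proposes a dynamic Top-g method that classifies sequences according to their occurrence in the normal and abnormal datasets, so that when the candidate list is computed, Top-g can be set dynamically according to the category each sequence belongs to. Experimental results show that with dynamic Top-g, precision reaches 92% and recall reaches 99%.	zh_TW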
dc.description.abstract	Almost all systems have system logs, which record rich information about startup, shutdown, login, logout, and error events. Administrators can analyze these logs to diagnose whether a system is behaving abnormally and, in information security, to raise warnings about suspicious attacks. As systems become larger and more complex, however, the volume of logs they generate grows substantially, so an automatic anomaly detection mechanism is needed to replace manual inspection. Past research has shown that anomaly detection models built on Long Short-Term Memory (LSTM) can effectively detect abnormal behavior. Such a model uses a Top-g parameter to set the number of candidates in its prediction list. A larger value is a looser condition that improves precision but reduces recall; a smaller value is a stricter condition that improves recall but reduces precision. Users therefore have to trade precision against recall and cannot maximize both. This study proposes a dynamic Top-g method that classifies sequences according to their occurrence in the normal and abnormal datasets; when the candidate list is computed, the Top-g parameter is set dynamically according to the category of each sequence. In the experiments, dynamic Top-g achieved a precision of 92% and a recall of 99%.	en
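The abstract describes the Top-g rule only in words. Below is a minimal Python sketch, assuming a DeepLog-style model that scores every log key as the possible next key given a window of preceding keys: a sequence is flagged as anomalous as soon as an observed key falls outside the g highest-scoring candidates. The names `top_g_candidates`, `is_anomalous`, `is_anomalous_dynamic`, `precision_recall`, and `G_BY_CATEGORY` are hypothetical. The dynamic variant only illustrates the idea of choosing g per sequence category; the category labels and g values are placeholders, not the thesis's settings, and the category is assumed to be supplied by an upstream classifier (the thesis lists labeling sequence data with a 1D CNN in section 4.9).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: given a window of previous log keys, return one
# score per log key (e.g. the softmax output of a DeepLog-style LSTM).
Predictor = Callable[[Sequence[int]], List[float]]


def top_g_candidates(scores: List[float], g: int) -> List[int]:
    """Return the log keys with the g highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return ranked[:g]


def is_anomalous(seq: Sequence[int], predict: Predictor, g: int, window: int) -> bool:
    """Static Top-g: flag the sequence if any observed next key is not
    among the top-g candidates predicted from the preceding window."""
    for i in range(window, len(seq)):
        candidates = top_g_candidates(predict(seq[i - window:i]), g)
        if seq[i] not in candidates:
            return True
    return False


# Dynamic Top-g (assumed form): g is chosen per sequence category.
# Category names and g values are illustrative placeholders only.
G_BY_CATEGORY: Dict[str, int] = {"normal_only": 9, "both": 3, "abnormal_only": 1}


def is_anomalous_dynamic(seq: Sequence[int], predict: Predictor, category: str,
                         window: int, default_g: int = 3) -> bool:
    """Look up g from the sequence's category, then apply the static check."""
    g = G_BY_CATEGORY.get(category, default_g)
    return is_anomalous(seq, predict, g, window)


def precision_recall(preds: List[bool], labels: List[bool]) -> Tuple[float, float]:
    """Precision and recall with 'anomalous' as the positive class."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(l and not p for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy predictor: the same fixed scores for every window (4 log keys).
    scores = [0.5, 0.3, 0.15, 0.05]

    def predict(_window: Sequence[int]) -> List[float]:
        return scores

    seq = [0, 0, 1, 3]
    print(is_anomalous(seq, predict, g=2, window=2))  # True: key 3 ranks last
    print(is_anomalous(seq, predict, g=4, window=2))  # False: every key accepted
```

In the toy run, g=2 flags the sequence because key 3 is not among the two best candidates, while g=4 accepts every key. This is the trade-off the abstract describes: a larger g raises fewer alarms (fewer false positives, higher precision) but can miss real anomalies (lower recall), which is what motivates picking g per sequence category instead of using one global value.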
dc.description.provenance	Made available in DSpace on 2021-06-08T00:01:10Z (GMT). No. of bitstreams: 1. U0001-0608202018061800.pdf: 1560177 bytes, checksum: 115e5e1cd7f119096f855d6b9daf247e (MD5). Previous issue date: 2020.	en
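dc.description.tableofcontents	Acknowledgements; Chinese Abstract; English Abstract; Contents; List of Figures; List of Tables
Chapter 1 Introduction
Chapter 2 Background
  2.1 Recurrent Neural Networks
  2.2 Long Short-Term Memory Networks
  2.3 One-Dimensional Convolutional Neural Networks
Chapter 3 Methodology
  3.1 HDFS Dataset
  3.2 Log Parsing Tool
  3.3 Parameters and LSTM Model
  3.4 Prediction Data Processing Methods
    3.4.1 Single-Record Processing
    3.4.2 Batch Processing
    3.4.3 Parallel Processing
  3.5 Sequence Feature Extraction
  3.6 1D Convolutional Neural Network Architecture
  3.7 Dynamic Top-g
Chapter 4 Experimental Results
  4.1 Experimental Setup
  4.2 Single-Record Processing Performance
  4.3 Batch Processing Performance
  4.4 Parallel Processing Performance
  4.5 Prediction Precision and Recall
  4.6 HDFS Dataset Feature Analysis
  4.7 Sequence Feature Extraction and Clustering
  4.8 Setting the Top-g Parameter by Sequence Features
  4.9 Labeling Sequence Data with a 1D Convolutional Neural Network
  4.10 Dynamic Top-g Experimental Results
    4.10.1 Training-Phase Flowchart
    4.10.2 Prediction-Phase Flowchart
    4.10.3 Top-g Test Results
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
References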
dc.language.iso	zh-TW
dc.title	系統日誌異常檢測方法的效能評估	zh_TW
dc.title	Performance Evaluation of Anomaly Detection Methods for System Log Data	en
dc.type	Thesis
dc.date.schoolyear	108-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	施吉昇(Chi-Sheng Shih),楊佳玲(Chia-Lin Yang),梁文耀(Wen-Yew Liang)
dc.subject.keyword	Anomaly detection, system log, Long Short-Term Memory, 1D convolutional neural network, performance	zh_TW
dc.subject.keyword	Anomaly detection, system log, Long Short-Term Memory, 1D convolutional neural network, Performance	en
dc.relation.page	30
dc.identifier.doi	10.6342/NTU202002569
dc.rights.note	Not authorized for public access
dc.date.accepted	2020-08-13
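dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW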
Appears in Collections:	Department of Computer Science and Information Engineering

Files in This Item:

File	Size	Format
U0001-0608202018061800.pdf (currently not authorized for public access)	1.52 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.