Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17210
Title: | Performance Evaluation of Anomaly Detection Methods for System Log Data (系統日誌異常檢測方法的效能評估) |
Author: | Chi-Shih Wang (王啟時) |
Advisor: | Shih-Hao Hung (洪士灝) |
Keywords: | Anomaly detection, system log, Long Short-Term Memory, 1D convolutional neural network, performance |
Publication year: | 2020 |
Degree: | Master's |
Abstract: | Almost all systems produce system logs, which record rich runtime information such as startup, shutdown, login, logout, and abnormal events. Administrators can analyze these logs to diagnose whether a system is behaving abnormally and, for security incidents, use log messages to raise warnings about suspicious attack behavior. As systems grow larger and more complex, the volume of logs grows substantially, so an automatic anomaly detection mechanism is needed to replace manual inspection. Previous research shows that anomaly detection models built on Long Short-Term Memory (LSTM) can effectively detect abnormal behavior. Such models use a Top-g parameter to set the size of the candidate list of predicted results. A larger value is a looser condition that improves precision but lowers recall; a smaller value is a stricter condition that improves recall but lowers precision. Users must therefore trade precision against recall and cannot maximize both at once. This study proposes a dynamic Top-g method that classifies sequences by how they appear in the normal and abnormal datasets and, when computing the candidate list, sets the Top-g parameter dynamically according to the category each sequence belongs to. Experimental results show that with the dynamic Top-g setting, precision can reach 92% and recall can reach 99%. |
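The Top-g rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's category definitions, its per-category values of g, or its model interface, so everything named here (`is_anomalous`, `categorize`, `dynamic_g`, the category labels, and the numbers in `G_BY_CATEGORY`) is a hypothetical illustration, not the author's implementation. The sketch assumes a model that outputs one probability per log key for the next event.

```python
from typing import Dict, Sequence, Set, Tuple

Window = Tuple[int, ...]  # a sequence of log-key IDs (assumed representation)


def is_anomalous(probs: Sequence[float], actual_key: int, g: int) -> bool:
    """Flag an event when its key is not among the g most probable next keys."""
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    return actual_key not in ranked[:g]


def categorize(window: Window, normal: Set[Window], abnormal: Set[Window]) -> str:
    """Label a window by where it was seen in the training data (hypothetical labels)."""
    in_normal, in_abnormal = window in normal, window in abnormal
    if in_normal and in_abnormal:
        return "both"
    if in_normal:
        return "normal_only"
    if in_abnormal:
        return "abnormal_only"
    return "unseen"


# Hypothetical values: looser (larger g) for windows seen only in normal data,
# stricter (smaller g) for windows seen in abnormal data.
G_BY_CATEGORY: Dict[str, int] = {"normal_only": 3, "both": 2, "abnormal_only": 1}


def dynamic_g(category: str, default_g: int = 2) -> int:
    """Pick g from the window's category, falling back to a fixed default."""
    return G_BY_CATEGORY.get(category, default_g)


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]           # predicted next-key distribution
    normal, abnormal = {(1, 2, 3)}, {(4, 5, 6)}
    for window in [(1, 2, 3), (4, 5, 6)]:
        g = dynamic_g(categorize(window, normal, abnormal))
        print(window, "g =", g, "anomalous:", is_anomalous(probs, 2, g))
```

With a fixed g, the same observed key is either always accepted or always flagged; letting g depend on the window's category is one way the trade-off between precision and recall could be tuned per sequence type.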
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17210 |
DOI: | 10.6342/NTU202002569 |
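Full-text authorization: | Not authorized |
Appears in collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |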
---|---|---|---|
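U0001-0608202018061800.pdf (not currently authorized for public access) | 1.52 MB | Adobe PDF |
Unless their copyright terms are specifically stated, all items in the system are protected by copyright, with all rights reserved.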