Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82005
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 魏志平 (Chih-Ping Wei) |
dc.contributor.author | Yu-Yuan Chen | en
dc.contributor.author | 陳禹媛 | zh_TW
dc.date.accessioned | 2022-11-25T05:33:57Z | -
dc.date.available | 2023-08-16 |
dc.date.copyright | 2021-11-06 |
dc.date.issued | 2021 |
dc.date.submitted | 2021-08-17 |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82005 | -
dc.description.abstract | As technology advances, more and more people have access to electronic devices. Users who lack awareness of privacy and information security unknowingly expose themselves to danger and give malicious users (attackers) an opening, for example to steal sensitive data or extort money. Effective malware detection, which aims to separate malicious samples from suspicious ones, has therefore become an urgent need. In recent years, machine-learning-based detection methods have grown increasingly popular because their high adaptability substantially reduces the time and labor that traditional methods require. However, the methods used in prior work often ignore the relationships between words. This thesis therefore adopts deep learning techniques to better incorporate the relationships between words into program embeddings. This study is the first to apply source code to PE malware detection on the basis of static analysis. Using deep learning, we build a three-layer hierarchical long short-term memory (LSTM) architecture that uses function embeddings, function-segment embeddings, and program embeddings to learn a source code embedding that fully represents a sample. To train the model better, we also adopt multi-task learning and propose an auxiliary task: binary classification of standard call functions. The experimental results confirm that the proposed deep learning model outperforms traditional vector space models. Adding the auxiliary task improves the model's Macro F1 score by about 3%. We also run experiments on different auxiliary tasks and data augmentation strategies and evaluate their effectiveness. | zh_TW
dc.description.provenance | Made available in DSpace on 2022-11-25T05:33:57Z (GMT). No. of bitstreams: 1. U0001-1608202121552300.pdf: 2413249 bytes, checksum: 1c4e001ab1a0132f17b8a4c90d177fc8 (MD5). Previous issue date: 2021 | en
dc.description.tableofcontents |
    Acknowledgements
    Chinese Abstract
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
        1.1 Background
            1.1.1 Malware Analysis Approaches
            1.1.2 Malware Detection Approaches
        1.2 Motivation
        1.3 Research Objectives
    Chapter 2 Literature Review
        2.1 Overview of Malware Analysis Approaches
            2.1.1 Static Analysis Approach
            2.1.2 Dynamic Analysis Approach
        2.2 Overview of Malware Detection
            2.2.1 Challenges in Malware Detection
            2.2.2 Feature Engineering Model
            2.2.3 Vector Space Model
            2.2.4 Deep Learning Model
        2.3 Overview of Code Representation Learning
            2.3.1 Data Preprocessing for Code Representation Learning
            2.3.2 Research Methods of Code Representation Learning
    Chapter 3 Methodology
        3.1 Overall Proposed Architecture
            3.1.1 Binary Extraction
        3.2 Program Structure Analysis
            3.2.1 Caller-Callee Relation Map Construction
            3.2.2 Call Tree Construction
            3.2.3 Segment Construction from Call Trees
        3.3 Hierarchical Embedding Construction
            3.3.1 Word-level LSTM
            3.3.2 Function-level LSTM
            3.3.3 Segment-level LSTM
        3.4 Malware Detection Model with an Auxiliary Task
            3.4.1 Malware Detection Classification
            3.4.2 Standard Call Binary Classification for Function Embedding
            3.4.3 Training Strategies for Hierarchical LSTM Architecture
            3.4.4 Overall Loss Function
    Chapter 4 Evaluation
        4.1 Evaluation Context
            4.1.1 Data Collection
            4.1.2 Dataset Preparation
        4.2 Experiment Settings
            4.2.1 Hyperparameter Settings
            4.2.2 Pre-train Word Embedding
        4.3 Evaluation Procedure
        4.4 Empirical Evaluations
            4.4.1 Comparison with Benchmark Models
            4.4.2 Effects of Additional Auxiliary Tasks
            4.4.3 Effects of Different Data Augmentation Strategies
            4.4.4 Summary
    Chapter 5 Conclusion
        5.1 Summary
        5.2 Limitations
        5.3 Future Work
    References
dc.language.iso | en |
dc.subject | 多任務學習 | zh_TW
dc.subject | 惡意程式偵測 | zh_TW
dc.subject | 原始碼 | zh_TW
dc.subject | 深度學習 | zh_TW
dc.subject | 分層長短期記憶網路 | zh_TW
dc.subject | 靜態分析 | zh_TW
dc.subject | static analysis | en
dc.subject | multi-task learning | en
dc.subject | source code | en
dc.subject | malware detection | en
dc.subject | deep learning | en
dc.subject | hierarchical LSTM | en
dc.title | 應用多任務學習分層長短期記憶網路於惡意程式偵測 | zh_TW
dc.title | A Hierarchical LSTM Approach in Multi-task Learning for Malware Detection | en
dc.date.schoolyear | 109-2 |
dc.description.degree | Master's |
dc.contributor.oralexamcommittee | 楊錦生 (Hsin-Tsai Liu), 吳家齊 (Chih-Yang Tseng) |
dc.subject.keyword | 惡意程式偵測, 原始碼, 深度學習, 分層長短期記憶網路, 靜態分析, 多任務學習 | zh_TW
dc.subject.keyword | malware detection, source code, deep learning, hierarchical LSTM, static analysis, multi-task learning | en
dc.relation.page | 65 |
dc.identifier.doi | 10.6342/NTU202102413 |
dc.rights.note | Authorization granted (access limited to campus) |
dc.date.accepted | 2021-08-17 |
dc.contributor.author-college | 管理學院 (College of Management) | zh_TW
dc.contributor.author-dept | 資訊管理學研究所 (Graduate Institute of Information Management) | zh_TW
dc.date.embargo-lift | 2023-08-16 | -
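
The abstract reports the effect of the auxiliary task as a change in Macro F1. As a minimal sketch of how that metric is usually defined (the unweighted mean of per-class F1 scores), the snippet below computes it in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the thesis, whose evaluation code is not part of this record.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gets a false positive
            fn[truth] += 1      # true class gets a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
y_true = ["malware", "malware", "benign", "benign", "benign", "benign"]
y_pred = ["malware", "benign", "benign", "benign", "benign", "malware"]
print(macro_f1(y_true, y_pred))  # prints 0.625
```

Because every class contributes equally to the average, Macro F1 is not dominated by the majority class, which is one common reason to prefer it when malicious and benign samples are imbalanced.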
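Appears in collections: 資訊管理學系 (Department of Information Management)

Files in this item:
File | Size | Format
U0001-1608202121552300.pdf | 2.36 MB | Adobe PDF
Access is restricted to NTU campus IP addresses (off campus, please use the off-campus VPN service).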


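Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.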
