以三值樹狀自動機為基礎之惡意程式分析

Yi-Hsiang Wang; 王奕翔

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7064

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	蔡益坤(Yih-Kuen Tsay)
dc.contributor.author	Yi-Hsiang Wang	en
dc.contributor.author	王奕翔	zh_TW
dc.date.accessioned	2021-05-17T10:18:08Z	-
dc.date.available	2012-01-17
dc.date.available	2021-05-17T10:18:08Z	-
dc.date.copyright	2012-01-17
dc.date.issued	2011
dc.date.submitted	2011-11-07
dc.identifier.citation	[1] AV-TEST Statistics. http://www.av-test.org/en/statistics/malware/. [2] Babic's website. http://www.domagoj-babic.com/index.php/ResearchProjects/MalwareAnalysis. [3] libSFTA. http://www. t.vutbr.cz/research/groups/veri t/tools/libSFTA/. [4] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Com- put., 75(2):87-106, 1987. [5] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319-342, 1987. [6] Domagoj Babic, Daniel Reynaud, and Dawn Song. Malware analysis with tree automata inference. In CAV, pages 116{131, 2011. [7] J er^ome Besombes and Jean-Yves Marion. Learning tree languages from positive examples and membership queries. Theor. Comput. Sci., 382(3):183{197, 2007. [8] Guillaume Bonfante, Matthieu Kaczmarek, and Jean-Yves Marion. Architecture of a morphological malware detector. Journal in Computer Virology, 5(3):263-270, 2009. [9] John M. Brayer and King-Sun Fu. A note on the k-tail method of tree grammar inference. Systems, Man and Cybernetics, IEEE Transactions on, 7(4):293-300, april 1977. [10] Yu-Fang Chen, Azadeh Farzan, Edmund M. Clarke, Yih-Kuen Tsay, and Bow- Yaw Wang. Learning Minimal Separating DFA's for Compositional Veri cation. In TACAS, pages 31{45, 2009. [11] Mihai Christodorescu and Somesh Jha. Testing malware detectors. In ISSTA, pages 34-44, 2004. [12] Colin de la Higuera. A bibliographical study of grammatical inference. Pattern Recognition, 38(9):1332-1348, 2005. [13] Colin de la Higuera. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, New York, NY, USA, 2010. [14] Frank Drewes. MAT learners for recognizable tree languages and tree series. Acta Cybern., 19(2):249-274, 2009. [15] Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, and Xifeng Yan. Synthesizing near-optimal malware speci cations from suspicious behaviors. In IEEE Symposium on Security and Privacy, pages 45-60, 2010. [16] H. Fukuda and K. Kamata. Inference of tree automata from sample set of trees. Inter- national Journal of Parallel Programming, 13:177-196, 1984. 10.1007/BF00979871. [17] Pedro Garc a and Jose Oncina. Inference of Recognizable Tree Sets. Technical Report DSIC-II/47/93, Universidad de Alicante, 1993. [18] Clemens Kolbitsch, Paolo Milani Comparetti, Christopher Kruegel, Engin Kirda, Xiaoyong Zhou, and XiaoFeng Wang. E ective and e cient malware detection at the end host. In Proceedings of the 18th conference on USENIX security symposium, SSYM'09, pages 351-366, 2009. [19] Lorenzo Martignoni, Elizabeth Stinson, and John C. Mitchell. A layered architecture for detecting malicious behaviors. In Symposium on Recent Advances in Intrusion Detection (RAID), pages 78-97, 2008. [20] Yasubumi Sakakibara. Recent advances of grammatical inference. Theor. Comput. Sci., 185(1):15-45, 1997. [21] Symantec. Symantec Internet Security Threat Report, Trends for 2010. http://www.symantec.com/business/threatreport/, 2010.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/7064	-
dc.description.abstract	網路上存在許多不同的資安威脅，其中最惡名昭彰的就是惡意程式。惡意程式指的是那些帶有惡意企圖並且有惡意行為的程式。典型的惡意程式包含了病毒、蠕蟲、木馬與間諜軟體。惡意程式偵測軟體可降低我們被惡意程式攻擊的風險。不同的偵測軟體有其各別的偵測方法，但在現今的偵測軟體中最基本也最普遍被應用的方法是靜態特徵碼比對。然而這種偵測方法已經普遍的被認為無法對抗現今更為進階的惡意程式。進階的惡意程式使用程式混淆的手法改變程式本身的結構，也因此能夠非常輕易的避過偵查軟體。幸運的是，程式本身的語意在經過混淆後通常仍會保持一致，因此一個可行的對策就是設計一個以程式語意為基礎的偵測方法。在這篇論文裡，我們提出一個以程式語意為基礎的惡意程式偵測方法。觀察近年來被提出的各種辦法，我們發覺字串仍然被廣泛的使用為一種特徵碼的形式。我們可以將字串擴充為樹，樹不但更為一般化而且帶有更多的語意資訊。因此，我們使用樹做為我們偵測軟體的特徵碼形式。我們的偵測軟體需要一組惡意程式跟一組正常程式做為輸入。這些程式的語意會以系統呼叫的資料相依圖表示。接著，我們將這些相依圖解析為樹。使用文法推論的方法，我們最後可以得到一個三值的樹狀自動機。一個三值的樹狀自動機有三個互斥的最終狀態：接受、拒絕、與未知。如果我們使用三值樹狀自動機做為我們的惡意程式偵測軟體，根據輸入的程式他會有三種可能的輸出值。如果輸入的程式是一個惡意程式，偵測軟體會輸出是。如果輸入的程式是一個正常程式，偵測軟體會輸出否。其餘的狀況下他會輸出未知。根據我們的實驗結果，我們的偵測軟體有著相當低的誤報。然而，相對的代價是許多程式經過軟體檢查後的結果是未知。	zh_TW
dc.description.abstract	There exist many security threats on the Internet, and the most notorious is malware. Malware (malicious software) refers to programs that have malicious intention and per- form some harmful actions. Typical malware includes viruses, worms, trojan horses, and spyware. The rst line of defense to deter malware is malware detector. Each malware detector has its own analysis method. The most basic and prevalent methods used in commercial malware detectors are based on syntactic signature matching. It is widely recognized that this detection mechanism cannot cope with advanced malware. Advanced malware uses program obfuscation to alter program structures and therefore can evade the detection easily. However, the semantics of a malware instance is usually preserved after obfuscation. So, it is feasible to develop a malware detector that is based on pro- gram semantics. In this thesis, we propose a semantics-based approach to malware analysis. Observing recently proposed methods for malware detection, we notice that string-based signatures are still used widely. It is natural to extend from string to tree, which is more general and can carry more semantics. Therefore, we use trees as signatures. Our malware detector requires a set of malware instances and a set of benign programs. The semantics of each input program is extracted and represented as a system call dependence graph. The graph is then transformed into a tree. With the set of trees generated from malware and benign programs, we use the method of grammatical inference to learn a 3-valued deter- ministic nite tree automaton (3DFT). A 3DFT has three di erent nal states: accept, reject, and unknown. If we take this 3DFT as the malware detector, it outputs three di erent values. If an input program is a malware instance, the detector outputs true. If an input program is a benign program, the detector outputs false. Otherwise, it outputs unknown. According to our experiments, our detector exhibits very low false positives. However, there is a tradeo that many programs are identi ed as unknown.	en
dc.description.provenance	Made available in DSpace on 2021-05-17T10:18:08Z (GMT). No. of bitstreams: 1 ntu-100-R98725035-1.pdf: 1598430 bytes, checksum: 192e75a6def3d9b0ca7eed7982b25fca (MD5) Previous issue date: 2011	en
dc.description.tableofcontents	1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Motivation and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Related Work 6 2.1 Grammatical Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Tree Automata Inference . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Malware Analysis with Executed System Calls . . . . . . . . . . . . . . . 10 2.2.1 Eective and Ecient Malware Detection at the End Host . . . . 10 2.2.2 A Layered Architecture for Detecting Malicious Behaviors . . . . 12 2.3 Malware Analysis with Tree Automata . . . . . . . . . . . . . . . . . . . 15 2.3.1 Architecture of a Morphological Malware Detector . . . . . . . . . 15 2.3.2 Malware Analysis with Tree Automata Inference . . . . . . . . . . 17 3 Preliminaries 19 3.1 Finite Ordered Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 Finite Tree Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4 Approach 22 4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2 3-Valued Deterministic Finite Tree Automata . . . . . . . . . . . . . . . 24 4.3 Semantics Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 4.4 Graph Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.5 Automata Learning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 28 4.5.1 Learning Algorithm of Drewes . . . . . . . . . . . . . . . . . . . . 28 4.5.2 Tree Automata Learning Algorithm . . . . . . . . . . . . . . . . . 31 5 Implementation and Experiments 33 5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.2 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 5.2.1 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 37 6 Conclusion 44 6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Bibliography 47
dc.language.iso	zh-TW
dc.subject	系統呼叫	zh_TW
dc.subject	惡意程式分析	zh_TW
dc.subject	惡意程式偵測軟體	zh_TW
dc.subject	文法推論	zh_TW
dc.subject	三值自動機	zh_TW
dc.subject	程式語意	zh_TW
dc.subject	System Call	en
dc.subject	Malware Analysis	en
dc.subject	Malware Detector	en
dc.subject	Grammatical Inference	en
dc.subject	3-Valued Automata	en
dc.subject	Program Semantics	en
dc.title	以三值樹狀自動機為基礎之惡意程式分析	zh_TW
dc.title	Malware Analysis with 3-Valued Deterministic Finite Tree Automata	en
dc.type	Thesis
dc.date.schoolyear	100-1
dc.description.degree	碩士
dc.contributor.oralexamcommittee	王柏堯(Bow-Yaw Wang),陳郁方(Yu-Fang Chen)
dc.subject.keyword	惡意程式分析,惡意程式偵測軟體,文法推論,三值自動機,程式語意,系統呼叫,	zh_TW
dc.subject.keyword	Malware Analysis,Malware Detector,Grammatical Inference,3-Valued Automata,Program Semantics,System Call,	en
dc.relation.page	48
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2011-11-07
dc.contributor.author-college	管理學院	zh_TW
dc.contributor.author-dept	資訊管理學研究所	zh_TW
顯示於系所單位：	資訊管理學系

文件中的檔案：

檔案	大小	格式
ntu-100-1.pdf	1.56 MB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。