Effectively Detecting Android Malware with Machine Learning (利用機器學習之Android惡意程式偵測架構)

Wan-Ting Yeh; 葉婉婷

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52310

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	洪士灝
dc.contributor.author	Wan-Ting Yeh	en
dc.contributor.author	葉婉婷	zh_TW
dc.date.accessioned	2021-06-15T16:11:35Z	-
dc.date.available	2017-08-31
dc.date.copyright	2015-08-31
dc.date.issued	2015
dc.date.submitted	2015-08-18
dc.identifier.citation	Bibliography
[1] “Accelerate Machine Learning with the cuDNN Deep Neural Network Library,” https://public.gdatasoftware.com/Presse/Publikationen/Malware_Reports/G_DATA_MobileMWR_Q1_2015_US.pdf.
[2] A. P. Felt, M. Finifter, E. Chin, S. Hanna, and D. Wagner, “A survey of mobile malware in the wild,” in Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices. ACM, 2011, pp. 3–14.
[3] “Kantar worldpanel comtech’s smartphone os market share data q4 2013,” http://www.kantarworldpanel.com/smartphone-os-market-share/.
[4] “TrendLabs 3Q 2012 Security Roundup: Android Under Siege: Popularity Comes at a Price,” http://www.trendmicro.com/cloud-content/us/pdfs/security-intelligence/reports/rpt-3q-2012-security-roundup-android-under-siege-popularity-comes-at-a-price.pdf.
[5] M. Grace, Y. Zhou, Q. Zhang, S. Zou, and X. Jiang, “Riskranker: scalable and accurate zero-day android malware detection,” in Proceedings of the 10th international conference on Mobile systems, applications, and services. ACM, 2012, pp. 281–294.
[6] D.-J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, and K.-P. Wu, “Droidmat: Android malware detection through manifest and api calls tracing,” in Information Security (Asia JCIS), 2012 Seventh Asia Joint Conference on. IEEE, 2012, pp. 62–69.
[7] D. Arp, M. Spreitzenbarth, M. Hubner, H. Gascon, K. Rieck, and C. Siemens, “Drebin: Effective and explainable detection of android malware in your pocket,” in Proceedings of the Annual Symposium on Network and Distributed System Security (NDSS), 2014.
[8] B. Amos, H. Turner, and J. White, “Applying machine learning classifiers to dynamic android malware detection at scale,” in Wireless Communications and Mobile Computing Conference (IWCMC), 2013 9th International. IEEE, 2013, pp. 1666–1671.
[9] W.-C. Wu and S.-H. Hung, “Droiddolphin: a dynamic android malware detection framework using big data and machine learning,” in Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems. ACM, 2014, pp. 247–252.
[10] G. E. Dahl, J. W. Stokes, L. Deng, and D. Yu, “Large-scale malware classification using random projections and neural networks,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 3422–3426.
[11] W. Yu, L. Ge, G. Xu, and X. Fu, “Towards neural network based malware detection on android mobile devices,” in Cybersecurity Systems for Human Cognition Augmentation. Springer, 2014, pp. 99–117.
[12] “DroidBox,” https://code.google.com/p/droidbox/.
[13] “TaintDroid: Realtime Privacy Monitoring on Smartphones,” http://appanalysis.org.
[14] P. O’Kane, S. Sezer, and K. McLaughlin, “N-gram density based malware detection,” in Computer Applications & Research (WSCAR), 2014 World Symposium on. IEEE, 2014, pp. 1–6.
[15] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman, “Project adam: Building an efficient and scalable deep learning training system.”
[16] Nvidia, “Accelerate Machine Learning with the cuDNN Deep Neural Network Library,” http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/.
[17] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks, vol. 32, pp. 333–338, 2012.
[18] T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for lvcsr,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 8614–8618.
[19] S.-J. Chang and S.-H. Hung, “Ape: A Smart Automatic Testing Environment for Android Malware,” http://www.airitilibrary.com/Publication/alDetailedMesh1?DocID=U0001-1908201316171100.
[20] “Caffe | Deep Learning Framework,” http://caffe.berkeleyvision.org.
[21] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[22] “APIMonitor,” https://code.google.com/p/droidbox/wiki/APIMonitor.
[23] “Android Logcat,” http://developer.android.com/tools/help/logcat.html.
[24] “Android Monkeyrunner,” http://developer.android.com/tools/help/monkey.html.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52310	-
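dc.description.abstract	Mobile applications are increasingly widespread, enriching our lives and making them more convenient. However, as the number of applications grows, so does the number of malicious ones. According to statistics from G DATA, more than half of malicious applications target financial uses, and installing them causes users monetary loss. Machine learning has been widely applied to malware detection. Most papers use trigrams to extract application behavior as the machine learning input, but only a few achieve good accuracy on very large datasets. This thesis examines the shortcomings of trigrams and proposes a novel input format, paired with a convolutional neural network (CNN) as the learning algorithm, to address the low accuracy. To our knowledge, this is the first thesis to apply CNNs to malware detection and to discuss the entire network architecture thoroughly. With our input format, the CNN achieves an effect similar to k-skip-n-gram and learns more complex behaviors for detecting malware. In our experiments, the new input format yields good and stable accuracy across CNN architectures with different configurations. Using 32,000 applications, the proposed architecture achieves 93.012% prediction accuracy and a 12.9% false negative rate (FNR). We also combine CNN and SVM, two machine learning algorithms with different characteristics, which effectively lowers the FNR to 3%. Finally, we run the framework on the NVIDIA Jetson-TK1, a low-power development board, for malware training and prediction; although training time increases, the power consumption of the overall framework is effectively reduced.	zh_TW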
dc.description.abstract	Mobile applications are increasingly popular, and a wide variety of them make our lives more convenient. Unfortunately, as the number of applications grows, so does the number of malicious applications, also known as malware. According to G DATA's 2015 statistics, more than half of malware is financially motivated and causes substantial monetary losses. Machine learning techniques have been adopted to identify malware, and most prior work uses trigrams as the input format for extracting the behavior patterns of mobile applications, but only a few of these approaches perform well on a large dataset. In this thesis, we discuss the weaknesses of trigram-based machine learning methods and improve the accuracy of malware detection by adding a new machine learning method based on a convolutional neural network (CNN) with a novel flattened input format. To our knowledge, this is the first work to discuss the usability of CNNs for malware detection. With the proposed flattened input format, our CNN scheme can perform a k-skip-n-gram dimensionality reduction, which learns more flexible and complex patterns and detects types of malware different from those found by trigram-based methods. Our experimental results show that the flattened input format yields good and stable accuracy with a simple CNN topology under different configurations. With 32,000 applications in our training set, the CNN achieves 93.01% prediction accuracy and a 12.9% false negative rate (FNR). After examining the CNN results, we reduce the FNR to 3% by aggregating the CNN with an SVM while retaining similar accuracy. We also demonstrate that running the CNN on the NVIDIA Jetson-TK1 saves half of the power consumption compared with modern graphics cards, which opens a new application scenario: low-cost, pervasive malware detection, even on mobile platforms.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T16:11:35Z (GMT). No. of bitstreams: 1 ntu-104-R02922118-1.pdf: 1100222 bytes, checksum: 6e0c89ac67cbb3efebecbd1db3cda6bd (MD5) Previous issue date: 2015	en
dc.description.tableofcontents	Acknowledgements
Chinese Abstract
Abstract
1 Introduction
1.1 Motivation
1.2 Thesis Organization
2 Related Works
2.1 Static and Dynamic Analysis
2.2 TaintDroid
2.3 Malware Detection
2.4 The Opportunity for CNN
3 Design of CNN
3.1 Flattened Data
3.2 Architecture of CNN
3.2.1 Input Order: Opportunity or Challenge
3.2.2 Kernel Size: Beyond Square
3.2.3 Inner Product Layer
3.3 Re-train
4 Evaluation Result
4.1 Experimental Setup
4.2 Dataset
4.3 Metrics
4.4 Experimental Result
4.4.1 Effect of Unbalanced Dataset
4.4.2 Quantity of Data: Balanced
4.5 Comparison
4.6 Aggregation
4.7 Prediction on Jetson TK1
5 Background Knowledge
5.1 Malware Analysis
5.1.1 Static Analysis
5.1.2 Dynamic Analysis
5.2 Preprocessing
5.3 Emulation
5.4 Feature Extraction
6 Conclusion and Future Work
Bibliography
List of Figures
Figure 2.1 Convolution Operator
Figure 2.2 Pooling Operator
Figure 3.1 Architecture
Figure 3.2 CNN Architecture
Figure 3.3 An Example of Input Feature for CNN
Figure 3.4 Three Kinds of Flattened Data
Figure 3.5 Comparison of Sensitivity
Figure 3.6 Adding a Hidden Layer
Figure 4.1 Benign/Malicious Ratio
Figure 4.2 Quantity of Data: Balanced
Figure 4.3 Different Threshold of Confidence
Figure 5.1 The Application Behavior Log
List of Tables
Table 3.1 Convolution Kernel Size and Ordering
Table 3.2 Re-train Time with Accuracy
Table 4.1 The Result of DroidDolphin
Table 4.2 Comparison Performance of DroidDolphin
Table 4.3 Aggregation (1)
Table 4.4 Aggregation (2)
Table 4.5 Comparison Performance with TK1 and PC
Table 4.6 Different Aggregation with SVM and CNN
Table 5.1 Sensitive API
Table 5.2 Tags of DroidBox
dc.language.iso	en
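dc.subject	Flattened Input (平坦化輸入)	zh_TW
dc.subject	Deep Neural Network (深度類神經網路)	zh_TW
dc.subject	Convolutional Neural Network (卷積類神經網路)	zh_TW
dc.subject	Malware Detection (惡意程式偵測)	zh_TW
dc.subject	SVM	zh_TW
dc.subject	Aggregation (整合)	zh_TW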
dc.subject	Aggregation	en
dc.subject	Flattened Events	en
dc.subject	Deep Neural Network	en
dc.subject	Convolutional Neural Network	en
dc.subject	Malware Detection	en
dc.subject	SVM	en
dc.title	利用機器學習之Android惡意程式偵測架構 (An Android Malware Detection Framework Using Machine Learning)	zh_TW
dc.title	Effectively Detecting Android Malware with Machine Learning	en
dc.type	Thesis
dc.date.schoolyear	103-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	林軒田,徐宏民,涂嘉恆,林風
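dc.subject.keyword	Deep Neural Network (深度類神經網路), Convolutional Neural Network (卷積類神經網路), Malware Detection (惡意程式偵測), SVM, Aggregation (整合), Flattened Input (平坦化輸入)	zh_TW
dc.subject.keyword	Deep Neural Network, Convolutional Neural Network, Malware Detection, SVM, Aggregation, Flattened Events	en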
dc.relation.page	39
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2015-08-18
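dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering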
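
The abstract contrasts trigram features with the k-skip-n-gram patterns that the CNN is said to approximate. As a rough illustration only, and not the thesis's own feature extractor, the Python sketch below enumerates both kinds of pattern over a made-up event sequence. The event names and the helpers `ngrams` and `skip_ngrams` are hypothetical, and the definition assumed here is the common one in which at most k events in total may be skipped inside a gram.

```python
from itertools import combinations


def ngrams(events, n=3):
    """Contiguous n-grams; n=3 gives the trigrams mentioned in the abstract."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


def skip_ngrams(events, n=3, k=1):
    """k-skip-n-grams: n events kept in order, with at most k skipped in total.

    Assumption: this is the usual textbook definition, not necessarily the
    exact variant the thesis refers to.
    """
    grams = set()
    for start in range(len(events)):
        # The remaining n-1 positions must fall within a span of n+k events.
        window = range(start + 1, min(start + n + k, len(events)))
        for rest in combinations(window, n - 1):
            grams.add((events[start],) + tuple(events[j] for j in rest))
    return grams


# Hypothetical behavior log; the event names are illustrative only.
events = ["open", "read", "connect", "send"]
trigrams = ngrams(events, n=3)
skip_trigrams = skip_ngrams(events, n=3, k=1)
print(trigrams)
print(sorted(skip_trigrams))
```

Every trigram is also a 1-skip-trigram, but the skip variant additionally captures patterns such as `("open", "read", "send")`, where an intervening event is ignored. This is the kind of flexibility the abstract attributes to the flattened input format.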

Files in This Item:

File	Size	Format
ntu-104-1.pdf (not authorized for public access)	1.07 MB	Adobe PDF
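
The abstract reports prediction accuracy and false negative rate (FNR), and states that aggregating the CNN with an SVM lowers the FNR while keeping accuracy similar. The record does not describe the aggregation rule, so the Python sketch below assumes a simple one purely for illustration: an application is flagged as malicious when either model's score reaches a confidence threshold. The scores, labels, threshold, and function names are all made up and are not results from the thesis.

```python
def accuracy_and_fnr(y_true, y_pred):
    """Return (accuracy, FNR) for labels where 1 = malicious and 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # FNR = missed malware / all malware.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fnr


def aggregate(cnn_scores, svm_scores, threshold=0.5):
    """Hypothetical OR-style rule: malicious if either score reaches the threshold."""
    return [1 if max(c, s) >= threshold else 0
            for c, s in zip(cnn_scores, svm_scores)]


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0]
cnn_scores = [0.9, 0.2, 0.3, 0.1, 0.4]
svm_scores = [0.8, 0.7, 0.1, 0.2, 0.6]

cnn_only = [1 if c >= 0.5 else 0 for c in cnn_scores]
combined = aggregate(cnn_scores, svm_scores)

cnn_acc, cnn_fnr = accuracy_and_fnr(y_true, cnn_only)
agg_acc, agg_fnr = accuracy_and_fnr(y_true, combined)
print(f"CNN only:   accuracy={cnn_acc:.2f}, FNR={cnn_fnr:.2f}")
print(f"Aggregated: accuracy={agg_acc:.2f}, FNR={agg_fnr:.2f}")
```

On this toy data the OR-style rule catches one more malicious sample at the cost of one extra false positive, so the FNR drops while accuracy stays the same. That trade-off is the general shape of the effect the abstract describes, although the thesis may combine the two models differently.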


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.