Deobfuscating APK with Graph Matchmaking (利用圖媒合達成APK原始碼反混淆)

Yu-Ching Hsu; 徐有慶

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70944

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李允中
dc.contributor.author	Yu-Ching Hsu	en
dc.contributor.author	徐有慶	zh_TW
dc.date.accessioned	2021-06-17T04:45:07Z	-
dc.date.available	2019-08-08
dc.date.copyright	2018-08-08
dc.date.issued	2018
dc.date.submitted	2018-08-02
dc.identifier.citation	[1] Celery. http://www.celeryproject.org/. [2] Deguard. http://apk-deguard.com/. [3] Github api. https://developer.github.com/v3/. [4] Nice2predict. https://github.com/eth-srl/Nice2Predict. [5] B. Bichsel, V. Raychev, P. Tsankov, and M. Vechev. Statistical deobfuscation of android applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 343–355. ACM, 2016. [6] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. In ACM sigmod record, volume 29, pages 93–104. ACM, 2000. [7] A. Einarsson and J. D. Nielsen. A survivor’s guide to java program analysis with soot. BRICS, Department of Computer Science, University of Aarhus, Denmark, page 17, 2008. [8] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996. [9] S.-W. Huang. Towards a solution to iot interoperability through reverse engineering. Master’s thesis, National Taiwan University, 2017. [10] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc. [11] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010. [12] T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse processes, 25(2-3):259–284, 1998. [13] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707–710, 1966. [14] D. Low. Protecting java code via code obfuscation. Crossroads, 4(3):21–23, Apr. 1998. [15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011. [16] V. Raychev, M. Vechev, and A. Krause. Predicting program properties from big code. In ACM SIGPLAN Notices, volume 50, pages 111–124. ACM, 2015. [17] K. Riesen and H. Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7):950–959, 2009. [18] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965. [19] E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical computer science, 92(1):191–211, 1992. [20] R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot - a java bytecode optimization framework. In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON ’99, pages 13–. IBM Press, 1999.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70944	-
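dc.description.abstract	Because source code can be recovered by decompiling the bytecode it was compiled into, applications protect their code by applying obfuscation at compile time, which reduces readability by changing user-defined names. In this research, we address code obfuscation through the following three steps: 1. convert each program into its corresponding graph, 2. collect subgraphs from non-obfuscated graphs to form patterns that serve as the basis for computing graph similarity, and 3. compare graph similarity to obtain the most probable name for an unknown node. We also evaluate the benefit of our proposed approach against the existing CRF model, and use hypothesis testing to verify that our approach achieves higher precision than the CRF model in predicting entity types.	zh_TW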
dc.description.abstract	Because Java source code can be recovered by decompiling its bytecode, developers commonly obfuscate the names of packages, classes, and methods to reduce readability and protect the source code. In this work, we address such obfuscation in three steps: 1. transform Java programs into their corresponding graphs, 2. collect subgraphs from the graphs of non-obfuscated programs to form patterns that serve as the basis for similarity calculation, and 3. compare graph similarity to obtain the most probable name for each unknown node. We also conduct an experiment comparing our approach with the existing CRF approach, which shows that our approach yields a statistically significant improvement in the precision of predicting entity types.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T04:45:07Z (GMT). No. of bitstreams: 1 ntu-107-R05922162-1.pdf: 13904606 bytes, checksum: 1d01050e3a05f99e7e4c12b6acbadebb (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Acknowledgements (誌謝); Abstract in Chinese (摘要); Abstract; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Dataset; 2.1.1 Crawling Android Projects in GitHub; 2.2 Parser; 2.2.1 Soot; 2.3 Machine Learning Model; 2.3.1 Nice2Predict; 2.4 Clustering; 2.4.1 Latent Semantic Analysis; 2.4.2 Density-based Local Outlier Detection; 2.4.3 Density-based Spatial Clustering of Applications with Noise; 2.4.4 Silhouette Coefficient; 2.5 Graph Similarity; 2.5.1 Graph Edit Distance; 2.5.2 String Distance; Chapter 3 Increasing the Size of the Dataset; Chapter 4 Deobfuscation by Graph Matchmaking; 4.1 Generating Dependency Graph; 4.2 Clustering Process; 4.2.1 Feature Selection; 4.2.2 Clustering Algorithm Selection; 4.2.3 Validation of Results; 4.2.4 Interpretation; 4.3 Pattern Identification; 4.3.1 Candidate Pattern; 4.4 Graph Matching; 4.4.1 Traversing Unknown Nodes; 4.4.2 Pruning Candidate Patterns; 4.4.3 Graph Similarity; 4.4.4 Distributed Computing; Chapter 5 Experiment; 5.1 Environment; 5.2 Procedure; 5.2.1 Graph Matchmaking; 5.2.2 Nice2Predict (CRF); 5.3 Result; Chapter 6 Conclusion; Bibliography; Appendix A: Graph Matchmaking Example
dc.language.iso	en
dc.subject	Reverse Engineering (逆向工程)	zh_TW
dc.subject	Code Obfuscation (程式碼混淆)	zh_TW
dc.subject	Graph Matchmaking (圖媒合)	zh_TW
dc.subject	Graph Similarity (圖相似度)	zh_TW
dc.subject	Decompilation (反編譯)	zh_TW
dc.subject	Obfuscated Code	en
dc.subject	Reverse Engineering	en
dc.subject	Decompilation	en
dc.subject	Graph Similarity	en
dc.subject	Graph Matchmaking	en
dc.title	利用圖媒合達成APK原始碼反混淆	zh_TW
dc.title	Deobfuscating APK with Graph Matchmaking	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	張智星,孫雅麗,劉立頌,徐國勛
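dc.subject.keyword	Reverse Engineering (逆向工程), Code Obfuscation (程式碼混淆), Graph Matchmaking (圖媒合), Graph Similarity (圖相似度), Decompilation (反編譯)	zh_TW
dc.subject.keyword	Reverse Engineering, Obfuscated Code, Graph Matchmaking, Graph Similarity, Decompilation	en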
dc.relation.page	53
dc.identifier.doi	10.6342/NTU201802358
dc.rights.note	Paid license (有償授權)
dc.date.accepted	2018-08-02
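dc.contributor.author-college	College of Electrical Engineering and Computer Science (電機資訊學院)	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)	zh_TW
Appears in collections:	Department of Computer Science and Information Engineering (資訊工程學系)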
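
The three steps named in the abstract can be illustrated with a toy sketch. This is not the thesis's implementation: the graph representation, the function names `collect_patterns` and `predict_name`, the sample class names, and the use of Jaccard similarity (swapped in for whatever graph-similarity measure the thesis actually uses) are all assumptions made for illustration. Step 1, converting programs into graphs, is assumed to have happened already.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: an edge is (relation, neighbor_name),
# and a node's neighborhood is the set of its outgoing edges.
Edge = Tuple[str, str]
Graph = Dict[str, FrozenSet[Edge]]
Patterns = Dict[str, List[FrozenSet[Edge]]]


def collect_patterns(graphs: List[Graph]) -> Patterns:
    """Step 2: gather neighborhoods of named nodes from non-obfuscated graphs."""
    patterns: Patterns = {}
    for graph in graphs:
        for name, edges in graph.items():
            patterns.setdefault(name, []).append(edges)
    return patterns


def jaccard(a: FrozenSet[Edge], b: FrozenSet[Edge]) -> float:
    """Assumed stand-in similarity: shared edges divided by all edges."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_name(unknown: FrozenSet[Edge], patterns: Patterns) -> Tuple[str, float]:
    """Step 3: return the pattern name whose neighborhood is most similar."""
    best_name, best_score = "", -1.0
    for name, neighborhoods in patterns.items():
        score = max(jaccard(unknown, n) for n in neighborhoods)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Step 1 is assumed done: these graphs are invented sample data.
known: Graph = {
    "HttpClient": frozenset({("calls", "URL.openConnection"),
                             ("returns", "InputStream")}),
    "JsonParser": frozenset({("calls", "JSONObject.getString"),
                             ("returns", "String")}),
}
patterns = collect_patterns([known])

# An obfuscated node whose own name is unknown but whose neighbors are visible.
obfuscated_node = frozenset({("calls", "URL.openConnection"),
                             ("returns", "InputStream"),
                             ("calls", "Log.d")})
name, score = predict_name(obfuscated_node, patterns)
print(name, round(score, 2))
```

In this sample the unknown node shares two of three distinct edges with the `HttpClient` pattern and none with `JsonParser`, so `HttpClient` is chosen. A real system would need a structural measure that looks beyond immediate neighbors, which is where a graph edit distance style comparison would come in.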

Files in this item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	13.58 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.