Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70944
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 李允中 | |
dc.contributor.author | Yu-Ching Hsu | en |
dc.contributor.author | 徐有慶 | zh_TW |
dc.date.accessioned | 2021-06-17T04:45:07Z | - |
dc.date.available | 2019-08-08 | |
dc.date.copyright | 2018-08-08 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-08-02 | |
dc.identifier.citation | [1] Celery. http://www.celeryproject.org/.
[2] Deguard. http://apk-deguard.com/. [3] Github api. https://developer.github.com/v3/. [4] Nice2predict. https://github.com/eth-srl/Nice2Predict. [5] B. Bichsel, V. Raychev, P. Tsankov, and M. Vechev. Statistical deobfuscation of android applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 343–355. ACM, 2016. [6] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander. Lof: identifying density-based local outliers. In ACM sigmod record, volume 29, pages 93–104. ACM, 2000. [7] A. Einarsson and J. D. Nielsen. A survivor’s guide to java program analysis with soot. BRICS, Department of Computer Science, University of Aarhus, Denmark, page 17, 2008. [8] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996. [9] S.-W. Huang. Towards a solution to iot interoperability through reverse engineering. Master’s thesis, National Taiwan University, 2017. [10] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc. [11] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010. [12] T. K. Landauer, P. W. Foltz, and D. Laham. An introduction to latent semantic analysis. Discourse processes, 25(2-3):259–284, 1998. [13] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707–710, 1966. [14] D. Low. Protecting java code via code obfuscation. Crossroads, 4(3):21–23, Apr. 1998. [15] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[16] V. Raychev, M. Vechev, and A. Krause. Predicting program properties from big code. In ACM SIGPLAN Notices, volume 50, pages 111–124. ACM, 2015. [17] K. Riesen and H. Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7):950–959, 2009. [18] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965. [19] E. Ukkonen. Approximate string-matching with q-grams and maximal matches. Theoretical computer science, 92(1):191–211, 1992. [20] R. Vallée-Rai, P. Co, E. Gagnon, L. Hendren, P. Lam, and V. Sundaresan. Soot - a java bytecode optimization framework. In Proceedings of the 1999 Conference of the Centre for Advanced Studies on Collaborative Research, CASCON ’99, pages 13–. IBM Press, 1999. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70944 | - |
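dc.description.abstract | Because source code that has been compiled into bytecode can be recovered by decompilation, applications protect their code by applying obfuscation at compile time, changing user-defined names to reduce readability. In this research, we address the code obfuscation problem through the following three steps: 1. transform each program into its corresponding graph, 2. collect subgraphs from non-obfuscated graphs to form patterns, which serve as the basis for computing graph similarity, and 3. compare graph similarity to obtain the most probable name for each unknown node.
We also evaluate the benefit of our proposed method against the existing CRF model, and use hypothesis testing to verify that our method is more accurate than the CRF model at predicting entity types. | zh_TW |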
dc.description.abstract | Java source code can be recovered by decompiling its bytecode; therefore, obfuscation that renames packages, classes, and methods is commonly adopted to reduce readability and protect the source code. In this research, we address obfuscation through the following three steps: 1. transform Java programs into their corresponding graphs, 2. collect subgraphs from the graphs of non-obfuscated programs to form patterns as a basis for similarity calculation, and 3. compare the similarity of graphs to obtain the most probable name for each unknown node.
An experiment is also conducted to compare our proposed approach with the existing CRF approach, showing that our approach achieves a statistically significant improvement in the precision of predicting entity types over the CRF approach. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T04:45:07Z (GMT). No. of bitstreams: 1 ntu-107-R05922162-1.pdf: 13904606 bytes, checksum: 1d01050e3a05f99e7e4c12b6acbadebb (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Acknowledgements i
Chinese Abstract ii
Abstract iii
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
Chapter 2 Related Work 4
2.1 Dataset 4
2.1.1 Crawling Android Projects in GitHub 4
2.2 Parser 5
2.2.1 Soot 5
2.3 Machine Learning Model 6
2.3.1 Nice2Predict 6
2.4 Clustering 6
2.4.1 Latent Semantic Analysis 6
2.4.2 Density-based Local Outlier Detection 7
2.4.3 Density-based Spatial Clustering of Applications with Noise 7
2.4.4 Silhouette Coefficient 8
2.5 Graph Similarity 9
2.5.1 Graph Edit Distance 9
2.5.2 String Distance 10
Chapter 3 Increasing the Size of the Dataset 12
Chapter 4 Deobfuscation by Graph Matchmaking 15
4.1 Generating Dependency Graph 16
4.2 Clustering Process 19
4.2.1 Feature Selection 21
4.2.2 Clustering Algorithm Selection 23
4.2.3 Validation of Results 24
4.2.4 Interpretation 25
4.3 Pattern Identification 25
4.3.1 Candidate Pattern 26
4.4 Graph Matching 27
4.4.1 Traversing Unknown Nodes 28
4.4.2 Pruning Candidate Patterns 29
4.4.3 Graph Similarity 30
4.4.4 Distributed Computing 31
Chapter 5 Experiment 34
5.1 Environment 34
5.2 Procedure 35
5.2.1 Graph Matchmaking 35
5.2.2 Nice2Predict (CRF) 35
5.3 Result 37
Chapter 6 Conclusion 40
Bibliography 43
A Graph Matchmaking Example 46 | |
dc.language.iso | en | |
dc.title | 利用圖媒合達成APK原始碼反混淆 | zh_TW |
dc.title | Deobfuscating APK with Graph Matchmaking | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 張智星,孫雅麗,劉立頌,徐國勛 | |
dc.subject.keyword | Reverse Engineering, Code Obfuscation, Graph Matchmaking, Graph Similarity, Decompilation | zh_TW |
dc.subject.keyword | Reverse Engineering, Obfuscated Code, Graph Matchmaking, Graph Similarity, Decompilation | en |
dc.relation.page | 53 | |
dc.identifier.doi | 10.6342/NTU201802358 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2018-08-02 | |
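dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |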
Appears in collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
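ntu-107-1.pdf (not currently authorized for public access) | 13.58 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.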