基於機器學習的求解器選擇增進符號執行效率

Sheng-Han Wen; 溫盛涵

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71273

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	蕭旭君(Hsu-Chun Hsiao)
dc.contributor.author	Sheng-Han Wen	en
dc.contributor.author	溫盛涵	zh_TW
dc.date.accessioned	2021-06-17T05:02:13Z	-
dc.date.available	2019-08-10
dc.date.copyright	2018-08-10
dc.date.issued	2018
dc.date.submitted	2018-07-25
dc.identifier.citation	[1] A. Aquino, F. A. Bianchi, M. Chen, G. Denaro, and M. Pezzè. Reusing constraint proofs in program analysis. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 305–315, New York, NY, USA, 2015. ACM. [2] A. Aquino, G. Denaro, and M. Pezzè. Heuristically matching solution spaces of arithmetic formulas to efficiently reuse solutions. In Proceedings of the 39th International Conference on Software Engineering, ICSE ’17, pages 427–437, Piscataway, NJ, USA, 2017. IEEE Press. [3] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley. Enhancing symbolic execution with veritesting. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 1083–1094, New York, NY, USA, 2014. ACM. [4] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli. Cvc4. In Proceedings of the 23rd International Conference on Computer Aided Verification, CAV’11, pages 171–177, Berlin, Heidelberg, 2011. Springer-Verlag. [5] C. Barrett, P. Fontaine, and C. Tinelli. The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org, 2016. [6] L. S. A. Barrett, Clarkand de Moura. Smt-comp: Satisfiability modulo theories competition. In S. K. Etessami, Koushaand Rajamani, editor, Computer Aided Verification, pages 20–23, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg. [7] M. Borges, M. d’Amorim, S. Anand, D. Bushnell, and C. S. Pasareanu. Symbolic execution with interval solving and meta-heuristic search. In Proceedings of the 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation, ICST ’12, pages 111–120, Washington, DC, USA, 2012. IEEE Computer Society. [8] R. Brummayer and A. Biere. Boolector: An efficient smt solver for bit-vectors and arrays. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009,, TACAS ’09, pages 174–177, Berlin, Heidelberg, 2009. Springer-Verlag. [9] C. Cadar, D. Dunbar, and D. Engler. Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, pages 209–224, Berkeley, CA, USA, 2008. USENIX Association. [10] A. Cimatti, A. Griggio, B. J. Schaafsma, and R. Sebastiani. The mathsat5 smt solver. In Proceedings of the 19th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’13, pages 93–107, Berlin, Heidelberg, 2013. Springer-Verlag. [11] L. De Moura and N. Bjørner. Z3: An efficient smt solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag. [12] P. Dinges and G. Agha. Solving complex path conditions through heuristic search on induced polytopes. In Proceedings of the 22Nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 425–436, New York, NY, USA, 2014. ACM. [13] C. Dubois, P. Masci, and D. Méry, editors. Proceedings of the Third Workshop on Formal Integrated Development Environment, F-IDE@FM 2016, Limassol, Cyprus, November 8, 2016, volume 240 of EPTCS, 2017. [14] B. Dutertre. Yicesä2.2. In Proceedings of the 16th International Conference on Computer Aided Verification - Volume 8559, pages 737–744, New York, NY, USA, 2014. Springer-Verlag New York, Inc. [15] S. Gao, S. Kong, and E. M. Clarke. dreal: An smt solver for nonlinear theories over the reals. In Proceedings of the 24th International Conference on Automated Deduction, CADE’13, pages 208–214, Berlin, Heidelberg, 2013. Springer-Verlag. [16] M. Gario, A. Micheli, and F. B. Kessler. Pysmt: a solver-agnostic library for fast prototyping of smt-based algorithms. In Proceedings of the 13th International Workshop on Satisfiability Modulo Theories SMT, 2015. [17] P. Godefroid, N. Klarlund, and K. Sen. Dart: Directed automated random testing. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’05, pages 213–223, New York, NY, USA, 2005. ACM. [18] P. Godefroid, M. Y. Levin, and D. Molnar. Sage: Whitebox fuzzing for security testing. Queue, 10(1):20:20–20:27, Jan. 2012. [19] X. Jia, C. Ghezzi, and S. Ying. Enhancing reuse of constraint solutions to improve symbolic execution. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, pages 177–187, New York, NY, USA, 2015. ACM. [20] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea. Efficient state merging in symbolic execution. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, pages 193–204, New York, NY, USA, 2012. ACM. [21] X. Li, Y. Liang, H. Qian, Y.-Q. Hu, L. Bu, Y. Yu, X. Chen, and X. Li. Symbolic execution of complex program driven by machine learning based constraint solving. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, pages 554–559, New York, NY, USA, 2016. ACM. [22] M. Lin, X. Hou, R. Liu, and L. Ge. Enhancing constraint based test generation by local search. In Proceedings of the 6th International Conference on Software and Computer Applications, ICSCA ’17, pages 154–158, New York, NY, USA, 2017. ACM. [23] H. Liu, E. R. Dougherty, J. G. Dy, K. Torkkola, E. Tuv, H. Peng, C. Ding, F. Long, M. Berens, H. Liu, L. Parsons, Z. Zhao, L. Yu, and G. Forman. Evolving feature selection. IEEE Intelligent Systems, 20(6):64–76, Nov 2005. [24] K. Luckow, M. Dimjašević, D. Giannakopoulou, F. Howar, M. Isberner, T. Kahsai,Z. Rakamarić, and V. Raman. Jdart: A dynamic symbolic analysis framework. In Proceedings of the 22Nd International Conference on Tools and Algorithms for the Construction and Analysis of Systems - Volume 9636, pages 442–459, New York, NY, USA, 2016. Springer-Verlag New York, Inc. [25] I. Mierswa and K. Morik. Automatic feature extraction for classifying audio data. Mach. Learn., 58(2-3):127–149, Feb. 2005. [26] H. Palikareva and C. Cadar. Multi-solver support in symbolic execution. In Proceedings of the 25th International Conference on Computer Aided Verification, CAV’13, pages 53–68, Berlin, Heidelberg, 2013. Springer-Verlag. [27] C. S. Păsăreanu, N. Rungta, and W. Visser. Symbolic execution with mixed concretesymbolic solving. In Proceedings of the 2011 International Symposium on Software Testing and Analysis, ISSTA ’11, pages 34–44, New York, NY, USA, 2011. ACM. [28] H. Riener, F. Haedicke, S. Frehse, M. Soeken, D. Grobe, R. Drechsler, and G. Fey. metasmt: Focus on your application and not on solver integration. Int. J. Softw. Tools Technol. Transf., 19(5):605–621, Oct. 2017. [29] K. Sen, D. Marinov, and G. Agha. Cute: A concolic unit testing engine for c. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 263–272, New York, NY, USA, 2005. ACM. [30] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna. Sok: (state of) the art of war: Offensive techniques in binary analysis. In IEEE Symposium on Security and Privacy,2016. [31] M. Souza, M. Borges, M. d’Amorim, and C. S. Păsăreanu. Coral: Solving complex constraints for symbolic pathfinder. In Proceedings of the Third International Conference on NASA Formal Methods, NFM’11, pages 359–374, Berlin, Heidelberg, 2011. Springer-Verlag. [32] W. Visser, J. Geldenhuys, and M. B. Dwyer. Green: Reducing, reusing and recycling constraints in program analysis. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE ’12, pages 58:1–58:11, New York, NY, USA, 2012. ACM. [33] S. Whiteson, P. Stone, K. O. Stanley, R. Miikkulainen, and N. Kohl. Automatic feature selection in neuroevolution. In Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation, GECCO ’05, pages 1225–1232, New York, NY, USA, 2005. ACM. [34] L. Xu, F. Hutter, H. H. Hoos, and K. Leyton-Brown. Satzilla: Portfolio-based algorithm selection for sat. J. Artif. Int. Res., 32(1):565–606, June 2008. [35] Y. Zheng, X. Zhang, and V. Ganesh. Z3-str: A z3-based string solver for web application analysis. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 114–124, New York, NY, USA, 2013. ACM.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71273	-
dc.description.abstract	本論文提出了一個符號執行引擎中的新元件：路經約束條件分類器 (Path Constraint Classfier)，這個元件主要透過對符號執行(Symbolic Execution) 中的路徑約束條件選擇最適當的求解器 (Constraint Solver) 以增進符號執行在約束條件求解上的效率。在以往的符號執行中，通常一個符號執行引擎只會使用固定的一個求解器來進行求解，但根據我們初步的研究，不同的路徑約束條件都有著不同的特性，而當今熱門的求解器在不同的問題上也都有著不同的求解能力，若我們在對路徑約束條件求解時，可以先選擇一個最適合這個路徑約束條件的求解器，那符號執行的效率將可大大的提升。因此，本論文主要專注在這一點，實作了一個支援路徑約束條件分類的系統雛形，透過從符號執行引擎蒐集到的路徑約束條件，嘗試對其進行特徵抽取 (feature extraction)，並使用基於深度神經網路 (Deep Neural Network) 的機器學習模型進行求解器預測。在我們的系統雛型中，我們提出了約束路徑特徵快取 (Constraint Feature Cache) 的優化方法，使得預測求解器上的額外時間花費可以大幅度降低。除此之外，我們還提出了邏輯選擇 (logic selection) 以及備援求解器 (backup solver) 兩個優化技巧，藉此提升我們的系統在約束路徑求解的能力。	zh_TW
dc.description.abstract	Symbolic execution is a widely applied automated program analysis technique for test case generation and vulnerability discovery. Symbolic execution systematically explores possible program execution paths, represents each path as a set of constraints, and sends such constraints to an SMT solver for satisfilability check and input generation. Despite being a key enabler of symbolic execution, contraint solving can take up to 99% of total time [26] and thus becomes a serious performance bottleneck. Observing a plethora of SMT solvers with diverse capabilities, we seek to explore the following research questions: How much performance can symbolic execution gain if it can pick a priori the best solver for a given path contraint? How can such a prediction oracle be practically implemented? In this work, we first formally define the solver selection problem and its evaluation metrics, and perform a preliminary study to confirm that an performance improvement through solver selection is possible. We then present the design and implementation of Path Constraint Classifier (PCC), a machine learning based meta-solver that aims to reduce overall constraint solving latency by dynamically selecting a solver per query. The idea of using machine learning seems straightforward yet surprisingly less explored, and we identify one main technical challenge being how to avoid excessive overhead introduced by machine learning, including feature extraction. We address this challenge by taking advantage of structural characteristics of symbolic execution, and our experiment confirms that by sacrificing a small amount of time on solver prediction procedure, more performance benefit can be obtained. Also, our solution can be seamlessly integrated with other optimization techniques (e.g., constraint simplification, query reduction, and improved solvers) to further improve the scalability of constraint solving in symbolic execution.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T05:02:13Z (GMT). No. of bitstreams: 1 ntu-107-R05922009-1.pdf: 524070 bytes, checksum: b01357a84fd0082f3c4685730c0f7d61 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	誌謝 iii Acknowledgements v 摘要 vii Abstract ix 1 Introduction 1 2 Background: Symbolic Execution and Constraint Solving 5 2.1 Performance Bottlenecks of Symbolic Execution . . . . . . . . . . . . . 6 2.2 Optimizations of Constraint Solving . . . . . . . . . . . . . . . . . . . . 7 3 Solver Selection Problem in Symbolic Execution 11 3.1 Preliminary Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3 Proposed Solution and Challenges . . . . . . . . . . . . . . . . . . . . . 15 3.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 Path Constraint Classifier Design and Implementation 19 4.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.2 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 4.3 Constraint Feature Cache Table . . . . . . . . . . . . . . . . . . . . . . . 21 4.4 Classification Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 4.5 Solving Procedure Optimization . . . . . . . . . . . . . . . . . . . . . . 25 5 Evaluation 27 5.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 5.2 Constraint Feature Cache . . . . . . . . . . . . . . . . . . . . . . . . . . 28 5.3 Solving Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 5.3.1 Training Data Generation . . . . . . . . . . . . . . . . . . . . . . 29 5.3.2 Experiment Settings and Evaluation Metrics . . . . . . . . . . . . 30 5.3.3 Experiment Result. . . . . . . . . . . . . . . . . . . . . . . . . . 32 6 Related Work 37 7 Conclusion and Future Work 39 Bibliography 41
dc.language.iso	en
dc.subject	符號執行	zh_TW
dc.subject	約束條件求解優化	zh_TW
dc.subject	複合式求解器	zh_TW
dc.subject	求解器選擇	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	machine learning	en
dc.subject	constraint solving optimization	en
dc.subject	symbolic execution	en
dc.subject	multi-solver	en
dc.subject	solver selection	en
dc.title	基於機器學習的求解器選擇增進符號執行效率	zh_TW
dc.title	Enhancing Symbolic Execution by Machine Learning Based Solver Selection	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	黃世昆(Shih-Kun Huang),許富皓(Fu-Hau Hsu),黃俊穎(Chun-Ying Huang)
dc.subject.keyword	符號執行,約束條件求解優化,複合式求解器,求解器選擇,機器學習,	zh_TW
dc.subject.keyword	symbolic execution,constraint solving optimization,multi-solver,solver selection,machine learning,	en
dc.relation.page	45
dc.identifier.doi	10.6342/NTU201801451
dc.rights.note	有償授權
dc.date.accepted	2018-07-25
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-107-1.pdf 未授權公開取用	511.79 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。