在MLIR上透過隨機優化演算法自動化Affine排程並加速Halide

王祥宇; Xiang-Yu Wang

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88343

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	廖世偉	zh_TW
dc.contributor.advisor	Shih-Wei Liao	en
dc.contributor.author	王祥宇	zh_TW
dc.contributor.author	Xiang-Yu Wang	en
dc.date.accessioned	2023-08-09T16:38:17Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-09	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-25	-
dc.identifier.citation	[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016. [2] A. H. Ashouri, W. Killian, J. Cavazos, G. Palermo, and C. Silvano. A survey on compiler autotuning using machine learning. ACM Computing Surveys (CSUR), 51(5):1–42, 2018. [3] C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT’13 IEEE International Conference on Parallel Architecture and Compilation Techniques, pages 7–16, Juan-les-Pins, France, September 2004. [4] M.-W. Benabderrahmane, L.-N. Pouchet, A. Cohen, and C. Bastoul. The polyhedral model is more widely applicable than you think. In R. Gupta, editor, Compiler Construction, pages 283–303, Berlin, Heidelberg, 2010. Springer Berlin Heidelb [5] A. Bik, P. Koanantakool, T. Shpeisman, N. Vasilache, B. Zheng, and F. Kjols Compiler support for sparse tensor computations in mlir. ACM Transactions on Architecture and Code Optimization (TACO), 19(4):1–25, 2022. [6] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. SIGPLAN Not., 43(6):101–113, June 2008. [7] A. Brauckmann, A. Goens, and J. Castrillon. Polygym: Polyhedral optimizations as an environment for reinforcement learning. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 17–29, 2021. [8] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy. Tvm: An automated end-to-end optimizing compiler for deep learning, 2018. [9] J. Clemons, H. Zhu, S. Savarese, and T. Austin. Mevbench: A mobile computer vision benchmarking suite. In 2011 IEEE International Symposium on Workload Characterization (IISWC), pages 91–102, 2011. [10] J. Clemons, H. Zhu, S. Savarese, and T. Austin. Mevbench: A mobile computer vision benchmarking suite. In 2011 IEEE International Symposium on Workload Characterization (IISWC), pages 91–102, 2011. [11] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 1, pages 886–893 vol. 1, 2005. [12] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, et al. Milepost gcc: Machin learning enabled self-tuning compiler. International journal of parallel programming, 39:296–327, 2011. [13] A. Haj-Ali, N. K. Ahmed, T. Willke, Y. S. Shao, K. Asanovic, and I. Stoica. Neurovectorizer: End-to-end vectorization with deep reinforcement learning. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pages 242–255, 2020. [14] Q. Huang, A. Haj-Ali, W. Moses, J. Xiang, I. Stoica, K. Asanovic, and J. Wawrzynek. Autophase: Compiler phase-ordering for hls with deep reinforcement learning. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 308–308. IEEE, 2019. [15] S. Jain, Y. Andaluri, S. VenkataKeerthy, and R. Upadrasta. Poset-rl: Phase ordering for optimizing size and execution time using reinforcement learning. In 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 121–131. IEEE, 2022. [16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [17] S. Kirkpatrick, C. D. Gelatt Jr, and M. P. Vecchi. Optimization by simulated annealing. science, 220(4598):671–680, 1983. [18] M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. ACM SIGOPS Operating Systems Review, 25(Special Issue):63–74, 1991. [19] C. Lattner and V. Adve. Llvm: A compilation framework for lifelong program ysis amp; transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, CGO ’04, page 75, USA, 2004. IEEE Computer Society. [20] C. Lattner, M. Amini, U. Bondhugula, A. Cohen, A. Davis, J. Pienaar, R. Riddle, T. Shpeisman, N. Vasilache, and O. Zinenko. Mlir: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 2–14, 2021. [21] W. S. Moses, L. Chelini, R. Zhao, and O. Zinenko. Polygeist: Raising c to polyhedral mlir. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 45–59, 2021. [22] R. T. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, and K. Fatahalian. Automatically scheduling halide image processing pipelines. ACM Trans. Graph., 35(4), July 2016. [23] M. Nagiub and W. Farag. Automatic selection of compiler options using genetic techniques for embedded software design. In 2013 IEEE 14th international symposium on computational intelligence and informatics (CINTI), pages 69–74. IEEE, 2013. [24] Y. Nourani and B. Andresen. A comparison of simulated annealing cooling strategies. Journal of Physics A: Mathematical and General, 31(41):8373, 1998. [25] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPL Conference on Programming Language Design and Implementation, PLDI ’13, page 519–530, New York, NY, USA, 2013. Association for Computing Machinery. [26] F. Romeo and A. Sangiovanni-Vincentelli. A theoretical framework for simulated annealing. Algorithmica, 6:302–345, 1991. [27] M. O. T. Kisuki, P.M.W. Knijnenburg. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques, 2000. [28] S. Verdoolaege. Isl: An integer set library for the polyhedral model. In Proceedings of the Third International Congress Conference on Mathematical Software, ICMS’10, page 299–302, Berlin, Heidelberg, 2010. Springer-Verlag. [29] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q. Wu, and Y. Wang. Intel Math Kernel Library, pages 167–188. Springer International Publishing, Cham, 2014. [30] S.-L. Wu, X.-Y. Wang, M.-Y. Peng, and S.-W. Liao. Accelerating openvx through halide and mlir. Journal of Signal Processing Systems, pages 1–14, 2023. [31] Z. Xianyi, W. Qian, and Z. Yunquan. Model-driven level 3 blas performance optimization on loongson 3a processor. In 2012 IEEE 18th international conference on parallel and distributed systems, pages 684–691. IEEE, 20	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88343	-
dc.description.abstract	電腦視覺算法在影像中心的設備上的使用越來越多，這使得我們需要優化這些演算法以提高其在資源受限的平台上的性能。Halide是一種在C++上的特定領域語言用來處理高性能圖像處理演算法，可以將演算法與指令排程分離。本論文提出了一種使用模擬退火的隨機演算法優化排程的方法，以提高Halide電腦視覺演算法的性能，從而實現在有限時間內探索仿射轉換的全局最優解。為了將Halide程式碼轉換為MLIR，我們使用了新的編譯流程，即Halide到MLIR（HTM）編譯器。此方法的效能將在不同平台上進行評估，例如x86、Arm和RISC-V。研究展示了MLIR在仿射方言(Affine Dialect)上的轉換和優化能力，並顯示了使用隨機演算法優化基礎建設以充分最佳化MLIR不同優化能力的必要性。	zh_TW
dc.description.abstract	The increasing usage of computer vision algorithms in camera-centric devices has led to a growing need for optimizing these algorithms to improve their performance on resource-constrained platforms. Halide is a language specific to image processing algorithms that separates the algorithm's scheduling from its implementation, resulting in high performance. This thesis proposes an approach to improve the performance of Halide computer vision algorithms using stochastic algorithms such as simulated annealing to optimize scheduling, which enables exploring the global optimum of affine scheduling in constrained time. To convert the Halide program to MLIR, we use novel compile flows, namely the Halide to MLIR (HTM) converter. The efficacy of the approach will be evaluated on different platforms, such as x86, ARM, and RISC-V. The study demonstrates the potential of MLIR's transformation and optimization capabilities on Affine dialects and highlights the need for tuning infrastructure to fully leverage MLIR's optimization capabilities.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:38:17Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-09T16:38:17Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iii Abstract iv Contents v List of Figures vii List of Tables ix Chapter 1 Introduction 1 Chapter 2. Background 5 2.1. Halide 5 2.2. LLVM 6 2.3. MLIR. 7 2.4. Affine transformation 9 Chapter 3 Motivation. 13 3.1. Motivation. 13 Chapter 4 Design and Implementation 16 4.1 HTM.................................. 16 4.2 stochastic optimization algorithm ................... 17 4.2.1 Simulatedannealing.......................... 17 4.2.1.1 Markov decision process ................ 19 4.3 code generation 20 Chapter 5 Evaluation and Discussion 23 5.1 Environment Setup. 23 5.2 Experiment Steps. 24 5.3 Result. 25 5.4 Discussion. 31 Chapter 6 Conclusion and Future work. 35 6.1 Conclusion. 35 6.2 Related work. 37 6.3 Future work. 39 References. 41	-
dc.language.iso	en	-
dc.subject	隨機優化演算法	zh_TW
dc.subject	仿射轉換	zh_TW
dc.subject	MLIR	zh_TW
dc.subject	Halide	zh_TW
dc.subject	Halide	en
dc.subject	MLIR	en
dc.subject	Affine transformation	en
dc.subject	stochastic optimization algorithm	en
dc.title	在MLIR上透過隨機優化演算法自動化Affine排程並加速Halide	zh_TW
dc.title	Accelerating Halide framework by automatic Affine scheduling using the stochastic optimization algorithm on MLIR	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	碩士	-
dc.contributor.oralexamcommittee	陸忠立;黃敬群;鄭振牟;黎士瑋	zh_TW
dc.contributor.oralexamcommittee	Zhong-Li Lu;Jing-Qun Huang;Chen-Mou Cheng;Shih-Wei Li	en
dc.subject.keyword	Halide,MLIR,仿射轉換,隨機優化演算法,	zh_TW
dc.subject.keyword	Halide,MLIR,Affine transformation,stochastic optimization algorithm,	en
dc.relation.page	45	-
dc.identifier.doi	10.6342/NTU202300704	-
dc.rights.note	同意授權(限校園內公開)	-
dc.date.accepted	2023-07-27	-
dc.contributor.author-college	電機資訊學院	-
dc.contributor.author-dept	資訊網路與多媒體研究所	-
dc.date.embargo-lift	2025-07-24	-
顯示於系所單位：	資訊網路與多媒體研究所

文件中的檔案：

檔案	大小	格式
ntu-111-2.pdf 授權僅限NTU校內IP使用（校園外請利用VPN校外連線服務）	1.13 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。