請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88343
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 廖世偉 | zh_TW |
dc.contributor.advisor | Shih-Wei Liao | en |
dc.contributor.author | 王祥宇 | zh_TW |
dc.contributor.author | Xiang-Yu Wang | en |
dc.date.accessioned | 2023-08-09T16:38:17Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-08-09 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-07-25 | - |
dc.identifier.citation | [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[2] A. H. Ashouri, W. Killian, J. Cavazos, G. Palermo, and C. Silvano. A survey on compiler autotuning using machine learning. ACM Computing Surveys (CSUR), 51(5):1–42, 2018. [3] C. Bastoul. Code generation in the polyhedral model is easier than you think. In PACT’13 IEEE International Conference on Parallel Architecture and Compilation Techniques, pages 7–16, Juan-les-Pins, France, September 2004. [4] M.-W. Benabderrahmane, L.-N. Pouchet, A. Cohen, and C. Bastoul. The polyhedral model is more widely applicable than you think. In R. Gupta, editor, Compiler Construction, pages 283–303, Berlin, Heidelberg, 2010. Springer Berlin Heidelb [5] A. Bik, P. Koanantakool, T. Shpeisman, N. Vasilache, B. Zheng, and F. Kjols Compiler support for sparse tensor computations in mlir. ACM Transactions on Architecture and Code Optimization (TACO), 19(4):1–25, 2022. [6] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral parallelizer and locality optimizer. SIGPLAN Not., 43(6):101–113, June 2008. [7] A. Brauckmann, A. Goens, and J. Castrillon. Polygym: Polyhedral optimizations as an environment for reinforcement learning. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 17–29, 2021. [8] T. Chen, T. Moreau, Z. Jiang, L. Zheng, E. Yan, M. Cowan, H. Shen, L. Wang, Y. Hu, L. Ceze, C. Guestrin, and A. Krishnamurthy. Tvm: An automated end-to-end optimizing compiler for deep learning, 2018. [9] J. Clemons, H. Zhu, S. Savarese, and T. Austin. Mevbench: A mobile computer vision benchmarking suite. In 2011 IEEE International Symposium on Workload Characterization (IISWC), pages 91–102, 2011. [10] J. Clemons, H. Zhu, S. Savarese, and T. Austin. Mevbench: A mobile computer vision benchmarking suite. In 2011 IEEE International Symposium on Workload Characterization (IISWC), pages 91–102, 2011. [11] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 1, pages 886–893 vol. 1, 2005. [12] G. Fursin, Y. Kashnikov, A. W. Memon, Z. Chamski, O. Temam, M. Namolaru, E. Yom-Tov, B. Mendelson, A. Zaks, E. Courtois, et al. Milepost gcc: Machin learning enabled self-tuning compiler. International journal of parallel programming, 39:296–327, 2011. [13] A. Haj-Ali, N. K. Ahmed, T. Willke, Y. S. Shao, K. Asanovic, and I. Stoica. Neurovectorizer: End-to-end vectorization with deep reinforcement learning. In Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pages 242–255, 2020. [14] Q. Huang, A. Haj-Ali, W. Moses, J. Xiang, I. Stoica, K. Asanovic, and J. Wawrzynek. Autophase: Compiler phase-ordering for hls with deep reinforcement learning. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 308–308. IEEE, 2019. [15] S. Jain, Y. Andaluri, S. VenkataKeerthy, and R. Upadrasta. Poset-rl: Phase ordering for optimizing size and execution time using reinforcement learning. In 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 121–131. IEEE, 2022. [16] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014. [17] S. Kirkpatrick, C. D. Gelatt Jr, and M. P. Vecchi. Optimization by simulated annealing. science, 220(4598):671–680, 1983. [18] M. D. Lam, E. E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. ACM SIGOPS Operating Systems Review, 25(Special Issue):63–74, 1991. [19] C. Lattner and V. Adve. Llvm: A compilation framework for lifelong program ysis amp; transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, CGO ’04, page 75, USA, 2004. IEEE Computer Society. [20] C. Lattner, M. Amini, U. Bondhugula, A. Cohen, A. Davis, J. Pienaar, R. Riddle, T. Shpeisman, N. Vasilache, and O. Zinenko. Mlir: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 2–14, 2021. [21] W. S. Moses, L. Chelini, R. Zhao, and O. Zinenko. Polygeist: Raising c to polyhedral mlir. In 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 45–59, 2021. [22] R. T. Mullapudi, A. Adams, D. Sharlet, J. Ragan-Kelley, and K. Fatahalian. Automatically scheduling halide image processing pipelines. ACM Trans. Graph., 35(4), July 2016. [23] M. Nagiub and W. Farag. Automatic selection of compiler options using genetic techniques for embedded software design. In 2013 IEEE 14th international symposium on computational intelligence and informatics (CINTI), pages 69–74. IEEE, 2013. [24] Y. Nourani and B. Andresen. A comparison of simulated annealing cooling strategies. Journal of Physics A: Mathematical and General, 31(41):8373, 1998. [25] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Proceedings of the 34th ACM SIGPL Conference on Programming Language Design and Implementation, PLDI ’13, page 519–530, New York, NY, USA, 2013. Association for Computing Machinery. [26] F. Romeo and A. Sangiovanni-Vincentelli. A theoretical framework for simulated annealing. Algorithmica, 6:302–345, 1991. [27] M. O. T. Kisuki, P.M.W. Knijnenburg. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques, 2000. [28] S. Verdoolaege. Isl: An integer set library for the polyhedral model. In Proceedings of the Third International Congress Conference on Mathematical Software, ICMS’10, page 299–302, Berlin, Heidelberg, 2010. Springer-Verlag. [29] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q. Wu, and Y. Wang. Intel Math Kernel Library, pages 167–188. Springer International Publishing, Cham, 2014. [30] S.-L. Wu, X.-Y. Wang, M.-Y. Peng, and S.-W. Liao. Accelerating openvx through halide and mlir. Journal of Signal Processing Systems, pages 1–14, 2023. [31] Z. Xianyi, W. Qian, and Z. Yunquan. Model-driven level 3 blas performance optimization on loongson 3a processor. In 2012 IEEE 18th international conference on parallel and distributed systems, pages 684–691. IEEE, 20 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88343 | - |
dc.description.abstract | 電腦視覺算法在影像中心的設備上的使用越來越多,這使得我們需要優化這些演算法以提高其在資源受限的平台上的性能。Halide是一種在C++上的特定領域語言用來處理高性能圖像處理演算法,可以將演算法與指令排程分離。本論文提出了一種使用模擬退火的隨機演算法優化排程的方法,以提高Halide電腦視覺演算法的性能,從而實現在有限時間內探索仿射轉換的全局最優解。為了將Halide程式碼轉換為MLIR,我們使用了新的編譯流程,即Halide到MLIR(HTM)編譯器。此方法的效能將在不同平台上進行評估,例如x86、Arm和RISC-V。研究展示了MLIR在仿射方言(Affine Dialect)上的轉換和優化能力,並顯示了使用隨機演算法優化基礎建設以充分最佳化MLIR不同優化能力的必要性。 | zh_TW |
dc.description.abstract | The increasing usage of computer vision algorithms in camera-centric devices has led to a growing need for optimizing these algorithms to improve their performance on resource-constrained platforms. Halide is a language specific to image processing algorithms that separates the algorithm's scheduling from its implementation, resulting in high performance. This thesis proposes an approach to improve the performance of Halide computer vision algorithms using stochastic algorithms such as simulated annealing to optimize scheduling, which enables exploring the global optimum of affine scheduling in constrained time. To convert the Halide program to MLIR, we use novel compile flows, namely the Halide to MLIR (HTM) converter. The efficacy of the approach will be evaluated on different platforms, such as x86, ARM, and RISC-V. The study demonstrates the potential of MLIR's transformation and optimization capabilities on Affine dialects and highlights the need for tuning infrastructure to fully leverage MLIR's optimization capabilities. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-09T16:38:17Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-08-09T16:38:17Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Verification Letter from the Oral Examination Committee i
Acknowledgements ii 摘要 iii Abstract iv Contents v List of Figures vii List of Tables ix Chapter 1 Introduction 1 Chapter 2. Background 5 2.1. Halide 5 2.2. LLVM 6 2.3. MLIR. 7 2.4. Affine transformation 9 Chapter 3 Motivation. 13 3.1. Motivation. 13 Chapter 4 Design and Implementation 16 4.1 HTM.................................. 16 4.2 stochastic optimization algorithm ................... 17 4.2.1 Simulatedannealing.......................... 17 4.2.1.1 Markov decision process ................ 19 4.3 code generation 20 Chapter 5 Evaluation and Discussion 23 5.1 Environment Setup. 23 5.2 Experiment Steps. 24 5.3 Result. 25 5.4 Discussion. 31 Chapter 6 Conclusion and Future work. 35 6.1 Conclusion. 35 6.2 Related work. 37 6.3 Future work. 39 References. 41 | - |
dc.language.iso | en | - |
dc.title | 在MLIR上透過隨機優化演算法自動化Affine排程並加速Halide | zh_TW |
dc.title | Accelerating Halide framework by automatic Affine scheduling using the stochastic optimization algorithm on MLIR | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 陸忠立;黃敬群;鄭振牟;黎士瑋 | zh_TW |
dc.contributor.oralexamcommittee | Zhong-Li Lu;Jing-Qun Huang;Chen-Mou Cheng;Shih-Wei Li | en |
dc.subject.keyword | Halide,MLIR,仿射轉換,隨機優化演算法, | zh_TW |
dc.subject.keyword | Halide,MLIR,Affine transformation,stochastic optimization algorithm, | en |
dc.relation.page | 45 | - |
dc.identifier.doi | 10.6342/NTU202300704 | - |
dc.rights.note | 同意授權(限校園內公開) | - |
dc.date.accepted | 2023-07-27 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 資訊網路與多媒體研究所 | - |
顯示於系所單位: | 資訊網路與多媒體研究所 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-111-2.pdf 目前未授權公開取用 | 1.13 MB | Adobe PDF | 檢視/開啟 |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。