Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92163
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 廖世偉 | zh_TW |
dc.contributor.advisor | Shih-Wei Liao | en |
dc.contributor.author | 彭旻翊 | zh_TW |
dc.contributor.author | Ming-Yi Peng | en |
dc.date.accessioned | 2024-03-07T16:22:36Z | - |
dc.date.available | 2024-03-08 | - |
dc.date.copyright | 2024-03-07 | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-02-18 | - |
dc.identifier.citation | [1] N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole, G. Gabrielli, M. Horsnell, G. Magklis, A. Martinez, N. Premillieu, A. Reid, A. Rico, and P. Walker, "The ARM Scalable Vector Extension," IEEE Micro, vol. 37, pp. 26-39, 2017, doi: 10.1109/mm.2017.35.
[2] M. Perotti, M. Cavalcante, N. Wistoff, R. Andri, L. Cavigelli, and L. Benini, "A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design," presented at the 2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2022.
[3] R. Allen and K. Kennedy, "PFC: A program to convert Fortran to parallel form," Department of Mathematical Sciences, Rice University, Technical Report, 1982.
[4] R. Allen and K. Kennedy, "Automatic translation of FORTRAN programs to vector form," ACM Trans. Program. Lang. Syst., vol. 9, no. 4, pp. 491–542, 1987, doi: 10.1145/29873.29875.
[5] V. Porpodas, R. C. O. Rocha, E. Brevnov, L. F. W. Góes, and T. Mattson, "Super-Node SLP: Optimized Vectorization for Code Sequences Containing Operators and Their Inverse Elements," in 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 16-20 Feb. 2019, pp. 206-216, doi: 10.1109/CGO.2019.8661192.
[6] D. Callahan, J. J. Dongarra, and D. Levine, "Vectorizing compilers: a test suite and results," Proceedings. SUPERCOMPUTING '88, pp. 98-105, 1988.
[7] C. Lattner and V. Adve, "LLVM: a compilation framework for lifelong program analysis & transformation," in International Symposium on Code Generation and Optimization, 2004. CGO 2004., 20-24 March 2004, pp. 75-86, doi: 10.1109/CGO.2004.1281665.
[8] C. Lattner et al., "MLIR: A Compiler Infrastructure for the End of Moore's Law," CoRR, vol. abs/2002.11054, 2020.
[9] C. Lattner et al., "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation," in 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 27 Feb.-3 March 2021, pp. 2-14, doi: 10.1109/CGO51591.2021.9370308.
[10] S.-L. Wu, X.-Y. Wang, M.-Y. Peng, and S.-W. Liao, "Accelerating OpenVX through Halide and MLIR," Journal of Signal Processing Systems, vol. 95, no. 5, pp. 571-584, 2023, doi: 10.1007/s11265-022-01826-8.
[11] S. Larsen and S. Amarasinghe, "Exploiting superword level parallelism with multimedia instruction sets," presented at the Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation, Vancouver, British Columbia, Canada, 2000.
[12] I. Rosen, D. Nuzman, and A. Zaks, "Loop-aware SLP in GCC," presented at the GCC and GNU Toolchain Developers' Summit 2007, Ottawa, ON, Canada, 2007.
[13] K. D. Cooper and L. Torczon, Engineering a Compiler, 2nd Edition. 2012.
[14] The LLVM Compiler Infrastructure. "Iterating over def-use & use-def chains." https://llvm.org/docs/ProgrammersManual.html#iterating-over-def-use-use-def-chains.
[15] F. Bellard, "QEMU, a fast and portable dynamic translator," presented at the Proceedings of the annual conference on USENIX Annual Technical Conference, Anaheim, CA, 2005.
[16] Y. Chen, C. Mendis, M. Carbin, and S. Amarasinghe, "VeGen: a vectorizer generator for SIMD and beyond," presented at the Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual, USA, 2021.
[17] C. Mendis, C. Yang, Y. Pu, S. Amarasinghe, and M. Carbin, "Compiler auto-vectorization with imitation learning," in Proceedings of the 33rd International Conference on Neural Information Processing Systems: Curran Associates Inc., 2019, p. Article 1310.
[18] V. Porpodas, R. C. O. Rocha, and L. F. W. Góes, "VW-SLP: auto-vectorization with adaptive vector width," presented at the Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Limassol, Cyprus, 2018.
[19] C. Mendis and S. Amarasinghe, "goSLP: globally optimized superword level parallelism framework," Proc. ACM Program. Lang., vol. 2, no. OOPSLA, p. Article 110, 2018, doi: 10.1145/3276480.
[20] N. Adit and A. Sampson, "Performance Left on the Table: An Evaluation of Compiler Autovectorization for RISC-V," IEEE Micro, vol. 42, no. 5, pp. 41-48, 2022, doi: 10.1109/MM.2022.3184867. | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92163 | - |
dc.description.abstract | 在當今資料驅動的時代,處理大數據的需求也迅速提升,因此程式執行效能成為重要的研究方向。現代處理器普遍配備了單指令多資料流(Single Instruction, Multiple Data, SIMD)處理元件,且各種指令集架構均支持相對應的向量擴展指令集,例如Arm支援Neon與 SVE,RISC-V支援RVV。自動向量化為一種編譯器最佳化技術,可以讓程式碼在編譯階段自動轉換為向量指令,從而充分發揮向量處理單元的性能,提高程式運行效率。本研究探討了LLVM編譯器內實現的超字組平行(Superword Level Parallelism, SLP)自動向量化技術,並且針對當前演算法中尚未涵蓋的領域進行改良最佳化,以拓寬SLP技術的應用場景。進一步地,本研究分別在Arm和RISC-V架構上進行了模擬效能測試,以驗證改良演算法的實際效益。 | zh_TW |
dc.description.abstract | In today's data-driven era, demand for processing large datasets has grown rapidly, making program execution performance an important research direction. Modern processors are commonly equipped with Single Instruction, Multiple Data (SIMD) processing units, and the major instruction set architectures provide corresponding vector extensions, such as Neon and SVE on Arm and RVV on RISC-V. Auto-vectorization is a compiler optimization that translates program code into vector instructions at compile time, so that the vector processing unit is put to full use and programs run more efficiently.
We investigate the superword level parallelism (SLP) auto-vectorization implemented in the LLVM compiler and refine the existing algorithm to cover cases it does not currently handle, broadening the scenarios in which SLP applies. We then run simulated performance evaluations on Arm and RISC-V architectures to verify the practical benefit of the improved algorithm. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-03-07T16:22:36Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2024-03-07T16:22:36Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Oral Examination Committee Certification i
Abstract (Chinese) ii
ABSTRACT iii
TABLE OF CONTENTS iv
LIST OF FIGURES v
LIST OF TABLES vi
CHAPTER 1. INTRODUCTION 1
CHAPTER 2. BACKGROUND 3
2.1 LLVM 3
2.2 AUTO-VECTORIZATION 4
2.3 SUPERWORD LEVEL PARALLELISM 6
2.4 ARM 8
2.5 RISC-V 9
CHAPTER 3. MOTIVATION 10
3.1 MOTIVATION 10
CHAPTER 4. METHOD 14
4.1 RECURSIVELY BUILDING THE VECTORIZATION TREE 14
4.2 CHECKING COMMUTATIVITY 16
4.3 COMMUTATIVE REORDERING 17
CHAPTER 5. EVALUATION AND DISCUSSION 19
5.1 ENVIRONMENTAL SETUP 19
5.2 EVALUATION AND DISCUSSION 20
CHAPTER 6. CONCLUSION AND FUTURE WORK 23
6.1 CONCLUSION 23
6.2 RELATED WORK 24
6.3 FUTURE WORK 25
REFERENCES 26 | - |
dc.language.iso | en | - |
dc.title | Arm和RISC-V架構編譯器自動向量化和超字組平行的設計與分析 | zh_TW |
dc.title | Design and Analysis of Compiler Auto-vectorization and Superword Level Parallelism on Arm and RISC-V Architecture | en |
dc.type | Thesis | - |
dc.date.schoolyear | 112-1 | - |
dc.description.degree | Master's | - |
dc.contributor.oralexamcommittee | 傅楸善;黃敬群;鄭振牟 | zh_TW |
dc.contributor.oralexamcommittee | Chiou-Shann Fuh;Ching-Chun Huang;Chen-Mou Cheng | en |
dc.subject.keyword | 單指令多資料流, 自動向量化, 超字組平行, LLVM, Arm, RISC-V | zh_TW |
dc.subject.keyword | SIMD, Auto-vectorization, SLP, LLVM, Arm, RISC-V | en |
dc.relation.page | 28 | - |
dc.identifier.doi | 10.6342/NTU202400692 | - |
dc.rights.note | Not authorized | - |
dc.date.accepted | 2024-02-18 | - |
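dc.contributor.author-college | College of Electrical Engineering and Computer Science (電機資訊學院) | - |
dc.contributor.author-dept | Department of Computer Science and Information Engineering (資訊工程學系) | - |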
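Appears in Collections: | Department of Computer Science and Information Engineering (資訊工程學系) |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf (not currently authorized for public access) | 2.57 MB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.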