Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96172

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 廖世偉 | zh_TW |
| dc.contributor.advisor | Shih-Wei Liao | en |
| dc.contributor.author | 陳至成 | zh_TW |
| dc.contributor.author | Chih-Cheng Chen | en |
| dc.date.accessioned | 2024-11-19T16:09:28Z | - |
| dc.date.available | 2024-11-20 | - |
| dc.date.copyright | 2024-11-19 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-11-13 | - |
| dc.identifier.citation | [1] J. R. Allen and K. Kennedy. Pfc: A program to convert fortran to parallel form. 1982. [2] R. Allen and K. Kennedy. Automatic translation of fortran programs to vector form. ACM Transactions on Programming Languages and Systems (TOPLAS), 9(4):491–542, 1987. [3] R. Bellman. Dynamic programming. Science, 153(3731):34–37, 1966. [4] M. S. Chauhan. Review of llvm compiler architecture enhancements for cuda. Asian Journal of Computer and Information Systems, 4(1), 2016. [5] Y. Chen, C. Mendis, M. Carbin, and S. Amarasinghe. Vegen: a vectorizer generator for simd and beyond. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 902–914, 2021. [6] L. Dagum and R. Menon. Openmp: an industry standard api for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46–55, 1998. [7] J. G. Feng, Y. P. He, and Q. M. Tao. Evaluation of compilers' capability of automatic vectorization based on source code analysis. Scientific Programming, 2021(1):3264624, 2021. [8] J. L. Henning. Spec cpu2006 benchmark descriptions. ACM SIGARCH Computer Architecture News, 34(4):1–17, 2006. [9] L. Kalms, T. Hebbeler, and D. Göhringer. Automatic opencl code generation from llvm-ir using polyhedral optimization. In Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, pages 45–50, 2018. [10] R. Kumar, A. Martinez, and A. Gonzalez. A variable vector length simd architecture for hw/sw co-designed processors. arXiv preprint arXiv:2102.13410, 2021. [11] S. Larsen and S. Amarasinghe. Exploiting superword level parallelism with multimedia instruction sets. ACM SIGPLAN Notices, 35(5):145–156, 2000. [12] C. Lattner and V. Adve. Llvm: a compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization, 2004. CGO 2004., pages 75–86, 2004. [13] C. R. Lazo, E. Reggiani, C. R. Morales, R. F. Bagué, L. A. V. Vargas, M. A. R. Salinas, M. V. Cortés, O. S. Ünsal, and A. Cristal. Adaptable register file organization for vector processors. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 786–799. IEEE, 2022. [14] C. Mendis and S. Amarasinghe. goslp: globally optimized superword level parallelism framework. Proceedings of the ACM on Programming Languages, 2(OOPSLA):1–28, 2018. [15] C. Mendis, C. Yang, Y. Pu, D. S. Amarasinghe, and M. Carbin. Compiler auto-vectorization with imitation learning. Advances in Neural Information Processing Systems, 32, 2019. [16] D. Naishlos. Autovectorization in gcc. In Proceedings of the 2004 GCC developers summit, pages 105–118. Citeseer, 2004. [17] D. Nuzman and R. Henderson. Multi-platform auto-vectorization. In International Symposium on Code Generation and Optimization (CGO’06), pages 11–pp. IEEE, 2006. [18] D. Nuzman, I. Rosen, and A. Zaks. Auto-vectorization of interleaved data for simd. ACM SIGPLAN Notices, 41(6):132–143, 2006. [19] M. Perotti, M. Cavalcante, N. Wistoff, R. Andri, L. Cavigelli, and L. Benini. A “new ara” for vector computing: An open source highly efficient risc-v v 1.0 vector processor design. In 2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 43–51. IEEE, 2022. [20] V. Porpodas, R. C. Rocha, E. Brevnov, L. F. Góes, and T. Mattson. Super-node slp: Optimized vectorization for code sequences containing operators and their inverse elements. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pages 206–216. IEEE, 2019. [21] V. Porpodas, R. C. Rocha, and L. F. Góes. Look-ahead slp: Auto-vectorization in the presence of commutative operations. In Proceedings of the 2018 International Symposium on Code Generation and Optimization, pages 163–174, 2018. [22] V. Porpodas, R. C. Rocha, and L. F. Góes. Vw-slp: auto-vectorization with adaptive vector width. In Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pages 1–15, 2018. [23] I. Rosen, D. Nuzman, and A. Zaks. Loop-aware slp in gcc. In GCC Developers Summit, pages 131–142, 2007. [24] N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole, G. Gabrielli, M. Horsnell, G. Magklis, A. Martinez, N. Premillieu, et al. The arm scalable vector extension. IEEE Micro, 37(2):26–39, 2017. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96172 | - |
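| dc.description.abstract | In today's era of data-driven computing, the demand for processing large datasets keeps growing, making program execution efficiency a focus of research. Contemporary processors are commonly equipped with Single Instruction, Multiple Data (SIMD) processing units, and most instruction set architectures support their own vector extensions. For example, RISC-V supports RVV, while the Arm architecture supports Neon and SVE.
Auto-vectorization is a compiler optimization technique that automatically converts scalar instructions into vector instructions during compilation, letting developers exploit the performance potential of vector processing units while reducing the burden of writing code. This study aims to enhance the Superword Level Parallelism (SLP) implementation in LLVM by optimizing the slice selection strategy for store-instruction vectorization, thereby uncovering more potential vectorization opportunities. In addition, performance simulations were carried out on Arm processors to examine the actual gains of this design. | zh_TW |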
| dc.description.abstract | In the current era of data-driven computing, the need for processing large datasets has grown exponentially, making program execution efficiency a critical focus of research. Most processors are equipped with Single Instruction, Multiple Data (SIMD) units, and modern instruction set architectures also support vector extensions, such as RVV for RISC-V, and Neon and SVE for Arm.
Auto-vectorization, a compiler optimization technique, transforms scalar code into vector instructions during compilation, allowing developers to fully exploit the performance potential of vector processing units while minimizing manual effort. This study aims to enhance the Superword Level Parallelism (SLP) auto-vectorization implemented in the LLVM compiler by optimizing slice selection for Store vectorization, thereby uncovering more potential vectorization opportunities. Additionally, performance simulations were conducted on Arm processors to verify the practical benefits of the optimized algorithm. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-11-19T16:09:28Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-11-19T16:09:28Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Introduction; Chapter 2 Background; 2.1 LLVM; 2.2 Auto-vectorization; 2.2.1 Loop Vectorization; 2.2.2 Superword Level Parallelism; 2.3 SPEC CPU2006; 2.3.1 464.h264ref; 2.4 Arm; Chapter 3 Motivation; 3.1 Suitability Test for SPEC CPU2006; 3.2 Store Vectorization in SLP; 3.3 Slice Selection; Chapter 4 Design and Implementation; 4.1 Design; 4.2 Implementation; 4.2.1 Creating one-shot cost table; 4.2.2 Finding optimal combination; 4.2.3 Exploring the effects of doubling the vectorization factor; 4.2.4 Handling affected instructions; Chapter 5 Evaluation and Discussion; 5.1 Environmental Setup; 5.2 Static Analysis; 5.3 Dynamic Analysis; Chapter 6 Conclusion; 6.1 Summary; 6.2 Future Work; References | - |
| dc.language.iso | en | - |
| dc.subject | LLVM | zh_TW |
| dc.subject | Auto-vectorization | zh_TW |
| dc.subject | Superword Level Parallelism | zh_TW |
| dc.subject | Single Instruction, Multiple Data (SIMD) | zh_TW |
| dc.subject | LLVM | en |
| dc.subject | SIMD | en |
| dc.subject | Superword Level Parallelism (SLP) | en |
| dc.subject | Auto-vectorization | en |
| dc.title | 最佳化超字組平行化中儲存指令向量化的切片選擇策略 | zh_TW |
| dc.title | Optimizing Slice Selection Strategy for Store Vectorization in Superword Level Parallelism | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 關啟邦;洪鼎詠;黎士瑋;黃敬群 | zh_TW |
| dc.contributor.oralexamcommittee | Chi-Bang Kuan;Ding-Yong Hong;Shih-Wei Li;Ching-Chun Huang | en |
| dc.subject.keyword | SIMD, Superword Level Parallelism, Auto-vectorization, LLVM | zh_TW |
| dc.subject.keyword | SIMD, Superword Level Parallelism (SLP), Auto-vectorization, LLVM | en |
| dc.relation.page | 34 | - |
| dc.identifier.doi | 10.6342/NTU202404591 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2024-11-13 | - |
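| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Computer Science and Information Engineering | - |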
| Appears in Collections: | Department of Computer Science and Information Engineering | |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
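| ntu-113-1.pdf (not authorized for public access) | 1.33 MB | Adobe PDF | |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.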
