NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96172
Full metadata record
DC field (language): value
dc.contributor.advisor (zh_TW): 廖世偉
dc.contributor.advisor (en): Shih-Wei Liao
dc.contributor.author (zh_TW): 陳至成
dc.contributor.author (en): Chih-Cheng Chen
dc.date.accessioned: 2024-11-19T16:09:28Z
dc.date.available: 2024-11-20
dc.date.copyright: 2024-11-19
dc.date.issued: 2024
dc.date.submitted: 2024-11-13
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96172
dc.description.abstract (zh_TW): 在現今由資料驅動計算的時代,處理大型資料集的需求日漸增長,使得程序運行效率成為研究的重點。當代的處理器普遍搭載了單指令多資料流(SIMD)處理單元,並且多數指令集架構也都支持各自的向量擴展指令集。例如,RISC-V支援RVV,而Arm架構則支援Neon和SVE。
自動向量化是編譯器的優化技術之一,它能夠自動地在編譯過程中將純量指令轉換為向量指令,讓開發者能夠利用向量處理單元的性能潛力,同時減少開發者撰寫程式的負擔。
本研究旨在透過優化儲存指令向量化的切片選擇策略,來增強LLVM中實作的超字組平行化(SLP),從而發掘更多潛在的向量化機會。此外,還在Arm處理器上進行了效能模擬,以檢驗此設計的實際增益。
dc.description.abstract (en): In the current era of data-driven computing, the need to process large datasets keeps growing, making program execution efficiency a critical research focus. Most modern processors are equipped with Single Instruction, Multiple Data (SIMD) units, and most instruction set architectures support their own vector extensions, such as RVV for RISC-V and Neon and SVE for Arm.
Auto-vectorization is a compiler optimization that transforms scalar instructions into vector instructions during compilation, letting developers exploit the performance potential of vector processing units while reducing manual effort.
This study aims to enhance the Superword Level Parallelism (SLP) auto-vectorization implemented in the LLVM compiler by optimizing the slice selection strategy for store vectorization, thereby uncovering more vectorization opportunities. Performance simulations on Arm processors were also conducted to verify the practical benefit of the design.
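To make the idea of slice selection more concrete, here is a minimal C++ sketch. It is not the algorithm from the thesis, whose full text is not part of this record; it only illustrates one generic way to pick slices. It assumes a chain of consecutive scalar stores, a list of candidate slice widths, and a hypothetical cost callback in which a negative value means that vectorizing the slice is estimated to beat the scalar code. The names `SliceCost` and `bestSliceCost` are made up for this illustration and are not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical cost model (an assumption for this sketch): estimated cost of
// vectorizing `width` consecutive stores starting at index `start`, relative
// to leaving them scalar. Negative means the vector form is cheaper.
using SliceCost = std::function<int(std::size_t start, std::size_t width)>;

// Returns the minimal total cost of splitting a chain of `numStores`
// consecutive stores into non-overlapping slices. A store that is not
// covered by any slice stays scalar and contributes a cost of 0.
int bestSliceCost(std::size_t numStores,
                  const std::vector<std::size_t>& widths,
                  const SliceCost& cost) {
    // best[i] is the minimal cost for the first i stores of the chain.
    std::vector<int> best(numStores + 1, 0);
    for (std::size_t i = 1; i <= numStores; ++i) {
        // Option 1: leave store i-1 scalar.
        best[i] = best[i - 1];
        // Option 2: end a slice of width w at store i-1.
        for (std::size_t w : widths) {
            if (w <= i) {
                best[i] = std::min(best[i], best[i - w] + cost(i - w, w));
            }
        }
    }
    return best[numStores];
}
```

As a usage example, take six stores, widths 2 and 4, a cost of -1 for every width-2 slice, -3 for the width-4 slice at index 0, -5 for the width-4 slice at index 2, and +1 for any other width-4 slice. Taking the widest profitable slice first from the left yields -3 + -1 = -4, while the dynamic program finds the slice of width 2 at index 0 followed by the slice of width 4 at index 2, for a total of -6. This is the kind of gap that a better slice selection strategy can close.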
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-11-19T16:09:28Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-11-19T16:09:28Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Introduction
Chapter 2 Background
2.1 LLVM
2.2 Auto-vectorization
2.2.1 Loop Vectorization
2.2.2 Superword Level Parallelism
2.3 SPEC CPU2006
2.3.1 464.h264ref
2.4 Arm
Chapter 3 Motivation
3.1 Suitability Test for SPEC CPU2006
3.2 Store Vectorization in SLP
3.3 Slice Selection
Chapter 4 Design and Implementation
4.1 Design
4.2 Implementation
4.2.1 Creating one-shot cost table
4.2.2 Finding optimal combination
4.2.3 Exploring the effects of doubling the vectorization factor
4.2.4 Handling affected instructions
Chapter 5 Evaluation and Discussion
5.1 Environmental Setup
5.2 Static Analysis
5.3 Dynamic Analysis
Chapter 6 Conclusion
6.1 Summary
6.2 Future Work
References
dc.language.iso: en
dc.subject (zh_TW): LLVM
dc.subject (zh_TW): 自動向量化
dc.subject (zh_TW): 超字組平行化
dc.subject (zh_TW): 單指令多資料流
dc.subject (en): LLVM
dc.subject (en): SIMD
dc.subject (en): Superword Level Parallelism (SLP)
dc.subject (en): Auto-vectorization
dc.title (zh_TW): 最佳化超字組平行化中儲存指令向量化的切片選擇策略
dc.title (en): Optimizing Slice Selection Strategy for Store Vectorization in Superword Level Parallelism
dc.type: Thesis
dc.date.schoolyear: 113-1
dc.description.degree: Master's
dc.contributor.oralexamcommittee (zh_TW): 關啟邦;洪鼎詠;黎士瑋;黃敬群
dc.contributor.oralexamcommittee (en): Chi-Bang Kuan; Ding-Yong Hong; Shih-Wei Li; Ching-Chun Huang
dc.subject.keyword (zh_TW): 單指令多資料流, 超字組平行化, 自動向量化, LLVM
dc.subject.keyword (en): SIMD, Superword Level Parallelism (SLP), Auto-vectorization, LLVM
dc.relation.page: 34
dc.identifier.doi: 10.6342/NTU202404591
dc.rights.note: Not authorized
dc.date.accepted: 2024-11-13
dc.contributor.author-college: College of Electrical Engineering and Computer Science
dc.contributor.author-dept: Department of Computer Science and Information Engineering
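Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File: ntu-113-1.pdf (not authorized for public access)
Size: 1.33 MB
Format: Adobe PDF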
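Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.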
