Evaluation of Chaining Implementation on RISC-V Vector Extension (RISC-V 向量指令集鏈結微架構評估)

Chen Wei; 魏禛

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74054

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	徐慰中(Wei-Chung Hsu)
dc.contributor.author	Chen Wei	en
dc.contributor.author	魏禛	zh_TW
dc.date.accessioned	2021-06-17T08:18:12Z	-
dc.date.available	2020-08-20
dc.date.copyright	2019-08-20
dc.date.issued	2019
dc.date.submitted	2019-08-14
dc.identifier.citation	Hewlett-Packard Company, PA-RISC 1.1 Architecture and Instruction Set Reference Manual, Feb. 1994. Sun Microsystems, The VIS Instruction Set, Jun. 2002. Intel Corporation, Intel 64 and IA-32 architectures software developer’s manual volume 1: Basic architecture, Sep. 2015. K. Asanović, Vector Extension Proposal, https://riscv.org/wp-content/uploads/2015/06/riscv-vector-workshop-june2015.pdf, 2nd RISC-V Workshop Proceedings, 2015. N. Stephens, S. Biles, M. Boettcher, J. Eapen, M. Eyole, G. Gabrielli, M. Horsnell, G. Magklis, A. Martinez, N. Prémillieu, A. Reid, A. Rico, and P. Walker, “The ARM scalable vector extension,” CoRR, vol. abs/1803.06185, 2018. arXiv: 1803.06185. [Online]. Available: http://arxiv.org/abs/1803.06185. ARM Limited, ARM architecture reference manual: ARMv8, for ARMv8-A architectural profile, Sep. 2015. R. M. Russell, “The CRAY-1 Computer System,” Commun. ACM, vol. 21, no. 1, pp. 63–72, Jan. 1978, issn: 0001-0782. doi: 10.1145/359327.359336. [Online]. Available: http://doi.acm.org/10.1145/359327.359336. K. Asanović, “Vector microprocessors,” PhD thesis, University of California at Berkeley, 1998. R. Espasa, M. Valero, D. Padua, M. Jimenez, and E. Ayguade, “Quantitative analysis of vector code,” in Proceedings Euromicro Workshop on Parallel and Distributed Processing, Jan. 1995, pp. 452–461. doi: 10.1109/EMPDP.1995.389176. K. A. Robbins and S. Robbins, “Relationship between average and real memory behavior,” The Journal of Supercomputing, vol. 8, no. 3, pp. 209–232, Nov. 1994, issn: 1573-0484. doi: 10.1007/BF01204729. [Online]. Available: https://doi.org/10.1007/BF01204729. R. Espasa and M. Valero, “Decoupled vector architectures,” in Proceedings. Second International Symposium on High-Performance Computer Architecture, Feb. 1996, pp. 281–290. doi: 10.1109/HPCA.1996.501193. R. Espasa, M. Valero, and J. E. Smith, “Out-of-order vector architectures,” in Proceedings of 30th Annual International Symposium on Microarchitecture, Dec. 1997, pp. 160–170. doi: 10.1109/MICRO.1997.645807. R. Espasa and M. Valero, “Multithreaded vector architectures,” in Proceedings Third International Symposium on High-Performance Computer Architecture, Feb. 1997, pp. 237–248. doi: 10.1109/HPCA.1997.569677. P. B. Schneck, “The cdc star-100,” in Supercomputer Architecture. Boston, MA: Springer US, 1987, pp. 99–117, isbn: 978-1-4615-7957-1. doi: 10.1007/978-1-4615-7957-1_5. [Online]. Available: https://doi.org/10.1007/978-1-4615-7957-1_5. D. Patterson and A. Waterman. (2017). SIMD instructions considered harmful, [Online]. Available: https://www.sigarch.org/simd-instructions-considered-harmful/ (visited on 09/18/2017). ARM Limited. (2019). Porting and optimizing HPC applications for arm SVE, [Online]. Available: https://developer.arm.com/docs/101726/0100. RISC-V Vector Working Group, Risc-v vector extension draft (0.5), https://github.com/riscv/riscv-v-spec/tree/6cd8fba7, 2018. F. Schuiki and M. Cavalcante, Ara: 64-bit risc-v vector implementation in 22nm fdsoi, https://content.riscv.org/wp-content/uploads/2018/12/Ara-64-bit-RISC-V-Vector-Implementation-in-22nm-FDSOI-Cavalcante-Schuiki.pdf, Inaugural RISC-V Summit Proceedings, 2018. G. Lemieux, Risc-v vector performance analysis, https://content.riscv.org/wp-content/uploads/2018/12/RISC-V-Vector-Performance-Analysis-Guy-Lemieux.pdf, Inaugural RISC-V Summit Proceedings, 2018. C. G. Lee, “Code optimizers and register organizations for vector architectures,” PhD thesis, University of California at Berkeley, 1992. Flang, https://github.com/flang-compiler/flang, 2017.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74054	-
dc.description.abstract	近幾年來，向量架構似乎逐漸死灰復燃。RISC-V 是一設計從微處理器及超級電腦都能適用之嶄新計算機架構，而讓效能可如此擴增的關鍵為其向量指令集。相較於多媒體單指令多資料流指令集架構，RISC-V 向量指令集允許已向量化程式被任何具不同向量長度之微架構所執行。雖然執行一個向量指令可能需要多個處理器週期，但鏈結微架構 (Vector Chaining) 也被隨之提出用來減緩指令發射延遲。在本論文中，我們在一 RISC-V 處理器模擬器內打造了一克雷 (Cray) 風格之向量微架構。我們不僅評估了各種鏈結微架構情景之效能，也評估了不同處理器資源對鏈結微架構造成的影響。除此之外，我們發現符合特定條件之程式碼在經過適當的程式碼最佳化後，缺乏鏈結微架構之向量處理器仍可達到幾與具鏈結微架構之向量處理器相同的效能。最後，我們整理出了這些可以被程式碼最佳化與不可被最佳化的情景，並對真實的應用程式進行鏈結微架構影響之概略評估。	zh_TW
dc.description.abstract	Vector architectures appear to be on the verge of a comeback. RISC-V is a new computer architecture designed to scale from low-power microcontrollers to high-performance supercomputers, and the key to this scalability is its vector extension. Unlike multimedia SIMD instruction sets, the RISC-V Vector Extension allows a vectorized program to run on any implementation, regardless of its vector length. Because a vector instruction may take multiple cycles, vector chaining is often implemented so that a dependent vector instruction can start executing as soon as possible. In this thesis, we build a Cray-style vector microarchitecture on an in-house cycle-accurate RISC-V CPU simulator. We then evaluate the performance tradeoffs between full and restricted chaining implementations, and the impact of chaining across various vector processor configurations. Furthermore, we find that in some scenarios an implementation without chaining can achieve nearly the same performance as one with chaining, given appropriate code optimizations. Finally, we identify which scenarios can and cannot be optimized in this way and give an estimated evaluation of the performance impact of vector chaining on real applications.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T08:18:12Z (GMT). No. of bitstreams: 1 ntu-108-R06922135-1.pdf: 5088664 bytes, checksum: dd8d746adc50a0753a80b29087a8b500 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	1 Introduction; 2 Background; 2.1 Machine Parallelism; 2.1.1 Data-Level Parallelism; 2.1.2 Instruction-Level Parallelism; 2.2 Vector Architecture; 2.2.1 Conventional Vector Architecture; 2.2.2 Multimedia SIMD Architecture; 2.2.3 Vector-Length Agnostic Architecture; 2.3 Vector Chaining Mechanism; 3 Vector Processor Description; 3.1 RISC-V Vector Extension; 3.1.1 Vector Extension Overview; 3.1.2 An Example for Vector Extension; 3.1.3 Advantage of Vector Extension; 3.2 Vector Micro-architecture; 3.2.1 Overall Structure; 3.2.2 Vector Operation Unit; 3.2.3 Vector Function Unit; 3.2.4 Vector Register File; 3.2.5 Vector Memory Unit; 3.2.6 Vector Instruction Issue Queue; 3.3 Implementations of Vector Chaining Mechanism; 3.3.1 Vector Function Unit Chaining; 3.3.2 Memory Chaining; 4 Experimental Setup; 4.1 Vector Processor Design Space; 4.2 Microbenchmark; 4.2.1 MATRIX-ADD Kernel; 4.3 Loop Analysis Framework; 4.3.1 Instruction Parallelism; 4.3.2 Loop Unroll (And Jam) Availability; 4.3.3 Loop Iteration Size Instrumentation; 5 Exploit Instruction-Level Parallelism by Software; 5.1 Code Transformation; 5.1.1 Loop Unrolling; 5.1.2 Loop Unroll and Jam; 5.2 Code Scheduling; 6 Performance Evaluation; 6.1 Impact of Vector Chaining on Unoptimized Code; 6.1.1 Overall Comparison; 6.1.2 Memory Latency; 6.1.3 Vector Execution Capability; 6.1.4 MSHR Resources; 6.2 Impact of Vector Chaining on Optimized Code; 6.2.1 Overall Comparison; 6.2.2 Loop Iteration Size; 7 Estimated Overall Performance Impact of Vector Chaining; 7.1 SPEC2017fp; 8 Concluding Remarks; Reference
dc.language.iso	en
dc.subject	中央處理器效能建模	zh_TW
dc.subject	計算機架構	zh_TW
dc.subject	編譯器最佳化	zh_TW
dc.subject	單指令多資料流	zh_TW
dc.subject	計算機微架構	zh_TW
dc.subject	向量架構	zh_TW
dc.subject	CPU Performance Modeling	en
dc.subject	Vector Architecture	en
dc.subject	Computer Microarchitecture	en
dc.subject	SIMD	en
dc.subject	Compiler Optimization	en
dc.subject	Computer Architecture	en
dc.title	RISC-V 向量指令集鏈結微架構評估	zh_TW
dc.title	Evaluation of Chaining Implementation on RISC-V Vector Extension	en
dc.type	Thesis
dc.date.schoolyear	107-2
dc.description.degree	Master
dc.contributor.oralexamcommittee	洪鼎詠,吳真貞,張傳華,黃敬群,廖世偉
dc.subject.keyword	計算機架構,向量架構,計算機微架構,單指令多資料流,編譯器最佳化,中央處理器效能建模	zh_TW
dc.subject.keyword	Computer Architecture, Vector Architecture, Computer Microarchitecture, SIMD, Compiler Optimization, CPU Performance Modeling	en
dc.relation.page	49
dc.identifier.doi	10.6342/NTU201903397
dc.rights.note	Licensed for a fee
dc.date.accepted	2019-08-14
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering (資訊工程學系)

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	4.97 MB	Adobe PDF

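Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.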