  1. NTU Theses and Dissertations Repository
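  2. College of Electrical Engineering and Computer Science
  3. Department of Computer Science and Information Engineering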
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74054
Full metadata record
DC Field  Value  Language
dc.contributor.advisor 徐慰中 (Wei-Chung Hsu)
dc.contributor.author Chen Wei en
dc.contributor.author 魏禛 zh_TW
dc.date.accessioned 2021-06-17T08:18:12Z
dc.date.available 2020-08-20
dc.date.copyright 2019-08-20
dc.date.issued 2019
dc.date.submitted 2019-08-14
dc.identifier.uri http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74054
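dc.description.abstract In recent years, vector architectures appear to be making a comeback. RISC-V is a new computer architecture designed to serve everything from microprocessors to supercomputers, and the key to this scalability is its vector instruction set. Compared with multimedia SIMD instruction set architectures, the RISC-V vector instruction set allows a vectorized program to run on any microarchitecture, whatever its vector length. Although executing a single vector instruction may take multiple processor cycles, vector chaining has been proposed to reduce instruction issue latency.
In this thesis, we build a Cray-style vector microarchitecture inside a RISC-V processor simulator. We evaluate not only the performance of various chaining scenarios but also how different processor resources affect chaining. In addition, we find that for code meeting certain conditions, appropriate code optimization lets a vector processor without chaining reach nearly the same performance as one with chaining. Finally, we catalogue the scenarios that can and cannot be optimized in this way and give a rough evaluation of the impact of chaining on real applications.
zh_TW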
dc.description.abstract Vector architectures appear to be reemerging in recent years. RISC-V is a brand-new computer architecture designed to scale from low-power microcontrollers to high-performance supercomputers. The key to this scalable performance is its vector extension. Compared to multimedia SIMD instruction sets, the RISC-V Vector Extension enables a vectorized program to run on any implementation, regardless of its vector length. Since a vector instruction may take multiple cycles, vector chaining is often implemented to allow a dependent vector instruction to start execution as soon as possible.
In this thesis, we build a Cray-style vector microarchitecture in an in-house cycle-accurate RISC-V CPU simulator. We then evaluate the performance tradeoffs between full chaining and restricted chaining implementations, and the impact of chaining across various vector processor configurations. Furthermore, we find that in some scenarios, with appropriate code optimizations, an implementation without chaining can achieve nearly the same performance as one with chaining. Finally, we identify the optimizable and unoptimizable scenarios and give an estimated evaluation of the performance impact of vector chaining on real applications.
en
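The effect of chaining described in the abstract can be illustrated with a toy timing model. This is only a sketch under stated assumptions, not the thesis's simulator: the function names, the `lanes` and `startup` parameters, and the idea that a chained consumer starts one startup latency after its producer are all hypothetical simplifications chosen for illustration.

```python
from math import ceil


def element_cycles(vl: int, lanes: int) -> int:
    """Cycles needed to stream vl elements through a unit with `lanes` lanes."""
    return ceil(vl / lanes)


def dependent_pair_cycles(vl: int, lanes: int, startup: int, chaining: bool) -> int:
    """Total cycles for two dependent vector instructions of length vl.

    Toy model (an assumption, not the thesis's simulator): each instruction
    pays a pipeline startup latency, then produces `lanes` elements per cycle.
    """
    stream = element_cycles(vl, lanes)
    if chaining:
        # The consumer begins as soon as the producer's first results appear,
        # so the two element streams overlap and only the startups add up.
        return startup + startup + stream
    # Without chaining, the consumer waits for the producer's last element.
    return (startup + stream) + (startup + stream)


if __name__ == "__main__":
    for chained in (True, False):
        print(chained, dependent_pair_cycles(64, 4, 6, chaining=chained))
```

With 64 elements, 4 lanes, and a 6-cycle startup, this model gives 28 cycles with chaining and 44 without. The gap grows with the vector length, which is why software techniques that overlap independent instructions matter most when chaining is absent.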
dc.description.provenance Made available in DSpace on 2021-06-17T08:18:12Z (GMT). No. of bitstreams: 1
ntu-108-R06922135-1.pdf: 5088664 bytes, checksum: dd8d746adc50a0753a80b29087a8b500 (MD5)
Previous issue date: 2019
en
dc.description.tableofcontents 1 Introduction
2 Background
2.1 Machine Parallelism
2.1.1 Data-Level Parallelism
2.1.2 Instruction-Level Parallelism
2.2 Vector Architecture
2.2.1 Conventional Vector Architecture
2.2.2 Multimedia SIMD Architecture
2.2.3 Vector-Length Agnostic Architecture
2.3 Vector Chaining Mechanism
3 Vector Processor Description
3.1 RISC-V Vector Extension
3.1.1 Vector Extension Overview
3.1.2 An Example for Vector Extension
3.1.3 Advantage of Vector Extension
3.2 Vector Micro-architecture
3.2.1 Overall Structure
3.2.2 Vector Operation Unit
3.2.3 Vector Function Unit
3.2.4 Vector Register File
3.2.5 Vector Memory Unit
3.2.6 Vector Instruction Issue Queue
3.3 Implementations of Vector Chaining Mechanism
3.3.1 Vector Function Unit Chaining
3.3.2 Memory Chaining
4 Experimental Setup
4.1 Vector Processor Design Space
4.2 Microbenchmark
4.2.1 MATRIX-ADD Kernel
4.3 Loop Analysis Framework
4.3.1 Instruction Parallelism
4.3.2 Loop Unroll (And Jam) Availability
4.3.3 Loop Iteration Size Instrumentation
5 Exploit Instruction-Level Parallelism by Software
5.1 Code Transformation
5.1.1 Loop Unrolling
5.1.2 Loop Unroll and Jam
5.2 Code Scheduling
6 Performance Evaluation
6.1 Impact of Vector Chaining on Unoptimized Code
6.1.1 Overall Comparison
6.1.2 Memory Latency
6.1.3 Vector Execution Capability
6.1.4 MSHR Resources
6.2 Impact of Vector Chaining on Optimized Code
6.2.1 Overall Comparison
6.2.2 Loop Iteration Size
7 Estimated Overall Performance Impact of Vector Chaining
7.1 SPEC2017fp
8 Concluding Remarks
Reference
dc.language.iso en
dc.subject CPU Performance Modeling zh_TW
dc.subject Computer Architecture zh_TW
dc.subject Compiler Optimization zh_TW
dc.subject SIMD zh_TW
dc.subject Computer Microarchitecture zh_TW
dc.subject Vector Architecture zh_TW
dc.subject CPU Performance Modeling en
dc.subject Vector Architecture en
dc.subject Computer Microarchitecture en
dc.subject SIMD en
dc.subject Compiler Optimization en
dc.subject Computer Architecture en
dc.title RISC-V 向量指令集鏈結微架構評估 zh_TW
dc.title Evaluation of Chaining Implementation on RISC-V Vector Extension en
dc.type Thesis
dc.date.schoolyear 107-2
dc.description.degree Master's
dc.contributor.oralexamcommittee 洪鼎詠, 吳真貞, 張傳華, 黃敬群, 廖世偉
dc.subject.keyword Computer Architecture, Vector Architecture, Computer Microarchitecture, SIMD, Compiler Optimization, CPU Performance Modeling zh_TW
dc.subject.keyword Computer Architecture, Vector Architecture, Computer Microarchitecture, SIMD, Compiler Optimization, CPU Performance Modeling en
dc.relation.page 49
dc.identifier.doi 10.6342/NTU201903397
dc.rights.note Paid licensing
dc.date.accepted 2019-08-14
dc.contributor.author-college College of Electrical Engineering and Computer Science zh_TW
dc.contributor.author-dept Graduate Institute of Computer Science and Information Engineering zh_TW
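Appears in collections: Department of Computer Science and Information Engineering

Files in this item:
File  Size  Format
ntu-108-1.pdf
  Not authorized for public access
4.97 MB  Adobe PDF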