Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63815
Full metadata record
DC field: value [language]
dc.contributor.advisor: 楊佳玲 (Chia-Lin Yang)
dc.contributor.author: Chen-Ming Chung [en]
dc.contributor.author: 鐘振銘 [zh_TW]
dc.date.accessioned: 2021-06-16T17:19:52Z
dc.date.available: 2014-08-19
dc.date.copyright: 2012-08-19
dc.date.issued: 2012
dc.date.submitted: 2012-08-17
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63815
dc.description.abstract: 現代圖形處理器提供了比集中式處理器更強大的運算能力,讓它在現今的研究獲得了更多的注意。在圖形處理器上,圖形應用程式已經不是唯一的應用程式,現在有許多有著高度平行度的一般應用程式原本執行在集中式處理器上的很適合執行在圖形處理器上,這類的工作被稱作一般目的圖形工作量,由於圖形化與一般目的的工作量注重的重點是不一樣的,它們不一定都可以在相同的架構上得到好處。然而,一個模擬這兩類應用程式的圖形化架構研究是非常少的,在我們先前的研究,我們已經有一個已週期為單位支援圖形程式語言的單指令多任務執行架構的處理器,單指令多任務執行架構的圖形處理器會將許多線程組成線程束來執行,而在這篇碩論,我們將擴展這個研究讓它可以支援一般目的的應用程式,有了一個能在同樣的圖形化處理器架構下執行此兩類的應用程式,我們去分析一些特性,包括動態指令混合、跳躍指令分散比、單指令多任務執行寬度之影響、同時執行線程束數量的影響以及線程束排程的影響等。 [zh_TW]
dc.description.abstract: Modern Graphics Processing Units (GPUs) have attracted considerable attention recently because they provide orders of magnitude more computing power than CPUs. Graphics applications are not the only workloads for GPUs: general-purpose applications with high data parallelism, called GPGPU applications, are also well suited to them. Because GPGPU and graphics applications have different goals, they may not benefit from the same architectural design. A simulation framework that supports both kinds of application is therefore necessary for GPU architecture research. In our previous work, we presented a cycle-level simulation framework for modern GPUs that models the SIMT (Single-Instruction Multiple-Thread) execution pipeline and supports graphics workloads (OpenGL ES). In this thesis, we extend that work to support GPGPU applications as well. With the simulation framework, we characterize both graphics and GPGPU workloads, including dynamic instruction mixes, branch divergence ratio, and the effects of concurrent warps, SIMT width, and warp scheduling policies. [en]
dc.description.provenance: Made available in DSpace on 2021-06-16T17:19:52Z (GMT). No. of bitstreams: 1
ntu-101-R99922128-1.pdf: 4306742 bytes, checksum: e05086edacbbfb5409ce9c3f53bcb97c (MD5)
Previous issue date: 2012 [en]
dc.description.tableofcontents:
1 Introduction
2 Related Works
2.1 GPU Simulation Frameworks
2.2 Warp Scheduling
2.3 Benchmark
3 GPU Overview
3.1 Single-Instruction Multiple-Thread (SIMT) Execution Model
3.2 Microarchitecture of Stream Multiprocessor
3.2.1 Supports to Control-flow Divergence
4 CUDA Simulation Framework
4.1 OpenGL ES Simulation Framework
4.2 CUDA Simulation Framework
4.2.1 CUDA Interceptor
4.2.2 PTX Assembler
4.2.3 CUDA Sim-driver
4.3 SIMT-GPU Simulator
4.4 Benchmarks
5 Experimental Setup
5.1 Hardware Configuration
5.2 Benchmark
6 Experimental Results
6.1 Dynamic Instruction Distribution
6.2 Effects of Concurrent Warps
6.3 Effects of SIMT Width
6.4 Effect of Warp Scheduling Policies
6.4.1 Latency Hiding Capabilities
6.4.2 Memory Access Patterns
6.4.3 Register File Usage
7 Conclusion
Bibliography
dc.language.iso: en
dc.subject: 圖形工作量 [zh_TW]
dc.subject: 同時執行線程束數量 [zh_TW]
dc.subject: 單指令多任務執行寬度 [zh_TW]
dc.subject: 一般目的圖形工作量 [zh_TW]
dc.subject: 線程束排程 [zh_TW]
dc.subject: 繪圖晶片架構 [zh_TW]
dc.subject: 單指令多任務執行模型 [zh_TW]
dc.subject: SIMT width [en]
dc.subject: Warp Scheduling [en]
dc.subject: GPGPU Benchmarks [en]
dc.subject: Graphics Workloads [en]
dc.subject: SIMT execution model [en]
dc.subject: GPU Architecture [en]
dc.subject: Concurrent Warps [en]
dc.title: 分析單一指令多重執行緒執行在現代圖形處理器上的圖形及一般目的圖形處理器程序 [zh_TW]
dc.title: SIMT Execution Analysis for Graphics and GPGPU applications on Modern GPUs [en]
dc.type: Thesis
dc.date.schoolyear: 100-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 陳維超, 洪士灝
dc.subject.keyword: 繪圖晶片架構,線程束排程,一般目的圖形工作量,圖形工作量,單指令多任務執行模型,單指令多任務執行寬度,同時執行線程束數量 [zh_TW]
dc.subject.keyword: GPU Architecture, Warp Scheduling, GPGPU Benchmarks, Graphics Workloads, SIMT execution model, SIMT width, Concurrent Warps [en]
dc.relation.page: 45
dc.rights.note: Paid license (有償授權)
dc.date.accepted: 2012-08-17
dc.contributor.author-college: 電機資訊學院 [zh_TW]
dc.contributor.author-dept: 資訊工程學研究所 [zh_TW]
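Appears in collections: 資訊工程學系 (Department of Computer Science and Information Engineering)

Files in this item:
File: ntu-101-1.pdf (not authorized for public access)
Size: 4.21 MB
Format: Adobe PDF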