A Cycle-Accurate, Execution-Driven GPU Simulation Framework

Lin-Chieh Shangkuan; 上官林傑

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32016

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	楊佳玲(Chia-Lin Yang)
dc.contributor.author	Lin-Chieh Shangkuan	en
dc.contributor.author	上官林傑	zh_TW
dc.date.accessioned	2021-06-13T03:28:08Z	-
dc.date.available	2007-07-31
dc.date.copyright	2006-07-31
dc.date.issued	2006
dc.date.submitted	2006-07-27
dc.identifier.citation	[1] Todd Austin, Eric Larson, and Dan Ernst, “SimpleScalar: An infrastructure for computer system modeling,” Computer, vol. 35, pp. 59–67, Feb. 2002. [2] NVIDIA, “NVIDIA Lightspeed Memory Architecture II – breakthrough memory design for industry-leading performance,” NVIDIA Technical Brief, 2006. [3] ATI, “Radeon X1800 memory controller,” ATI Technology White Paper, 2005. [4] Tien-Fu Chen and Jean-Loup Baer, “Reducing memory latency via non-blocking and prefetching caches,” Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 51–61, 1992. [5] Jin-Ho Lee, Min-Young Lee, Seong-Uk Choi, and Myoung-Soon Park, “Reducing cache conflicts in data cache prefetching,” ACM SIGARCH Computer Architecture News, vol. 22, pp. 71–77, 1994. [6] Jean-Loup Baer and Tien-Fu Chen, “Effective hardware-based data prefetching for high-performance processors,” IEEE Transactions on Computers, vol. 44, no. 5, pp. 609–623, May 1995. [7] Alexander C. Klaiber and Henry M. Levy, “An architecture for software-controlled data prefetching,” The 18th Annual International Symposium on Computer Architecture, pp. 43–53, 1991. [8] Homan Igehy et al., “Prefetching in a texture cache architecture,” Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, 1998. [9] U. Kanus, G. Wetekam, and J. Hirche, “VoxelCache: A cache-based memory architecture for volume graphics,” Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 76–83, 2003. [10] Yosuke Bando, Takahiro Saito, and Masahiro Fujita, “Hexagonal storage scheme for interleaved frame buffers and textures,” Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 33–40, 2005. [11] Simon Fenney, “Texture compression using low-frequency signal modulation,” Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 84–91, 2003.
[12] Greg Humphreys et al., “Chromium: A stream processing framework for interactive rendering on clusters,” SIGGRAPH, 2002. [13] J. W. Sheaffer, D. Luebke, and K. Skadron, “A flexible simulation framework for graphics architectures,” Graphics Hardware, 2004. [14] Kevin Skadron, Mircea R. Stan, Wei Huang, Sivakumar Velusamy, Karthik Sankaranarayanan, and David Tarjan, “Temperature-aware microarchitecture,” Proceedings of the 30th Annual International Symposium on Computer Architecture, pp. 2–13, 2003. [15] J. W. Sheaffer, K. Skadron, and D. P. Luebke, “Studying thermal management for graphics-processor architectures,” IEEE International Symposium on Performance Analysis of Systems and Software, pp. 54–65, 2005. [16] Kekoa Proudfoot and Matthew Eldridge, “GLT: An OpenGL tracing utility,” http://graphics.stanford.edu/software/glt/, 2003. [17] Victor Moya et al., “Shader performance analysis on a modern GPU architecture,” MICRO, 2005. [18] Victor Moya et al., “Attila: A cycle-level execution-driven simulator for modern GPU architectures,” ISPASS, 2006. [19] Bren Mochocki, Kanishka Lahiri, and Srihari Cadambi, “Power analysis of mobile 3D graphics,” DATE, 2006. [20] David Luebke et al., “GPGPU: General purpose computation on graphics hardware,” Proceedings of the Conference on SIGGRAPH 2004 Course Notes, 2004. [21] Jiawen Chen et al., “A reconfigurable architecture for load-balanced rendering,” Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pp. 71–80, 2005. [22] Alan Watt, 3D Computer Graphics, Addison Wesley, 3rd edition, 2000. [23] Dave Salvator, “ExtremeTech 3D pipeline tutorial,” http://www.extremetech.com/article2/0,1558,464440,00.asp, 2001. [24] Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens, “Memory access scheduling,” ACM SIGARCH Computer Architecture News, vol. 28, no. 2, pp. 128–138, May 2000.
[25] David Wang et al., “DRAMsim: A memory system simulator,” ACM SIGARCH Computer Architecture News, vol. 33, no. 4, September 2005. [26] Samsung Electronics, “256Mbit GDDR SDRAM,” Samsung GDDR Datasheet, 2006. [27] Khronos Group, “OpenGL ES - the standard for embedded accelerated 3D graphics,” http://www.khronos.org/opengles/, 2006.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/32016	-
dc.description.abstract	圖形處理器（GPU）是為了加速圖形運算而被設計出來的，GPU上高度平行化的硬體結構讓它比中央處理器（CPU）能更有效地實作圖形顯示的演算法。由於GPU需要支援愈來愈複雜的功能，並且開發廠商尚未公開其詳細的硬體架構，使得目前GPU的設計變得更加難以評估。為了研究設計GPU，本篇論文提出了一個精確時間週期與執行驅動之圖形處理器模擬平臺，在這個平臺中，為了達到更準確的模擬，GPU模擬器的核心被設計成一個有管線的處理器，同時也有一個精確時間模型的記憶體系統。GPU模擬器核心藉著執行由一連串 OpenGL 函式所轉譯的圖形顯示指令，以精確週期的方式來模擬GPU的行為，而這些 OpenGL 的函式是從真實的 3D 遊戲（例如：Quake 3）裡擷取出來的。為了示範這個平臺的應用，本篇論文也作了一個關於圖形記憶體的研究，藉著套用不同的記憶體存取排程原則來分析對GPU效能造成的影響。而實驗的結果顯示，可適應的排程原則可以得到最佳的效能。	zh_TW
dc.description.abstract	A graphics processing unit (GPU) is designed to accelerate graphics rendering. Its highly parallel structure makes it more effective than a CPU for a range of graphics rendering algorithms. Modern GPUs are becoming increasingly hard to evaluate because they must support more complex functions and GPU vendors do not release the architectural details. To study GPU design, this thesis proposes a cycle-accurate, execution-driven GPU simulation framework. For more accurate simulation, the GPU simulator core is modeled as a pipelined processor and includes a detailed timing model of the memory system. The simulator executes rendering commands converted from a stream of OpenGL function calls and simulates GPU behavior in a cycle-accurate fashion. The OpenGL trace is captured from real 3D games (e.g., Quake 3). To demonstrate the applicability of the framework, this thesis also presents a study of the graphics memory system, analyzing how different memory access scheduling policies affect GPU performance. The experimental results show that an adaptive policy is the most effective.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T03:28:08Z (GMT). No. of bitstreams: 1 ntu-95-R93922066-1.pdf: 640704 bytes, checksum: 3d4770a5a4eb6f88c05f350e409c05f0 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Abstract; Chinese Abstract; 1 Introduction; 2 Related Works; 2.1 Architectural Design; 2.2 Simulation Frameworks; 2.3 General Purposed Programming; 3 3D Graphics Rendering; 3.1 3D Graphics Pipeline; 3.2 Graphics API; 3.3 Graphics Processing Unit; 4 Simulation Framework; 4.1 Overview; 4.2 The Simulator Core; 4.2.1 Program Architecture; 4.2.2 Rendering Pipeline; 4.2.3 Memory System; 5 Experiments and Analysis; 5.1 Environment Setup; 5.1.1 Trace; 5.1.2 Simulation Environment; 5.2 Bottleneck Analysis; 5.3 Memory Access Scheduling; 6 Conclusions
dc.language.iso	en
dc.title	精確時間週期與執行驅動之圖形處理器模擬平臺	zh_TW
dc.title	A Cycle-Accurate, Execution-Driven GPU Simulation Framework	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	莊永裕(Yung-Yu Chuang),洪士灝(Shih-Hao Hung),周承復(Cheng-Fu Chou)
dc.subject.keyword	圖形處理器, 模擬器	zh_TW
dc.subject.keyword	GPU, Simulator	en
dc.relation.page	44
dc.rights.note	Paid license
dc.date.accepted	2006-07-28
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-95-1.pdf (not currently authorized for public access)	625.69 kB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms specifically indicate otherwise.