Hardware Architecture Design and Implementation of Universal Vertex/Pixel Shader for 3D Graphics System

Yu-Cheng Lin; 林昱呈

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28026

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	簡韶逸(Shao-Yi Chien)
dc.contributor.author	Yu-Cheng Lin	en
dc.contributor.author	林昱呈	zh_TW
dc.date.accessioned	2021-06-12T18:34:14Z	-
dc.date.available	2007-08-03
dc.date.copyright	2007-08-03
dc.date.issued	2007
dc.date.submitted	2007-08-01
dc.identifier.citation	[1] http://www.eecs.mit.edu/100th/images/Whirlwind-op.ctrl-site.html/. [2] I. E. Sutherland, Sketchpad, a Man-Machine Graphical Communication System, Ph.D. thesis, Massachusetts Institute of Technology, January 1963. [3] http://www.pong-story.com/. [4] http://www.opengl.org/. [5] http://www.sgi.com/. [6] http://www.idsoftware.com/. [7] http://www.nvidia.com/. [8] http://www.extremetech.com/article2/0,1697,1154775,00.asp. [9] B.-T. Phong, “Illumination for computer generated pictures,” Commun. ACM, vol. 18, no. 6, pp. 311–317, 1975. [10] T. Whitted, “A scan line algorithm for computer display of curved surfaces,” in Proceedings of the 5th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’78), New York, NY, USA, 1978, p. 26, ACM Press. [11] E. Catmull, A Subdivision Algorithm for Computer Display of Curved Surfaces, Ph.D. thesis, University of Utah, December 1974. [12] http://www.gamedev.net/reference/articles/article1820.asp. [13] C. Maughan and M. Wloka, Vertex Shader Introduction. [14] http://www.microsoft.com/windows/directx/default.mspx/. [15] E. Lindholm, M. J. Kilgard, and H. Moreton, “A user-programmable vertex engine,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’01), New York, NY, USA, 2001, pp. 149–158, ACM Press. [16] W. F. Engel, Ed., Direct3D ShaderX: Vertex and Pixel Shader Tips and Tricks, Wordware Publishing, Inc., 2002. [17] M. D. McCool, J. Ang, and A. Ahmad, “Homomorphic factorization of BRDFs for high-performance rendering,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’01), New York, NY, USA, 2001, pp. 171–178, ACM Press. [18] Philippe Beaudoin and Juan Guardado, “Non-integer Power Function on the Pixel Shader,” ShaderX, Wordware Inc., second edition, 2002. [19] Philippe Beaudoin and Juan Guardado, “Non-Photorealistic Rendering with Pixel and Vertex Shaders,” ShaderX, Wordware Inc., 2002. [20] T. Akenine-Möller and E. Haines, Real-time rendering, A K Peters, Ltd., second edition, 2002. [21] C. Charles, “The poly pipeline,” http://www.cbloom.com/3d/index.html, July 2000. [22] Jiawen Chen, Michael I. Gordon, William Thies, Matthias Zwicker, Kari Pulli, and Fredo Durand, “A reconfigurable architecture for load-balanced rendering,” in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (HWWS ’05), Aire-la-Ville, Switzerland, 2005, pp. 71–80, Eurographics Association. [23] http://www.opencores.org/. [24] http://www.iaalab.ncku.edu.tw/iceer2005/Form/PaperFile/20-001.pdf. [25] V. Moya, C. Gonzalez, J. Roca, A. Fernandez, and R. Espasa, “Shader performance analysis on a modern GPU architecture,” in Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’05), Nov 2005. [26] http://msdn.microsoft.com/. [27] Microsoft Meltdown 2003, DirectX Next Slides. [28] John Kessenich, “The OpenGL ES shading language,” http://www.opengl.org. [29] M. Kameyama, Y. Kato, H. Fujimoto, H. Negishi, Y. Kodama, Y. Inoue, and H. Kawai, “3D graphics LSI core for mobile phone ‘Z3D’,” in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (HWWS ’03), Aire-la-Ville, Switzerland, 2003, pp. 60–67, Eurographics Association. [30] J.-H. Sohn, R. Woo, and H.-J. Yoo, “A programmable vertex shader with fixed-point SIMD datapath for low power wireless applications,” in Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware (HWWS ’04), New York, NY, USA, 2004, pp. 107–114, ACM Press. [31] F. Arakawa, T. Yoshinaga, T. Hayashi, Y. Kiyoshige, T. Okada, M. Nishibori, T. Hiraoka, M. Ozawa, T. Kodama, T. Irita, T. Kamei, M. Ishikawa, Y. Nitta, O. Nishii, and T. Hattori, “An embedded processor core for consumer appliances with 2.8GFLOPS and 36M polygons/s FPU,” in Proceedings of Digest of Technical Papers of the 2004 IEEE International Solid-State Circuits Conference (ISSCC 2004), Feb 2004, pp. 334–531 Vol. 1. [32] J.-H. Sohn, J.-H. Woo, M.-W. Lee, H.-J. Kim, R. Woo, and H.-J. Yoo, “A 50 Mvertices/s graphics processor with fixed-point programmable vertex shader for mobile applications,” in Proceedings of Digest of Technical Papers of the 2005 IEEE International Solid-State Circuits Conference (ISSCC 2005), Feb 2005, pp. 192–592 Vol. 1. [33] Y.-M. Tsao, S.-Y. Chien, C.-H. Chang, C.-J. Lian, and L.-G. Chen, “Low power programmable shader with efficient graphics and video acceleration capabilities for mobile multimedia applications,” in Proceedings of Digest of Technical Papers of the 2006 International Conference on Consumer Electronics (ICCE 2006), Jan 2006, pp. 395–396. [34] http://www.synopsys.com/. [35] http://www.mentor.com/. [36] Chang-Hyo Yu, Kyusik Chung, Donghyun Kim, and Lee-Sup Kim, “A 120Mvertices/s multi-threaded VLIW vertex processor for mobile multimedia applications,” in Proceedings of Digest of Technical Papers of the 2006 IEEE International Solid-State Circuits Conference (ISSCC 2006), 2006. [37] Jeong-Ho Woo, Ju-Ho Sohn, Hyejung Kim, Jongcheol Jeong, Euljoo Jeong, Suk Joong Lee, and Hoi-Jun Yoo, “A 152mW/195mW multimedia processor with MPEG/H.264/JPEG and fully programmable 3D graphics for mobile applications,” in Proceedings of Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference (ISSCC 2007), 2007. [38] Byeong-Gyu Nam, Jeabin Lee, Kwanho Kim, Seung Jin Lee, and Hoi-Jun Yoo, “A 52.4mW 3D graphics processor with 141Mvertices/s vertex shader and 3 power domains of dynamic voltage and frequency scaling,” in Proceedings of Digest of Technical Papers of the 2007 IEEE International Solid-State Circuits Conference (ISSCC 2007), 2007.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28026	-
dc.description.abstract	隨著電腦與娛樂技術的日益進步，人們對電腦繪圖的要求越來越高，希望電腦繪圖的效果能越逼真越好。另外一方面，手機的發展日趨進步，由電腦繪圖產生的特效也漸漸成為高階手機中的必備需求。然而，目前的硬體多為了桌上型平台所設計，著重於效能的表現，但是其功率消耗過大並不適合用於手持式的裝置上面。而少數專為手持式裝置設計的繪圖處理器又大多效能不高。手持裝置上受限的硬體資源和電力成為這項發展的重大限制。因此低功率且在合理成本下達到高效能之繪圖加速器便是具有價值之研究題目。本論文觀察到傳統電腦圖學管線中頂點著色處理器與像素著色處理器的負載不平衡情形，考慮到其指令的相似性頗高，提出了頂點與像素通用著色處理器的硬體架構，在硬體成本增加不多的條件下，大大提高了繪圖處理器的處理效能。本篇論文在硬體架構上有三項重要的貢獻：一、頂點與像素通用著色處理器，具有在執行期間隨著狀況不同而動態做出適當調整的能力，在負載不平衡的情況下，大幅提高硬體的利用率；二、可配置的記憶體陣列，可以依照應用的不同改變輸入及輸出快取的分配；三、低功率技術。低功率技術可以分成兩方面：轉置後提前丟棄和時脈閘。轉置後提前丟棄之技術可以將三維繪圖中三角形的資訊加入到以頂點為根本之頂點上色器中。應用此技術之下，多餘之打光運算可被去除進而降低功耗。在指令層級做時脈閘可以將不需要用到的邏輯閘及暫存器關閉，節省不必要之電源浪費。本論文提出之特點已經由實做之晶片驗證。實做之結果顯示結合上面所提到的各項架構上的優點，總共可以節省超過40%之處理時間。原型晶片利用聯電90nm技術製成，面積為3.500×3.500mm2。其處理速度為每秒200百萬頂點以及200百萬像素，等同於每秒64億浮點數運算。	zh_TW
dc.description.abstract	3D graphics technology, which has been under development since the 1960s, is widely used in animation, games, and user interfaces. For real-time graphics applications, Graphics Processing Units (GPUs) are now designed mainly for desktop environments. In recent years, there have been two important shifts in graphics accelerators. First, the fixed-function pipeline of the early days is gradually being replaced by a programmable pipeline, the shader pipeline. The shader pipeline gives artists and programmers the freedom to program the GPU, and a steady stream of striking graphics effects has emerged as a result. Second, graphics accelerators for mobile devices are becoming more and more important. Powerful graphics functions are being integrated into handheld devices to provide better user interfaces and portable gaming environments. The limited resources of a mobile device, in both hardware and energy, are the major obstacle to providing 3D graphics capability on handheld devices. Several low-power, low-cost solutions have been proposed in recent years, but they deliver low performance. A more efficient solution, in which computing, memory, and power resources are allocated effectively, is still required. In this thesis, low-power, cost-efficient, yet high-performance universal vertex/pixel shaders are proposed to replace the vertex shader and the pixel shader of the traditional programmable pipeline. The thesis makes three major contributions in hardware architecture. First, the universal vertex/pixel shader, which unifies the functions of the vertex shader and the pixel shader and can adapt its resource allocation at execution time to different scenarios, is proposed to solve the load-imbalance problem. Second, the configurable memory array (CMA) can be used as an input/output vertex cache and can change its configuration dynamically to keep memory usage efficient across different applications. Finally, several low-power design techniques are proposed. The main ones are early rejection after transformation (ERAT) and clock gating. ERAT analyzes the contents of transformed primitives to avoid redundant lighting computation and thereby reduce the power consumption of the shaders. Instruction-level clock gating is derived from the operation (OP) and active vector codes: the clocks of the data registers of un-issued PEs are gated to save dynamic power, and unused vector pipelines are turned off and gated as well. The proposed design techniques are verified by a real implementation. Implementation results show that over 40 percent of the processing time can be saved with all of the architectural advantages mentioned above. The prototype chip is fabricated in UMC 90 nm technology. The die size is 3.500 × 3.500 mm². It is capable of processing 200 mega vertices per second and 200 mega pixels per second, which is equivalent to 6.4 giga floating-point operations per second. The power consumption is 10.75 mW in the worst case when the chip operates at 200 MHz.	en
dc.description.provenance	Made available in DSpace on 2021-06-12T18:34:14Z (GMT). No. of bitstreams: 1 ntu-96-R94943020-1.pdf: 1275810 bytes, checksum: ebe830b424768e4f4ae720b2c878cb3c (MD5) Previous issue date: 2007	en
dc.description.tableofcontents	1 Introduction; 1.1 Introduction; 1.1.1 The History of Computer Graphics; 1.1.2 Basic Concept of Computer Graphics; 1.2 Motivation; 1.3 Thesis Organization; 2 Algorithm of Rasterization Graphics Pipeline; 2.1 Application/Scene; 2.2 Geometry Stage; 2.2.1 Transformation; 2.2.2 Illumination; 2.3 Triangle Setup; 2.4 Render Stage; 3 Overview of Vertex Shader and Pixel Shader; 3.1 High Level View of Vertex Shader Architecture; 3.2 High Level View of Pixel Shader Architecture; 3.3 Problem Definition; 4 Hardware System Overview and Architecture Design; 4.1 Hardware System Overview; 4.2 Architecture Overview; 4.3 Instruction Sets of the Universal Vertex/Pixel Shader; 4.3.1 SIMD Instructions; 4.3.2 Scalar Instructions; 4.3.3 Flow Control Instructions; 4.3.4 Changing-Thread Instructions; 4.4 Input Stream Processing; 4.4.1 Pre-Processing; 4.4.2 Front Pixel Merging; 4.5 FIFO and Tag Pool; 4.6 Vertex Cache Control Mechanism; 4.7 Universal Vertex/Pixel Shader; 4.7.1 Universal Shader Architecture Overview; 4.7.2 Thread Controller; 4.7.3 Instruction Decoder and Program Counter Stage; 4.7.4 Execution Stage; 4.7.5 Write Back Stage; 4.8 Instruction Query Controller; 4.9 Configurable Memory Array; 4.10 Output Stream Processing; 4.11 Early Rejection After Transformation; 4.12 Gated Clock; 5 Experimental Results; 5.1 Throughput Advancement With Universal Shader Architecture; 5.2 Comparison of Different CMA Configurations; 5.3 ERAT Effects; 5.4 Result Summary; 6 Implementation; 6.1 Chip Implementation; 6.1.1 Design Flow; 6.1.2 Test Consideration; 6.1.3 Chip Layout and Specification; 6.2 Comparison; 7 Conclusion
dc.language.iso	en
dc.title	適用於三維繪圖系統之頂點與像素通用著色處理器之硬體架構設計與實現	zh_TW
dc.title	Hardware Architecture Design and Implementation of Universal Vertex/Pixel Shader for 3D Graphics System	en
dc.type	Thesis
dc.date.schoolyear	95-2
dc.description.degree	Master
dc.contributor.oralexamcommittee	莊永裕,梁伯嵩,陳炳宇,張雲南
dc.subject.keyword	頂點與像素通用著色處理器,電腦圖學,可程式化處理器	zh_TW
dc.subject.keyword	universal vertex/pixel shader, graphics, programmable shader	en
dc.relation.page	78
dc.rights.note	Paid license
dc.date.accepted	2007-08-01
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電子工程學研究所	zh_TW
Appears in Collections:	Graduate Institute of Electronics Engineering

Files in this item:

File	Size	Format
ntu-96-1.pdf (not currently authorized for public access)	1.25 MB	Adobe PDF


Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.