Efficient and Scalable Architecture Design and Implementation for Mobile 3D Graphics Processors

Chia-Ming Chang; 張家銘

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5173

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	簡韶逸(Shao-Yi Chien)
dc.contributor.author	Chia-Ming Chang	en
dc.contributor.author	張家銘	zh_TW
dc.date.accessioned	2021-05-15T17:52:58Z	-
dc.date.available	2015-08-21
dc.date.available	2021-05-15T17:52:58Z	-
dc.date.copyright	2014-08-21
dc.date.issued	2014
dc.date.submitted	2014-08-07
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5173	-
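dc.description.abstract	In recent years, mobile devices have advanced rapidly: display resolutions keep rising and 3D graphics content keeps growing more complex, and both trends have driven the very fast evolution of mobile 3D graphics processors. Mobile 3D graphics processors have already adopted multi-core architectures, but how to design an efficient and scalable architecture is the central issue of this doctoral dissertation. Beyond that question, mobile devices are powered by batteries, so we also investigate energy-efficient design. We therefore propose an efficient and scalable architecture that lets a mobile 3D graphics processor deliver high-performance rendering under a limited energy budget. First, we analyze the problem along several axes, including functionality, scalability, and performance. We then propose a scalable unified shader architecture, a balanced rendering pipeline design, energy-efficiency techniques, and bandwidth-reduction methods to realize a new generation of mobile GPU. A unified shader architecture increases parallelism and achieves higher performance, and it performs best when the vertex and pixel shading workloads are unbalanced. Besides the unified shader architecture, the scalability of the shading architecture is also an important issue. The proposed design can scale system performance effectively simply by increasing the number of cores. From the perspective of graphics processing performance, we propose methods that balance the rendering pipeline and thereby raise throughput. In addition to presenting the design of a non-blocking data fetch unit with an in-depth analysis, we propose a render state pipeline that partitions the rendering pipeline state so that the pipeline can hold multiple rendering tasks at once, improving task parallelism and overall performance. We also propose distinctive techniques to achieve a high-performance, low-power design. From the perspective of energy efficiency, the buffer bridged scheduler and the energy-efficient transaction technique raise hardware utilization and reduce processing time. Notably, the approximated rendering architecture trades off rendered image quality against longer battery life. In addition, a configurable filtering unit (CFU) is used to accelerate image processing applications. As display resolutions continue to rise, power and bandwidth consumption rise accordingly. To overcome this problem, we propose a low-latency buffer compression/decompression engine and implement a universal codec architecture for both color and depth data, which effectively reduces power consumption and bandwidth usage. Furthermore, to reduce texture bandwidth, we propose a texturing method based on the concept of the wavelet transform that adapts to the level of content (LOC) of the texture. It reduces the data size further than previous texture compression techniques. It is also a rate-adjustable encoding method that achieves compression rates between 2 bpp (bits per pixel) and 4 bpp. Experimental results show that image quality at 3 bpp is no worse than that of the S3TC method at 4 bpp. In this dissertation, we implement a scalable graphics processor system-on-chip with eight unified shader cores. Equipped with the configurable filtering unit (CFU), the processor provides 43.8 GFLOPS of computing performance and achieves graphics throughput of 1.2 Gvertices/s and 2.4 Gpixels/s. The chip is fabricated in a 65 nm CMOS process with a core size of 7.56 mm². Experimental results show a 34% power saving and a 2.5-fold performance improvement over other state-of-the-art graphics processors. When the CFU is used for image processing applications, it achieves a 1.1-fold to 7.2-fold performance improvement over other state-of-the-art image signal processors (ISPs). In short, the proposed architecture delivers high-performance graphics rendering with efficient use of power.	zh_TW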
dc.description.abstract	Mobile devices are becoming one of the major drivers of GPU evolution. The rapidly increasing demand for rich content is pushing mobile GPUs forward very quickly. Mobile GPU architecture is now in the unified multi-core era. Because mobile devices are battery-powered, energy consumption is also a significant concern for mobile GPUs. In this dissertation, we focus on the design of a scalable architecture and propose a scalable architecture for mobile 3D graphics processors that delivers high performance with efficient energy utilization. The dissertation begins with an analysis of high-performance graphics processors. The analysis covers functionality, scalable unified architecture, and performance. Based on the analysis, a scalable unified architecture, a balanced rendering pipeline design, energy-efficiency techniques, and bandwidth-reduction techniques are presented. A unified shader architecture exposes more data parallelism than a non-unified architecture, especially when the workload is unbalanced between vertex tasks and pixel tasks. In addition, a scalable architecture is proposed to scale system performance efficiently by simply increasing the number of cores. From the perspective of performance, the proposed pipeline bubble reduction techniques achieve a balanced pipeline and increase throughput. A non-blocking fetch unit is presented with a thorough design analysis. Moreover, the proposed render state pipeline is designed for draw call overlapping. Unlike conventional graphics processors, this work achieves high performance with several unique techniques for low power consumption. From the perspective of energy efficiency, the buffer bridged scheduler and the energy-efficient transaction technique increase hardware utilization and performance. The approximated rendering scheme provides power scalability for our GPU by trading off image quality against power consumption. Moreover, a configurable filtering unit (CFU) is employed to accelerate image processing. Power and memory bandwidth consumption increase with display resolution. To overcome this problem, we propose a low-latency buffer compression/decompression engine with a universal codec architecture that handles both color and depth data with the same hardware unit; this reduces power consumption and memory bandwidth. To reduce texturing bandwidth, we propose a new texturing concept, the level-of-content (LOC) map, which provides a per-texel level-of-detail bias and is generated based on the concept of the wavelet transform. During a texture fetch, we first look up the LOC map and use it to fetch at an appropriate level in the texture pyramid. This process maps similar texels to higher mipmap levels where they coincide, and it overcomes the limitation of Block Truncation Coding (BTC)-based texture compression by allowing base colors to be shared across blocks, which reduces the data size further than previous texture compression techniques. Moreover, a luminance difference compensation method is developed to recover image quality. Based on these new concepts, a rate-adjustable encoding scheme is proposed with a data rate ranging from 2 bpp (bits per pixel) to 4 bpp while random access is preserved. Experimental results on a variety of textures show that a typical data rate of 3 bpp is achieved with minor quality degradation compared with S3TC, which is fixed at 4 bpp. In this dissertation, eight unified shader cores are implemented in the scalable architecture, with the configurable filtering unit (CFU) as the texture unit, to deliver a computation performance of 43.8 GFLOPS; in cooperation with the graphics-specific engine, they provide graphics rendering performance of 1.2 Gvertices/s and 2.4 Gpixels/s for 3D graphics applications. This implementation is fabricated in 65 nm CMOS technology with a core size of 7.56 mm². The experimental results show that, with all the proposed energy-efficiency techniques described above, a 34% power saving and a 2.5 times improvement are achieved compared with the state-of-the-art graphics processor. Compared with state-of-the-art image signal processors (ISPs), the proposed mobile graphics processor with the CFU achieves 1.1 to 7.2 times the performance.	en
dc.description.provenance	Made available in DSpace on 2021-05-15T17:52:58Z (GMT). No. of bitstreams: 1 ntu-103-D94944011-1.pdf: 13801478 bytes, checksum: 83d9e81e426fffae39bbbdebf51a1b87 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Abstract; 1 Introduction; 1.1 3D Graphics Basics; 1.2 OpenGL ES and Draw Call; 1.3 From Fixed Functions and Programmable Shader to Unified Architecture; 1.4 Performance; 1.5 Energy Efficiency; 1.6 Research Contribution; 1.7 Thesis Organization; 2 Systems Analysis and Design; 2.1 Analysis for Functionality; 2.1.1 Shading Language and Shader Model; 2.1.2 Graphics Specific Function Block; 2.2 Analysis for Scalable Unified Architecture; 2.3 Analysis for Performance; 2.3.1 Parallelism; 2.3.2 Balanced Rendering Pipeline; 2.3.3 Bandwidth Reduction; 3 Scalable Unified Architecture; 3.1 Unified Shader Model; 3.2 Scalable Architecture; 3.3 Barycentric Rasterization; 4 Balanced Rendering Pipeline Design; 4.1 Non-Blocking Fetch Unit; 4.2 Render State Pipeline/Draw Call Partition; 5 Bandwidth Reduction Techniques; 5.1 Low-decoding-latency Universal Buffer Codec; 5.1.1 Proposed Universal Buffer Codec Architecture; 5.1.2 Buffer Encoder; 5.1.3 Buffer Decoder; 5.1.4 Performance Analysis; 5.2 Level-of-Content Texturing; 5.2.1 Previous Work; 5.2.2 Proposed Algorithm; 5.2.3 Texture Encoding Scheme; 5.2.4 Experiment Result; 6 Energy Efficiency Techniques; 6.1 Buffer Bridged Scheduler; 6.2 Energy-Efficient Transaction Technique; 6.3 Approximated Rendering Technique; 6.4 Configurable Filter Unit; 7 Implementation; 7.1 Software Implementation; 7.1.1 C-Model; 7.1.2 OpenGL ES 2.0 Driver; 7.1.3 Verification Platform; 7.2 Hardware Implementation; 7.2.1 System Overview; 7.2.2 Implementation Result; 8 Conclusion; 8.1 Contributions; 8.2 Future Directions; Appendix A Shader Core ISA Specification
dc.language.iso	en
dc.title	針對行動圖形處理器之高效率且可擴展核心架構設計與實作	zh_TW
dc.title	Efficient and Scalable Architecture Design and Implementation for Mobile 3D Graphics Processors	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Doctoral
dc.contributor.oralexamcommittee	楊佳玲(Chia-Lin Yang),洪士灝(Shih-Hao Hung),陳維超(Wei-Chao Chen),范倫達(Lan-Da Van),李潤容(Ruen-Rone Lee)
dc.subject.keyword	3D graphics processor,chip,texture compression	zh_TW
dc.subject.keyword	3D graphics processor,chip,texture compression	en
dc.relation.page	153
dc.rights.note	Authorization granted (open access worldwide)
dc.date.accepted	2014-08-07
dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Networking and Multimedia	zh_TW
Appears in Collections:	Graduate Institute of Networking and Multimedia
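
The English abstract describes a texture fetch that first looks up a level-of-content (LOC) map and then reads from the matching level of the texture pyramid. The following is only a minimal sketch of that lookup order, not the thesis's encoder: the function names, the 2x2 box filter, and the threshold-based rule for choosing a level are all assumptions made for illustration, whereas the thesis generates the map from a wavelet transform.

```python
def build_pyramid(base):
    """Return mipmap levels [level0, level1, ...] built with a 2x2 box filter.

    `base` is a square list of lists whose side length is a power of two.
    """
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
              + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def build_loc_map(levels, threshold):
    """Pick, per base texel, the coarsest level that still approximates it.

    Hypothetical criterion (an assumption, not the thesis's wavelet-based
    rule): climb the pyramid while the coarser texel stays within
    `threshold` of the base texel.
    """
    base = levels[0]
    size = len(base)
    loc = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            for level in range(1, len(levels)):
                coarse = levels[level][y >> level][x >> level]
                if abs(coarse - base[y][x]) > threshold:
                    break
                loc[y][x] = level
    return loc


def fetch(levels, loc, x, y):
    """Look up the LOC map first, then read the texel at that pyramid level."""
    level = loc[y][x]
    return levels[level][y >> level][x >> level]


# Toy 4x4 single-channel texture: flat regions plus one checkerboard corner.
texture = [
    [0.5, 0.5, 0.0, 1.0],
    [0.5, 0.5, 1.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]
levels = build_pyramid(texture)
loc = build_loc_map(levels, threshold=0.05)
```

In this toy texture the flat 2x2 regions resolve to level 1, so four base texels share one stored value, while the checkerboard corner stays at level 0 because its average would lose the detail.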

Files in This Item:

File	Size	Format
ntu-103-1.pdf	13.48 MB	Adobe PDF


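Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.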