Efficient and Scalable Architecture Design and Implementation for Mobile 3D Graphics Processors

Chia-Ming Chang; 張家銘

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5173

Title:	Efficient and Scalable Architecture Design and Implementation for Mobile 3D Graphics Processors
Author:	Chia-Ming Chang 張家銘
Advisor:	簡韶逸 (Shao-Yi Chien)
Keywords:	3D graphics processor, chip, texture compression
Year of publication:	2014
Degree:	Doctorate
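Abstract:	In recent years, mobile devices have advanced rapidly. Ever-higher display resolutions and increasingly complex 3D content have both driven the very fast evolution of mobile 3D graphics processors. Mobile 3D graphics processors have long since adopted multi-core architectures, but how to design an efficient and scalable architecture is the central question of this dissertation. Beyond this question, because mobile devices are battery powered, we also investigate energy-efficient design. We therefore propose an efficient and scalable architecture that lets a mobile 3D graphics processor deliver high-performance rendering within a limited energy budget. We first analyze the problem along several axes, including functionality, scalability, and performance, and then propose a scalable unified shader architecture, a balanced rendering pipeline design, energy-efficiency techniques, and bandwidth-reduction methods for realizing a new generation of mobile GPUs. A unified shader architecture increases parallelism and thus performance, and it is most effective when the vertex and pixel shading workloads are unbalanced. Scalability of the shader architecture is also an important issue: the proposed design scales system performance efficiently simply by increasing the number of cores. From the performance perspective, we propose methods that balance the rendering pipeline and thereby raise throughput. In addition to an in-depth design analysis of a non-blocking fetch unit, we propose a render state pipeline that partitions the rendering pipeline state so that multiple rendering jobs can be in the pipeline at once, increasing job-level parallelism and overall performance. We also propose distinctive techniques for achieving both high performance and low power. From the energy-efficiency perspective, the buffer-bridged scheduler and the energy-efficient transaction technique raise hardware utilization and shorten processing time. Notably, the approximated rendering scheme allows a trade-off between rendered image quality and longer battery life. A configurable filtering unit (CFU) is also used to accelerate image-processing applications. As display resolutions keep rising, power and bandwidth consumption rise with them. To address this, we propose a low-latency buffer compression/decompression engine and implement a codec architecture common to color and depth data, which effectively reduces power consumption and bandwidth usage. To reduce texture bandwidth, we further propose a texturing method, based on the concept of the wavelet transform, that adapts to the level of detail of the texture content (LOC); it reduces data size further than previous texture compression techniques. It is also a rate-adjustable encoding method, with data rates between 2 bpp (bits per pixel) and 4 bpp. Experimental results show that image quality at 3 bpp is no worse than that of S3TC at 4 bpp. In this dissertation we implement a scalable graphics processor system-on-chip with eight unified shader cores. Equipped with the CFU, the processor provides 43.8 GFLOPS of computation performance and a graphics throughput of 1.2 Gvertices/s and 2.4 Gpixels/s. The chip is fabricated in a 65 nm CMOS process with a core size of 7.56 mm². Experimental results show a 34% power saving and a 2.5-fold performance improvement over other state-of-the-art graphics processors. When the CFU is used for image-processing applications, the chip achieves 1.1 to 7.2 times the performance of state-of-the-art image signal processors (ISPs). In short, the proposed architecture delivers high-performance graphics rendering with efficient use of power. Mobile devices are becoming one of the major drivers of GPU evolution. Rapidly growing demand for rich content is pushing mobile GPUs forward very quickly, and mobile GPU architecture has entered the unified multi-core era. Because mobile devices are battery powered, energy consumption is also a significant concern. This dissertation focuses on scalable architecture design: we propose a scalable architecture for mobile 3D graphics processors that delivers high performance with efficient energy utilization. The dissertation opens with an analysis of high-performance graphics processors.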
The analysis covers functionality, scalable unified architecture, and performance. Based on it, we present a scalable unified architecture, a balanced rendering pipeline design, energy-efficiency techniques, and bandwidth-reduction techniques. A unified shader architecture exploits more data parallelism than a non-unified one, especially when the workload is unbalanced between vertex and pixel tasks. In addition, the proposed scalable architecture scales system performance efficiently by simply increasing the number of cores. From the performance perspective, the proposed pipeline bubble reduction techniques achieve a balanced pipeline and increase throughput. A non-blocking fetch unit is presented with a thorough design analysis, and the proposed render state pipeline is designed for draw call overlapping. Unlike conventional graphics processors, this work achieves high performance with several unique techniques for low power consumption. From the energy-efficiency perspective, the buffer-bridged scheduler and the energy-efficient transaction technique increase hardware utilization and performance. The approximated rendering scheme provides power scalability for our GPU by trading off image quality against power consumption. Moreover, a configurable filtering unit (CFU) is employed to accelerate image processing. Power and memory bandwidth consumption increase with display resolution. To overcome this problem, we propose a low-latency buffer compression/decompression engine with a universal codec architecture that handles both color and depth data with the same hardware unit; this reduces power consumption and memory bandwidth. To reduce texturing bandwidth, we propose a new texturing concept, the level-of-content (LOC) map, which provides a per-texel level-of-detail bias and is generated based on the concept of the wavelet transform.
During a texture fetch, we first look up the LOC map and use it to fetch from an appropriate level of the texture pyramid. This process maps similar texels to higher mipmap levels where they coincide, and it overcomes a limitation of Block Truncation Coding (BTC)-based texture compression by allowing base colors to be shared across blocks, which reduces the data size further than previous texture compression techniques. Moreover, a luminance difference compensation method is developed to recover image quality. Based on these concepts, a rate-adjustable encoding scheme is proposed with data rates ranging from 2 bpp (bits per pixel) to 4 bpp while retaining random access. Experimental results on a variety of textures show that a typical data rate of 3 bpp is achieved with minor quality degradation compared with S3TC, which is fixed at 4 bpp. In this dissertation, eight unified shader cores are implemented in the scalable architecture, with the configurable filtering unit (CFU) serving as the texture unit, to deliver a high computation performance of 43.8 GFLOPS; together with the graphics-specific engine, they provide a rendering performance of 1.2 Gvertices/s and 2.4 Gpixels/s for 3D graphics applications. The implementation is fabricated in 65 nm CMOS technology with a core size of 7.56 mm². Experimental results show that, with all the proposed energy-efficiency techniques described above, a 34% power saving and a 2.5-times improvement over the state-of-the-art graphics processor are achieved. Compared with state-of-the-art image signal processors (ISPs), the proposed mobile graphics processor with CFU achieves 1.1 to 7.2 times the performance.
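The LOC-map lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names (`build_pyramid`, `build_loc_map`, `fetch`), the single-channel square texture, the box-filter mipmaps, and the error-threshold rule used to fill the LOC map are all assumptions made for illustration. The thesis generates the map from a wavelet-based analysis and adds luminance difference compensation, neither of which is modeled here.

```python
def downsample(level):
    """Halve a square texture by averaging each 2x2 block (box filter)."""
    n = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_pyramid(texture):
    """Return [level 0 (full resolution), level 1, ...] down to 1x1.

    Assumes a square texture whose side is a power of two.
    """
    pyramid = [texture]
    while len(pyramid[-1]) > 1:
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def build_loc_map(pyramid, tolerance):
    """Hypothetical stand-in for the wavelet-based LOC generation.

    For each texel, store the coarsest pyramid level whose value stays
    within `tolerance` of the full-resolution texel (0 if none does).
    """
    base = pyramid[0]
    size = len(base)
    loc_map = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # Search from the coarsest level down to level 1.
            for level in range(len(pyramid) - 1, 0, -1):
                coarse = pyramid[level][y >> level][x >> level]
                if abs(coarse - base[y][x]) <= tolerance:
                    loc_map[y][x] = level
                    break
    return loc_map


def fetch(pyramid, loc_map, x, y):
    """Look up the per-texel level first, then read from that level."""
    level = loc_map[y][x]
    return pyramid[level][y >> level][x >> level]


# Left half is flat, right half carries fine detail.
texture = [
    [10, 10, 0, 100],
    [10, 10, 100, 0],
    [10, 10, 0, 100],
    [10, 10, 100, 0],
]
pyramid = build_pyramid(texture)
loc_map = build_loc_map(pyramid, tolerance=1.0)
flat_texel = fetch(pyramid, loc_map, 0, 0)      # served from a coarser level
detail_texel = fetch(pyramid, loc_map, 3, 0)    # served from level 0
```

In this toy setup, texels in the flat region all resolve to the same coarse texel, while texels in the detailed region are still read from level 0. That is the kind of sharing the abstract credits for the reduced data size, under the simplifying assumptions stated above.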
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/5173
Full-text authorization:	Authorized (open access worldwide)
Appears in collections:	Graduate Institute of Networking and Multimedia

Files in this item:

File	Size	Format
ntu-103-1.pdf	13.48 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
