Hardware Architecture Design and Implementation of Universal Vertex/Pixel Shader for 3D Graphics System

Yu-Cheng Lin; 林昱呈

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28026

Title:	Hardware Architecture Design and Implementation of Universal Vertex/Pixel Shader for 3D Graphics System (適用於三維繪圖系統之頂點與像素通用著色處理器之硬體架構設計與實現)
Author:	Yu-Cheng Lin (林昱呈)
Advisor:	Shao-Yi Chien (簡韶逸)
Keywords:	universal vertex/pixel shader, graphics, programmable shader
Publication Year:	2007
Degree:	Master's
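Abstract:	As computer and entertainment technology advances, expectations for computer graphics keep rising, and users want rendered results to be as realistic as possible. Meanwhile, mobile phones continue to advance, and effects produced by computer graphics are gradually becoming a required feature of high-end handsets. However, most current hardware is designed for desktop platforms and emphasizes performance; its power consumption is too high for handheld devices. The few graphics processors designed specifically for handheld devices mostly offer low performance. The limited hardware resources and power available on handheld devices are a major constraint on this development. A low-power graphics accelerator that achieves high performance at reasonable cost is therefore a worthwhile research topic. This thesis observes that the vertex shader and the pixel shader in the traditional graphics pipeline are often unevenly loaded and that their instructions are highly similar. It therefore proposes a hardware architecture for a universal vertex/pixel shader that substantially improves graphics processing performance with only a small increase in hardware cost. The thesis makes three major contributions in hardware architecture: (1) a universal vertex/pixel shader that can adapt dynamically at run time to changing conditions, greatly improving hardware utilization under load imbalance; (2) a configurable memory array whose allocation between input and output caches can change according to the application; and (3) low-power techniques. The low-power techniques fall into two categories: early rejection after transformation and clock gating. Early rejection after transformation brings triangle information from the 3D scene into the vertex-based vertex shader; with it, redundant lighting computation can be eliminated, reducing power consumption. Instruction-level clock gating shuts off logic gates and registers that are not needed, avoiding wasted power. The proposed features have been verified on an implemented chip. Implementation results show that, with all of the architectural advantages above combined, more than 40% of the processing time can be saved. The prototype chip is fabricated in UMC 90 nm technology with an area of 3.500 × 3.500 mm². It processes 200 million vertices and 200 million pixels per second, equivalent to 6.4 billion floating-point operations per second.
3D graphics technology, which has been under development since the 1960s, is widely used in animation, games, and user interfaces. For real-time graphics applications, Graphics Processing Units (GPUs) are currently designed mainly for desktop environments. In recent years, there have been two important shifts in graphics accelerators. The first is that the fixed-function pipeline of the early days is gradually being replaced by the programmable pipeline, or shader pipeline. The shader pipeline gives artists and programmers the freedom to program the GPU, and remarkable graphics effects keep emerging. The second shift is that graphics accelerators for mobile devices are becoming more and more important. Powerful graphics functions are being integrated into handheld devices to give users better user interfaces and portable gaming environments. The limited resources of mobile devices, in both hardware and energy, are the major obstacle to providing 3D graphics capability on handheld devices. Several low-power, low-cost solutions have been proposed in recent years, but with low performance. A more efficient solution, in which computing, memory, and power resources are allocated effectively, is still required.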
In this thesis, low-power, cost-efficient, yet high-performance universal vertex/pixel shaders are proposed to replace the vertex shader and the pixel shader of the traditional programmable pipeline. The thesis makes three major contributions in hardware architecture. First, the universal vertex/pixel shader unifies the functions of the vertex shader and the pixel shader and can adapt its resource allocation at execution time to different scenarios, which solves the load-imbalance problem. Second, the configurable memory array (CMA) can serve as an input/output vertex cache and can change its configuration dynamically to keep memory usage efficient for different applications. Finally, several low-power design techniques are proposed. The main ones are early rejection after transformation (ERAT) and clock gating. The ERAT technique analyzes the contents of transformed primitives to avoid redundant lighting computation and thus reduce the power consumption of the shaders. Instruction-level clock gating is derived from the operation (OP) code and the active vector codes. The clocks of the data registers of un-issued PEs are gated to save dynamic power, and unused vector pipelines are turned off and gated as well. The proposed design techniques are verified by a real implementation. Implementation results show that over 40 percent of the processing time can be saved with all the architectural advantages mentioned above. The prototype chip is fabricated in UMC 90 nm technology. The die size is 3.500 × 3.500 mm². It is capable of processing 200 mega vertices per second and 200 mega pixels per second, which is equivalent to 6.4 giga floating-point operations per second. The power consumption is 10.75 mW in the worst case when the chip runs at 200 MHz.
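The load-imbalance argument in the abstract can be illustrated with a toy model. The sketch below is not the thesis's architecture or scheduler: the function names, the unit counts, and the per-frame workloads are all hypothetical, and work is assumed to divide evenly across identical units. It only compares how long a frame takes when shader units are split into dedicated vertex and pixel pools versus when the same number of units form one unified pool.

```python
def ceil_div(work: int, units: int) -> int:
    """Integer ceiling division: cycles needed to spread `work` over `units`."""
    return -(-work // units)


def dedicated_cycles(vertex_work: int, pixel_work: int,
                     vertex_units: int, pixel_units: int) -> int:
    """Frame time with separate vertex and pixel pools (hypothetical model)."""
    # Each pool only runs its own kind of work. The two stages overlap,
    # so the more heavily loaded pool bounds the frame time.
    return max(ceil_div(vertex_work, vertex_units),
               ceil_div(pixel_work, pixel_units))


def unified_cycles(vertex_work: int, pixel_work: int, total_units: int) -> int:
    """Frame time when any unit can run either kind of work (hypothetical model)."""
    # The combined load is shared by every unit, so no unit idles
    # while another pool is still busy.
    return ceil_div(vertex_work + pixel_work, total_units)


if __name__ == "__main__":
    # Hypothetical frames: (vertex_work, pixel_work) in unit-cycles.
    frames = {"balanced": (500, 500), "vertex-heavy": (800, 200)}
    for name, (v, p) in frames.items():
        d = dedicated_cycles(v, p, vertex_units=2, pixel_units=2)
        u = unified_cycles(v, p, total_units=4)
        print(f"{name}: dedicated={d} unified={u} saved={1 - u / d:.1%}")
```

In this model the two organizations tie on a balanced frame, while on the vertex-heavy frame the dedicated split needs 400 cycles and the unified pool 250. These are illustrative numbers chosen for the sketch, not measurements from the thesis.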
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/28026
Full-Text License:	Paid license
Appears in Collections:	Graduate Institute of Electronics Engineering

Files in This Item:

File	Size	Format
ntu-96-1.pdf (not currently authorized for public access)	1.25 MB	Adobe PDF
