Analysis of Dynamic Warp Formation on GPU for Graphics Workloads

Hsi-Feng Lin; 林希峰

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40109

Title:	Analysis of Dynamic Warp Formation on GPU for Graphics Workloads (動態線程束重組機制在繪圖晶片中對繪圖工作量的分析)
Author:	Hsi-Feng Lin (林希峰)
Advisor:	Chia-Lin Yang (楊佳玲)
Keywords:	GPU Architecture, Warp Divergence, Dynamic Warp Formation, Graphics Workloads, SIMT execution model
Year of publication:	2011
Degree:	Master's
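Abstract:	Traditional GPUs handled only graphics computation. With the rise of parallel programming, the many-core architecture inside the GPU has drawn attention, and a growing body of research uses GPUs to accelerate execution or explores how to further improve their efficiency. General-purpose GPU (GPGPU) computing uses the GPU's many cores for large, complex computations such as physics engines and biotechnology. To simplify execution control, the GPU executes instructions under the single-instruction, multiple-thread (SIMT) model, in which multiple threads are grouped into a warp. When a warp reaches a branch instruction, however, its threads may take different paths, which lowers the warp's execution efficiency and hurts overall performance. Dynamic warp formation (DWF) was previously proposed to address this problem, but it mainly targeted GPGPU workloads. We find that many current graphics programs also contain many branch instructions and therefore often suffer the same efficiency loss. In this thesis, we implement DWF in our GPU simulator and analyze its impact. The graphics architecture differs from the GPGPU architecture in two main ways. First, warp scheduling is deferred until after instruction fetch, so dynamic warp formation must also take place after instruction fetch. Second, the graphics architecture allows multiple instructions from one warp to be executing at once, and DWF requires this to be modified: in our design, only full warps may have multiple instructions executing, and a partially filled warp must finish one instruction before it executes the next. We then analyze the effect of DWF on graphics programs. We find that it does not improve overall performance and may even degrade it, for two reasons. First, DWF does not necessarily increase the number of threads per warp: it helps only inside branch paths, and because it does not force threads to merge back into full warps once they leave those paths, warp utilization drops and performance suffers. Second, if a branch path reads textures, DWF can increase the L1 cache miss rate, stalling the whole pipeline and lengthening total texture fetch time; if there are not enough ready warps, waiting time grows considerably. We also examine the relationship between branch type and scheduling policy and find that the two are not directly related; texture fetches and the effect of newly formed warps must also be considered.
Graphics processing units (GPUs) were once used only for graphics computation. The rise of parallel computing has since encouraged other uses of the GPU. General-purpose GPU (GPGPU) computing exploits the large number of cores in a GPU to accelerate complex algorithms in parallel, and much recent GPU research focuses on GPGPU architecture design or on accelerating applications with GPUs. GPUs use the Single Instruction, Multiple Thread (SIMT) execution model to simplify the flow-control mechanism, which reduces control area and leaves room for more cores and thus more computing capability. SIMT groups several threads into a thread group whose threads execute a single instruction together. Whenever a group encounters a branch instruction, however, its threads may need to split onto different paths. On branch divergence, GPUs commonly use a stack-based method, which lowers compute utilization and decreases performance. The Dynamic Warp Formation (DWF) mechanism was therefore proposed by Fung et al. and was shown to be useful for this problem in some GPGPU cases.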
In this thesis, we investigate whether DWF is also useful for graphics workloads and analyze our experimental results. We also describe the differences between the GPU architecture and the GPGPU-Sim architecture, along with some hardware design decisions we made for the DWF mechanism. In addition, we report two observations from our experiments. First, DWF increases the likelihood of Write-Buffer-Full stalls (TU stalls), which may in turn increase No-Ready-Warp stalls (SP stalls). Second, the relationship between scheduling policies and branch types is not direct; other factors, such as texture access, must also be considered. We describe these observations in detail in the experimental sections.
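The difference between stack-based divergence handling and dynamic warp formation can be illustrated with a toy issue-slot count. This is only a minimal sketch under stated assumptions, not the thesis's simulator: the warp width of 4, the function names, and the one-instruction-per-path model are all hypothetical, and the regrouping ignores lane placement, scheduling, and texture cache behavior. It therefore shows only the best case for DWF inside a divergent branch, not the stalls and utilization loss that the abstract reports for graphics workloads.

```python
from math import ceil

WARP_SIZE = 4  # hypothetical warp width, kept small for readability


def stack_issue_slots(warps):
    """Issue slots when every warp serializes its two branch paths."""
    slots = 0
    for outcomes in warps:
        if any(outcomes):      # at least one thread takes the branch
            slots += 1
        if not all(outcomes):  # at least one thread falls through
            slots += 1
    return slots


def dwf_issue_slots(warps, warp_size=WARP_SIZE):
    """Issue slots when threads on the same path are regrouped across warps.

    Assumption: any thread may join any new warp (no lane constraints).
    """
    taken = sum(sum(outcomes) for outcomes in warps)
    total = sum(len(outcomes) for outcomes in warps)
    not_taken = total - taken
    return ceil(taken / warp_size) + ceil(not_taken / warp_size)


def utilization(active_threads, slots, warp_size=WARP_SIZE):
    """Fraction of SIMD lanes doing useful work over the given issue slots."""
    return active_threads / (slots * warp_size)


# Two warps, each half taken and half not taken (True = branch taken).
warps = [[True, False, True, False], [False, True, False, True]]
threads = sum(len(w) for w in warps)
print(utilization(threads, stack_issue_slots(warps)))  # stack-based
print(utilization(threads, dwf_issue_slots(warps)))    # regrouped
```

In this idealized case regrouping halves the number of issue slots; a model that also tracked texture fetches and the supply of ready warps would be needed to reproduce the effects described above.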
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40109
Full-text authorization:	Paid license
Appears in collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-100-1.pdf (not authorized for public access)	991.61 kB	Adobe PDF

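Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.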