Task Scheduler and Resource Scheduler on Multi-Core GPUs for Mobile Devices

Chun-Yi Lin; 林春益

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48709

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	簡韶逸
dc.contributor.author	Chun-Yi Lin	en
dc.contributor.author	林春益	zh_TW
dc.date.accessioned	2021-06-15T07:09:40Z	-
dc.date.available	2010-10-22
dc.date.copyright	2010-10-22
dc.date.issued	2010
dc.date.submitted	2010-10-20
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48709	-
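dc.description.abstract	In recent years, more and more multimedia applications have appeared in consumer electronics. Fast-growing handheld devices such as smartphones support, beyond basic voice calls, many multimedia functions such as cameras, video recording, internet access, GPS navigation, and MP3 playback, and have greatly affected everyday life. It is also increasingly common for handheld devices to include a 3D graphics processing unit (GPU). Because the screen is the most important interface between user and device, users usually demand high image quality, which means the computation the GPU must handle keeps growing more complex; besides high image quality, a high-resolution screen is also indispensable. Both requirements directly imply that the number of processing cores in the GPU must increase as well, which is a major challenge for handheld devices with very limited resources. Achieving high utilization across multiple cores under fixed resources, and optimizing the use of the data each core needs for computation, is a major test. In this thesis we propose a multi-core mobile GPU. A dynamically allocating task scheduler distributes pending work to the processing cores to balance the load among them, and a resource scheduler moves the data each core needs ahead of time, so that data is transferred and used more efficiently and the pipeline efficiency of each shader processor improves. Combining these techniques, we extend our original mobile GPU, which had only two processing cores, to eight processing cores, add the proposed task scheduler and resource scheduler, and implement the design as a system-on-chip platform. The chip is fabricated in TSMC 65 nm technology, with an area of 4.0 × 4.0 mm², an operating frequency of 200 MHz, and a maximum power consumption of 212.84 mW.	zh_TW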
dc.description.abstract	In recent years, multimedia applications have become increasingly common in consumer electronics. Smartphones in particular are no longer just phones: they embed many multimedia applications such as games, cameras, video, internet access, GPS, and MP3 playback. A mobile device usually embeds a 3D graphics processing unit (GPU) to enhance its multimedia processing capability. Because the screen is the most important interface between users and mobile devices, high-resolution, high-quality screens are indispensable. However, higher screen resolution and higher display quality both require more GPU processing power, which in turn means more processing cores. This is a challenge because mobile devices are subject to many limitations. Maximizing the utilization of a multi-core processor and using its processing resources efficiently are therefore key design challenges. In this thesis, we propose a multi-core GPU for mobile devices. We propose a dynamic task scheduler that dynamically assigns subtasks to each processor to balance the load across the system, and a resource scheduler that prefetches the resources each processor requires to make the processing pipeline more efficient. Based on these techniques, we extend our original GPU with two shader cores to an eight-core unified-shader GPU with the task scheduler and resource scheduler added. We also realize the design as an SoC platform; the chip is fabricated in TSMC 65 nm technology. The chip area is 4 mm × 4 mm and the operating frequency is 200 MHz. The maximum power consumption is 210 mW. The maximum processing capability is 800 MVtx/s and 1600 MPxl/s.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T07:09:40Z (GMT). No. of bitstreams: 1 ntu-99-R97943002-1.pdf: 13667815 bytes, checksum: a6f0b8fc1426151d85ccffc261c7bd10 (MD5) Previous issue date: 2010	en
dc.description.tableofcontents	Abstract; 1 Introduction; 1.1 The Graphic Pipeline; 1.2 Multi-Core Graphics Processing Units; 1.3 The Trend of Mobile Graphics Processing Units; 1.4 Motivation; 1.5 Thesis Organization; 2 Previous Work; 2.1 The latest work on desktop GPU; 2.2 Previous work on mobile GPUs; 3 Universal Scheduler; 3.1 Overview of shader; 3.1.1 Vertex Shader; 3.1.2 Pixel Shader; 3.2 Conventional instruction and data accessing; 3.3 Proposed Resource Scheduler; 3.3.1 Resource Scheduler Overview; 3.3.2 Instruction Scheduler; 3.3.3 Constant Scheduler; 3.4 Proposed Task Scheduler; 3.4.1 Shader Configuration; 3.4.2 Task Selection; 3.4.3 Task Scheduling and Assignment; 3.5 Architecture; 3.5.1 Proposed Scheduler Architecture Overview; 3.5.2 Task Scheduler; 3.5.3 Resource Scheduler; 4 Experimental Results; 4.1 Simulation Environment and Setup; 4.2 Experiments about task scheduler; 4.2.1 Scalability; 4.2.2 Task scheduling methods; 4.3 Experiments about resource scheduler; 4.3.1 Simple Scenes with the Same Buffer Size; 4.3.2 Simple Scenes with Larger Buffer Size on Cache Architecture; 4.3.3 Different Buffer Size; 4.3.4 Rendering Resolution; 4.3.5 Multi-Core Configuration; 4.3.6 Notation; 5 The Implementation of Multi-Core Graphics Processing Units for Mobile Multimedia Applications; 5.1 Architecture Overview; 5.2 Design Flow; 5.3 Synthesis Result; 5.4 Chip Layout and Specification; 5.5 Comparisons; 6 Conclusion; Reference
dc.language.iso	zh-TW
dc.subject	Resource Scheduler	zh_TW
dc.subject	Multi-Core	zh_TW
dc.subject	3D Graphics System	zh_TW
dc.subject	Task Scheduler	zh_TW
dc.subject	Mobile Devices	en
dc.subject	Resource Scheduler	en
dc.subject	Task Scheduler	en
dc.subject	Multi-Core GPUs	en
dc.title	Task Scheduler and Resource Scheduler for Multi-Core 3D Graphics Systems on Portable Devices	zh_TW
dc.title	Task Scheduler and Resource Scheduler on Multi-Core GPUs for Mobile Devices	en
dc.type	Thesis
dc.date.schoolyear	99-1
dc.description.degree	Master
dc.contributor.oralexamcommittee	陳炳宇,楊佳玲,范倫達
dc.subject.keyword	Multi-Core,3D Graphics System,Task Scheduler,Resource Scheduler	zh_TW
dc.subject.keyword	Multi-Core GPUs,Mobile Devices,Task Scheduler,Resource Scheduler	en
dc.relation.page	96
dc.rights.note	Paid authorization
dc.date.accepted	2010-10-20
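dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electronics Engineering	zh_TW
Appears in Collections:	Graduate Institute of Electronics Engineering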

Files in This Item:

File	Size	Format
ntu-99-1.pdf (not authorized for public access)	13.35 MB	Adobe PDF

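Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.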