Workload Characterization and Simulator Development for OpenCL 2.0 (OpenCL 2.0 模擬器開發及程式特性分析)

Li Wang; 王立

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76452

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	楊佳玲
dc.contributor.author	Li Wang	en
dc.contributor.author	王立	zh_TW
dc.date.accessioned	2021-07-09T15:52:34Z	-
dc.date.available	2021-11-02
dc.date.copyright	2016-11-02
dc.date.issued	2016
dc.date.submitted	2016-08-15
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76452	-
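dc.description.abstract	The role of the GPU in heterogeneous systems has evolved from a graphics accelerator into a device capable of handling large volumes of many kinds of computation, the so-called GPGPU architecture. To make better use of the GPU's computing power, future heterogeneous system architectures will integrate the CPU and GPU more tightly. This architectural evolution opens up many possible design directions for system architecture research; however, because academia currently lacks a simulator for such integrated CPU-GPU heterogeneous systems, there have so far been few research results in this area. This thesis modifies an existing simulator, gem5-gpu, so that it supports the heterogeneous computing standard OpenCL 2.0. We choose OpenCL because it is now supported by hardware from many vendors, and we therefore believe the standard is representative of future heterogeneous system architectures and computing standards. In addition, we use the modified simulator to evaluate how the features newly added in OpenCL 2.0 affect program performance. These features include dynamic parallelism, shared virtual memory, and atomic operations; they give the GPU more powerful computing capabilities and enable data sharing between the CPU and GPU, better demonstrating the strength of heterogeneous computing.	zh_TW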
dc.description.abstract	The GPU, as a computing node in a heterogeneous system, has evolved from an accelerator into a general-purpose computing device that can handle many kinds of tasks. To better utilize the computing power of GPUs, many future heterogeneous systems will integrate CPUs and GPUs more closely. Such heterogeneous system architectures open up many directions for architecture research, but the lack of a heterogeneous system simulator keeps researchers from exploring this domain further. In this thesis, we extend the existing integrated CPU-GPU simulator gem5-gpu to support the OpenCL 2.0 standard. We believe that OpenCL, as a standard widely adopted by industry, will best represent the future design of heterogeneous systems. In addition, we use our simulator to evaluate the impact of the new features introduced in OpenCL 2.0. These features, including device kernel enqueue, shared virtual memory, and enhanced atomic operations, strengthen the computing capability of GPUs and enable fine-grained data sharing between CPUs and GPUs, demonstrating the power of heterogeneous computing.	en
dc.description.provenance	Made available in DSpace on 2021-07-09T15:52:34Z (GMT). No. of bitstreams: 1 ntu-105-R03922025-1.pdf: 1363671 bytes, checksum: 8e8c83704e8221300c2cab91fd9ffb84 (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Acknowledgements; Abstract (Chinese); Abstract; 1 Introduction; 2 Background; 2.1 Baseline Heterogeneous System and gem5-gpu; 2.2 GPU Architecture and GPU Programming Model; 2.3 OpenCL; 2.3.1 Shared Virtual Memory; 2.3.2 Dynamic Parallelism; 2.3.3 Platform Atomics and Enhanced Atomic Operations; 2.3.4 Work-Group Built-in Functions; 2.4 HSA 1.0; 3 Simulator Development; 3.1 Simulator Overview; 3.2 Customized OpenCL Host API; 3.3 OpenCL to PTX Compiler; 3.4 Support newer version of PTX; 3.4.1 Dynamic Parallelism; 3.4.2 Platform Atomics and Enhanced Atomic Functions; 3.5 Work-Group Built-in Functions; 4 Evaluation; 4.1 Benchmarks; 4.2 Experimental Results; 4.2.1 Validation; 4.2.2 Shared Virtual Memory; 4.2.3 Dynamic Parallelism; 4.2.4 Work-Group Built-in Functions; 4.2.5 Platform Atomics; 5 Related Works; 5.1 Heterogeneous Computing Simulator; 5.2 Heterogeneous Workload Analysis; 6 Conclusion; Bibliography
dc.language.iso	en
dc.subject	OpenCL	zh_TW
dc.subject	GPGPU 運算 (GPGPU computing)	zh_TW
dc.subject	異質運算 (Heterogeneous computing)	zh_TW
dc.subject	模擬器 (Simulator)	zh_TW
dc.subject	GPGPU computing	en
dc.subject	Simulator	en
dc.subject	Heterogeneous computing	en
dc.subject	OpenCL	en
dc.title	OpenCL 2.0 模擬器開發及程式特性分析	zh_TW
dc.title	Workload characterization and Simulator Development for OpenCL 2.0	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳依蓉,呂仁碩,陳坤志
dc.subject.keyword	異質運算 (Heterogeneous computing),GPGPU 運算 (GPGPU computing),OpenCL,模擬器 (Simulator)	zh_TW
dc.subject.keyword	Heterogeneous computing,GPGPU computing,OpenCL,Simulator	en
dc.relation.page	37
dc.identifier.doi	10.6342/NTU201602672
dc.rights.note	Authorization granted (worldwide public access)
dc.date.accepted	2016-08-16
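dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW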
dc.date.embargo-lift	2021-11-02	-
Appears in Collections:	Department of Computer Science and Information Engineering

Files in this item:

File	Size	Format
ntu-105-R03922025-1.pdf	1.33 MB	Adobe PDF

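Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.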