Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 楊佳玲 | |
| dc.contributor.author | Ming-Che Chiang | en |
| dc.contributor.author | 江明哲 | zh_TW |
| dc.date.accessioned | 2021-06-16T17:42:50Z | - |
| dc.date.available | 2017-08-20 | |
| dc.date.copyright | 2012-08-28 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-08-14 | |
| dc.identifier.citation | [1] Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA, 2009.
[2] VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units, 2010.
[3] Dawning. Nebulae - Dawning TC3600 blade system, 2010.
[4] J. Duato, A. Peña, F. Silla, R. Mayo, and E. Quintana-Ortí. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 224–231, June 28 – July 2, 2010.
[5] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware. ACM Trans. Archit. Code Optim., 6(2):7:1–7:37, July 2009.
[6] G. Giunta, R. Montella, G. Agrillo, and G. Coviello. A GPGPU transparent virtualization component for high performance computing clouds. In Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, EuroPar'10, pages 379–391, Berlin, Heidelberg, 2010. Springer-Verlag.
[7] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V. Talwar, and P. Ranganathan. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt '09, pages 17–24, New York, NY, USA, 2009. ACM.
[8] S. Kato, K. Lakshmanan, A. Kumar, M. Kelkar, Y. Ishikawa, and R. Rajkumar. RGEM: A responsive GPGPU execution model for runtime engines. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium, RTSS '11, pages 57–66, Washington, DC, USA, 2011. IEEE Computer Society.
[9] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIXATC'11, pages 2–2, Berkeley, CA, USA, 2011. USENIX Association.
[10] S. Laosooksathit, C. Chandler, and K. Chanchio. Lightweight checkpoint mechanism and modeling in GPGPU environment. Computing HPC, 12:13, 2010.
[11] S. Laosooksathit, N. Naksinehaboon, and C. Leangsuksan. Two-level checkpoint/restart modeling for GPGPU. In Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on, pages 276–283, Dec. 2011.
[12] Y. Liu, D. Maskell, and B. Schmidt. CUDASW++: Optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.
[13] A. Nukada, H. Takizawa, and S. Matsuoka. NVCR: A transparent checkpoint-restart library for NVIDIA CUDA. In Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, pages 104–113, May 2011.
[14] NVIDIA. NVIDIA's next generation CUDA compute architecture: Fermi, 2010.
[15] NVIDIA. NVIDIA CUDA architecture, 2011.
[16] National University of Defense Technology. Tianhe-1 - NUDT TH-1 cluster, 2010.
[17] Tokyo Institute of Technology. TSUBAME grid cluster, 2011.
[18] V. Podlozhnyuk. Black-Scholes option pricing, 2007.
[19] L. Shi, H. Chen, and J. Sun. vCUDA: GPU accelerated high performance computing in virtual machines. In Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1–11, May 2009.
[20] L. Solano-Quinde, B. Bode, and A. Somani. Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs. In Electro/Information Technology (EIT), 2010 IEEE International Conference on, pages 1–5, May 2010.
[21] H. Takizawa, K. Koyama, K. Sato, K. Komatsu, and H. Kobayashi. CheCL: Transparent checkpointing and process migration of OpenCL applications. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, IPDPS '11, pages 864–876, Washington, DC, USA, 2011. IEEE Computer Society.
[22] H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi. CheCUDA: A checkpoint/restart tool for CUDA applications. In Parallel and Distributed Computing, Applications and Technologies, 2009 International Conference on, pages 408–413, Dec. 2009.
[23] S. Xiao, P. Balaji, J. Dinan, Q. Zhu, R. Thakur, S. Coghlan, H. Lin, G. Wen, J. Hong, and W.-c. Feng. Transparent accelerator migration in a virtualized GPU environment. Cluster Computing and the Grid, IEEE International Symposium on, 0:124–131, 2012. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361 | - |
| dc.description.abstract | Unlike classic video cards, modern Graphics Processing Units (GPUs) are widely used to accelerate general-purpose computation. GPU clusters have become very popular in High Performance Computing (HPC); three of the top five supercomputers use GPUs. Migration is a critical capability in such large-scale GPU cluster environments because it supports resource management, load balancing, system maintenance, and fault tolerance. In this thesis, we propose a live migration mechanism for GPGPU applications, with two contributions. First, we support preempt-like functionality on GPUs: when migration is triggered, the CPU can proactively terminate the GPU computation instead of waiting for the task to complete, so migration can start earlier. Second, we hide data transfer overhead by overlapping computation with transfers: while the GPU is still computing, we send back the portion of the data that is already complete, pre-copying it so that less state remains to be moved when migration is triggered. Experimental results show that the proposed mechanism quickly releases GPU resources when migration is triggered and reduces the migration overhead caused by data transfers. (A minimal illustrative sketch of these two ideas appears after the metadata record below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T17:42:50Z (GMT). No. of bitstreams: 1 ntu-101-R99922088-1.pdf: 1837308 bytes, checksum: f75325ee340c82f0b9e7d42d016b386b (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | Abstract i
1 Introduction 1
2 Related Works 5
2.1 Checkpoint/Restart 5
2.2 Task Migration 7
3 Overview of GPU Design 8
3.1 GPU Architecture 8
3.2 GPGPU Programming Model 9
3.3 CUDA Stream 10
3.4 GPU Virtualization and API Remoting 11
4 GPU Live Migration Design 13
4.1 Migration Challenges and Our Solutions by Using CUDA Streams 13
4.1.1 Wait for Kernel Completion Time 13
4.1.2 Network Performance Overhead 14
4.2 Reduce Wait for Kernel Completion Time 15
4.2.1 Support Preempt-like Functionality 15
4.2.2 Reduce Re-Computing Overhead 16
4.3 Reduce Migration Performance Overhead 17
4.3.1 Overlap Computation with PCIe Transfers and Network Transfers 18
4.3.2 Record Memory Layout Accessed by Blocks 19
4.3.3 Memory Layout Analysis 20
4.4 Add Instruction by Modifying PTX 23
5 Experimental Results 24
5.1 Experimental Setup 24
5.2 Benchmark 24
5.3 Wait for Kernel Time 26
5.4 Total Migration Overhead 28
5.5 Additional Storage Overhead 31
6 Conclusion 32
Bibliography 33 | |
| dc.language.iso | en | |
| dc.subject | 虛擬化 | zh_TW |
| dc.subject | 統一計算架構串流 | zh_TW |
| dc.subject | 斷點重啟 | zh_TW |
| dc.subject | 圖形處理器 | zh_TW |
| dc.subject | 動態遷移 | zh_TW |
| dc.subject | Migration | en |
| dc.subject | GPU | en |
| dc.subject | Virtualization | en |
| dc.subject | Checkpoint/Restart | en |
| dc.subject | CUDA Stream | en |
| dc.title | 通用圖形處理器程序在圖形處理器叢集上之動態遷移 | zh_TW |
| dc.title | Live Migration for GPGPU Applications on GPU Cluster | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 陳維超,洪士灝 | |
| dc.subject.keyword | 動態遷移,圖形處理器,虛擬化,斷點重啟,統一計算架構串流, | zh_TW |
| dc.subject.keyword | Migration,GPU,Virtualization,Checkpoint/Restart,CUDA Stream, | en |
| dc.relation.page | 35 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2012-08-14 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |
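The abstract above describes two techniques: preempt-like early termination of GPU work when a migration request arrives, and hiding transfer overhead by copying already-completed results back while the GPU keeps computing. Below is a minimal, hypothetical CUDA sketch of both ideas, not code from the thesis: the kernel name `process_chunk`, the chunk size, and the `stop_requested` flag are illustrative assumptions. It launches a kernel chunk by chunk on alternating CUDA streams, asynchronously drains each finished chunk back to pinned host memory so that copies can overlap later chunks' computation, and stops issuing further chunks as soon as the flag is set, which is a chunk-granularity analogue of preemption.

```cuda
// Hypothetical illustration only (not the thesis's implementation).
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t err_ = (call); if (err_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); return 1; } } while (0)

// Processes one contiguous chunk of the array; a stand-in for real GPGPU work.
__global__ void process_chunk(float *data, int offset, int count, int n) {
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + count && i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 22;            // total elements
    const int chunk = 1 << 18;        // elements processed per kernel launch
    const int threads = 256;

    volatile int stop_requested = 0;  // would be set asynchronously by the migration runtime

    float *h_data, *d_data;
    CHECK(cudaMallocHost(&h_data, n * sizeof(float)));  // pinned memory: needed for async copies
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;
    CHECK(cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice));

    cudaStream_t streams[2];
    for (int s = 0; s < 2; ++s) CHECK(cudaStreamCreate(&streams[s]));

    int completed = 0;
    for (int off = 0, c = 0; off < n; off += chunk, ++c) {
        if (stop_requested) break;            // migration triggered: stop issuing new chunks
        cudaStream_t st = streams[c & 1];     // alternate streams so chunk c's copy-back can
                                              // overlap chunk c+1's computation
        int count = (off + chunk <= n) ? chunk : n - off;
        int blocks = (count + threads - 1) / threads;
        process_chunk<<<blocks, threads, 0, st>>>(d_data, off, count, n);
        // Pre-copy: drain the finished chunk back to the host while later chunks still run.
        CHECK(cudaMemcpyAsync(h_data + off, d_data + off, count * sizeof(float),
                              cudaMemcpyDeviceToHost, st));
        completed = off + count;
    }
    for (int s = 0; s < 2; ++s) CHECK(cudaStreamSynchronize(streams[s]));

    // h_data[0 .. completed) is already on the host; on migration, only the unprocessed
    // tail and the resume offset still have to be transferred.
    printf("completed %d of %d elements before stopping\n", completed, n);

    for (int s = 0; s < 2; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return 0;
}
```

In a real deployment the flag would be set asynchronously by the migration runtime (for example, from a monitoring thread or signal handler), and the host would then ship the already-drained portion of `h_data` plus the index of the last completed chunk to the destination node, so the remaining chunks can be resumed there instead of recomputed from scratch.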
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-101-1.pdf (restricted access, not publicly available) | 1.79 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
