Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361

Full metadata record
| DC 欄位 | 值 | 語言 |
|---|---|---|
| dc.contributor.advisor | 楊佳玲 | |
| dc.contributor.author | Ming-Che Chiang | en |
| dc.contributor.author | 江明哲 | zh_TW |
| dc.date.accessioned | 2021-06-16T17:42:50Z | - |
| dc.date.available | 2017-08-20 | |
| dc.date.copyright | 2012-08-28 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-08-14 | |
| dc.identifier.citation | [1] Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA, 2009.
[2] VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units, 2010.
[3] Dawning. Nebulae - Dawning TC3600 blade system, 2010.
[4] J. Duato, A. Peña, F. Silla, R. Mayo, and E. Quintana-Ortí. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 224–231, June 28 – July 2, 2010.
[5] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware. ACM Trans. Archit. Code Optim., 6(2):7:1–7:37, July 2009.
[6] G. Giunta, R. Montella, G. Agrillo, and G. Coviello. A GPGPU transparent virtualization component for high performance computing clouds. In Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, EuroPar'10, pages 379–391, Berlin, Heidelberg, 2010. Springer-Verlag.
[7] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V. Talwar, and P. Ranganathan. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt '09, pages 17–24, New York, NY, USA, 2009. ACM.
[8] S. Kato, K. Lakshmanan, A. Kumar, M. Kelkar, Y. Ishikawa, and R. Rajkumar. RGEM: A responsive GPGPU execution model for runtime engines. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium, RTSS '11, pages 57–66, Washington, DC, USA, 2011. IEEE Computer Society.
[9] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIXATC'11, pages 2–2, Berkeley, CA, USA, 2011. USENIX Association.
[10] S. Laosooksathit, C. Chandler, and K. Chanchio. Lightweight checkpoint mechanism and modeling in GPGPU environment. Computing HPC, 12:13, 2010.
[11] S. Laosooksathit, N. Naksinehaboon, and C. Leangsuksan. Two-level checkpoint/restart modeling for GPGPU. In Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on, pages 276–283, Dec. 2011.
[12] Y. Liu, D. Maskell, and B. Schmidt. CUDASW++: Optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.
[13] A. Nukada, H. Takizawa, and S. Matsuoka. NVCR: A transparent checkpoint-restart library for NVIDIA CUDA. In Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, pages 104–113, May 2011.
[14] NVIDIA. NVIDIA's next generation CUDA compute architecture: Fermi, 2010.
[15] NVIDIA. NVIDIA CUDA architecture, 2011.
[16] National University of Defense Technology. Tianhe-1 - NUDT TH-1 cluster, 2010.
[17] Tokyo Institute of Technology. TSUBAME grid cluster, 2011.
[18] V. Podlozhnyuk. Black-Scholes option pricing, 2007.
[19] L. Shi, H. Chen, and J. Sun. vCUDA: GPU accelerated high performance computing in virtual machines. In Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1–11, May 2009.
[20] L. Solano-Quinde, B. Bode, and A. Somani. Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs. In Electro/Information Technology (EIT), 2010 IEEE International Conference on, pages 1–5, May 2010.
[21] H. Takizawa, K. Koyama, K. Sato, K. Komatsu, and H. Kobayashi. CheCL: Transparent checkpointing and process migration of OpenCL applications. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, IPDPS '11, pages 864–876, Washington, DC, USA, 2011. IEEE Computer Society.
[22] H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi. CheCUDA: A checkpoint/restart tool for CUDA applications. In Parallel and Distributed Computing, Applications and Technologies, 2009 International Conference on, pages 408–413, Dec. 2009.
[23] S. Xiao, P. Balaji, J. Dinan, Q. Zhu, R. Thakur, S. Coghlan, H. Lin, G. Wen, J. Hong, and W.-c. Feng. Transparent accelerator migration in a virtualized GPU environment. Cluster Computing and the Grid, IEEE International Symposium on, 0:124–131, 2012. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361 | - |
| dc.description.abstract | Unlike classic video cards, modern Graphics Processing Units (GPUs) are widely used to accelerate general-purpose computation. GPU clusters have become very popular in High Performance Computing (HPC); three of the top five supercomputers use GPUs. Migration is a critical capability in such large-scale GPU cluster environments because it supports resource management, load balancing, system maintenance, and fault tolerance. In this thesis, we propose a live migration mechanism for GPGPU applications, with two contributions. First, we support preempt-like functionality on GPUs: when migration is triggered, the CPU can proactively terminate the GPU computation instead of waiting for the task to complete, so migration can start earlier. Second, we hide data transfer overhead by overlapping computation with transfers: while the GPU is still computing, we send back the portion of the data that is already complete, pre-copying it so that less state remains to be moved when migration is triggered. Experimental results show that the proposed mechanism quickly releases GPU resources when migration is triggered and reduces the migration overhead caused by data transfers. (A minimal illustrative sketch of these two ideas appears after the metadata record below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T17:42:50Z (GMT). No. of bitstreams: 1 ntu-101-R99922088-1.pdf: 1837308 bytes, checksum: f75325ee340c82f0b9e7d42d016b386b (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | Abstract i
1 Introduction 1
2 Related Works 5
2.1 Checkpoint/Restart 5
2.2 Task Migration 7
3 Overview of GPU Design 8
3.1 GPU Architecture 8
3.2 GPGPU Programming Model 9
3.3 CUDA Stream 10
3.4 GPU Virtualization and API Remoting 11
4 GPU Live Migration Design 13
4.1 Migration Challenges and Our Solutions by Using CUDA Streams 13
4.1.1 Wait for Kernel Completion Time 13
4.1.2 Network Performance Overhead 14
4.2 Reduce Wait for Kernel Completion Time 15
4.2.1 Support Preempt-like Functionality 15
4.2.2 Reduce Re-Computing Overhead 16
4.3 Reduce Migration Performance Overhead 17
4.3.1 Overlap Computation with PCIe Transfers and Network Transfers 18
4.3.2 Record Memory Layout Accessed by Blocks 19
4.3.3 Memory Layout Analysis 20
4.4 Add Instruction by Modifying PTX 23
5 Experimental Results 24
5.1 Experimental Setup 24
5.2 Benchmark 24
5.3 Wait for Kernel Time 26
5.4 Total Migration Overhead 28
5.5 Additional Storage Overhead 31
6 Conclusion 32
Bibliography 33 | |
| dc.language.iso | en | |
| dc.subject | 虛擬化 | zh_TW |
| dc.subject | 統一計算架構串流 | zh_TW |
| dc.subject | 斷點重啟 | zh_TW |
| dc.subject | 圖形處理器 | zh_TW |
| dc.subject | 動態遷移 | zh_TW |
| dc.subject | Migration | en |
| dc.subject | GPU | en |
| dc.subject | Virtualization | en |
| dc.subject | Checkpoint/Restart | en |
| dc.subject | CUDA Stream | en |
| dc.title | 通用圖形處理器程序在圖形處理器叢集上之動態遷移 | zh_TW |
| dc.title | Live Migration for GPGPU Applications on GPU Cluster | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 陳維超,洪士灝 | |
| dc.subject.keyword | 動態遷移,圖形處理器,虛擬化,斷點重啟,統一計算架構串流, | zh_TW |
| dc.subject.keyword | Migration,GPU,Virtualization,Checkpoint/Restart,CUDA Stream, | en |
| dc.relation.page | 35 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2012-08-14 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| Appears in Collections: | Department of Computer Science and Information Engineering | |
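The abstract above describes two techniques: preempt-like early termination of GPU work when a migration request arrives, and hiding transfer overhead by copying already-completed results back while the GPU keeps computing. Below is a minimal, hypothetical CUDA sketch of both ideas, not code from the thesis: the kernel name `process_chunk`, the chunk size, and the `stop_requested` flag are illustrative assumptions. It launches a kernel chunk by chunk on alternating CUDA streams, asynchronously drains each finished chunk back to pinned host memory so that copies can overlap later chunks' computation, and stops issuing further chunks as soon as the flag is set, which is a chunk-granularity analogue of preemption.

```cuda
// Hypothetical illustration only (not the thesis's implementation).
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t err_ = (call); if (err_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); return 1; } } while (0)

// Processes one contiguous chunk of the array; a stand-in for real GPGPU work.
__global__ void process_chunk(float *data, int offset, int count, int n) {
    int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < offset + count && i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 22;            // total elements
    const int chunk = 1 << 18;        // elements processed per kernel launch
    const int threads = 256;

    volatile int stop_requested = 0;  // would be set asynchronously by the migration runtime

    float *h_data, *d_data;
    CHECK(cudaMallocHost(&h_data, n * sizeof(float)));  // pinned memory: needed for async copies
    CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h_data[i] = (float)i;
    CHECK(cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice));

    cudaStream_t streams[2];
    for (int s = 0; s < 2; ++s) CHECK(cudaStreamCreate(&streams[s]));

    int completed = 0;
    for (int off = 0, c = 0; off < n; off += chunk, ++c) {
        if (stop_requested) break;            // migration triggered: stop issuing new chunks
        cudaStream_t st = streams[c & 1];     // alternate streams so chunk c's copy-back can
                                              // overlap chunk c+1's computation
        int count = (off + chunk <= n) ? chunk : n - off;
        int blocks = (count + threads - 1) / threads;
        process_chunk<<<blocks, threads, 0, st>>>(d_data, off, count, n);
        // Pre-copy: drain the finished chunk back to the host while later chunks still run.
        CHECK(cudaMemcpyAsync(h_data + off, d_data + off, count * sizeof(float),
                              cudaMemcpyDeviceToHost, st));
        completed = off + count;
    }
    for (int s = 0; s < 2; ++s) CHECK(cudaStreamSynchronize(streams[s]));

    // h_data[0 .. completed) is already on the host; on migration, only the unprocessed
    // tail and the resume offset still have to be transferred.
    printf("completed %d of %d elements before stopping\n", completed, n);

    for (int s = 0; s < 2; ++s) CHECK(cudaStreamDestroy(streams[s]));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return 0;
}
```

In a real deployment the flag would be set asynchronously by the migration runtime (for example, from a monitoring thread or signal handler), and the host would then ship the already-drained portion of `h_data` plus the index of the last completed chunk to the destination node, so the remaining chunks can be resumed there instead of recomputed from scratch.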
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-101-1.pdf (restricted access, not publicly available) | 1.79 MB | Adobe PDF | |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
