NTU Theses and Dissertations Repository
College of Electrical Engineering and Computer Science
Department of Computer Science and Information Engineering
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361
Full metadata record
DC Field  Value  Language
dc.contributor.advisor  楊佳玲
dc.contributor.author  Ming-Che Chiang  en
dc.contributor.author  江明哲  zh_TW
dc.date.accessioned  2021-06-16T17:42:50Z
dc.date.available  2017-08-20
dc.date.copyright  2012-08-28
dc.date.issued  2012
dc.date.submitted  2012-08-14
dc.identifier.citation
[1] Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA, 2009.
[2] VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units, 2010.
[3] Dawning. Nebulae - Dawning TC3600 blade system, 2010.
[4] J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. Quintana-Ortí. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters. In High Performance Computing and Simulation (HPCS), 2010 International Conference on, pages 224–231, 2010.
[5] W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt. Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware. ACM Trans. Archit. Code Optim., 6(2):7:1–7:37, July 2009.
[6] G. Giunta, R. Montella, G. Agrillo, and G. Coviello. A GPGPU transparent virtualization component for high performance computing clouds. In Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, EuroPar'10, pages 379–391, Berlin, Heidelberg, 2010. Springer-Verlag.
[7] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V. Talwar, and P. Ranganathan. GViM: GPU-accelerated virtual machines. In Proceedings of the 3rd ACM Workshop on System-level Virtualization for High Performance Computing, HPCVirt '09, pages 17–24, New York, NY, USA, 2009. ACM.
[8] S. Kato, K. Lakshmanan, A. Kumar, M. Kelkar, Y. Ishikawa, and R. Rajkumar. RGEM: A responsive GPGPU execution model for runtime engines. In Proceedings of the 2011 IEEE 32nd Real-Time Systems Symposium, RTSS '11, pages 57–66, Washington, DC, USA, 2011. IEEE Computer Society.
[9] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa. TimeGraph: GPU scheduling for real-time multi-tasking environments. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIXATC'11, pages 2–2, Berkeley, CA, USA, 2011. USENIX Association.
[10] S. Laosooksathit, C. Chandler, and K. Chanchio. Lightweight checkpoint mechanism and modeling in GPGPU environment. Computing HPC, 12:13, 2010.
[11] S. Laosooksathit, N. Naksinehaboon, and C. Leangsuksan. Two-level checkpoint/restart modeling for GPGPU. In Computer Systems and Applications (AICCSA), 2011 9th IEEE/ACS International Conference on, pages 276–283, Dec. 2011.
[12] Y. Liu, D. Maskell, and B. Schmidt. CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units. BMC Research Notes, 2(1):73, 2009.
[13] A. Nukada, H. Takizawa, and S. Matsuoka. NVCR: A transparent checkpoint-restart library for NVIDIA CUDA. In Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on, pages 104–113, May 2011.
[14] NVIDIA. NVIDIA's next generation CUDA compute architecture: Fermi, 2010.
[15] NVIDIA. NVIDIA CUDA architecture, 2011.
[16] National University of Defense Technology. Tianhe-1 - NUDT TH-1 cluster, 2010.
[17] Tokyo Institute of Technology. TSUBAME grid cluster, 2011.
[18] V. Podlozhnyuk. Black-Scholes option pricing, 2007.
[19] L. Shi, H. Chen, and J. Sun. vCUDA: GPU accelerated high performance computing in virtual machines. In Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1–11, May 2009.
[20] L. Solano-Quinde, B. Bode, and A. Somani. Coarse grain computation-communication overlap for efficient application-level checkpointing for GPUs. In Electro/Information Technology (EIT), 2010 IEEE International Conference on, pages 1–5, May 2010.
[21] H. Takizawa, K. Koyama, K. Sato, K. Komatsu, and H. Kobayashi. CheCL: Transparent checkpointing and process migration of OpenCL applications. In Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, IPDPS '11, pages 864–876, Washington, DC, USA, 2011. IEEE Computer Society.
[22] H. Takizawa, K. Sato, K. Komatsu, and H. Kobayashi. CheCUDA: A checkpoint/restart tool for CUDA applications. In Parallel and Distributed Computing, Applications and Technologies, 2009 International Conference on, pages 408–413, Dec. 2009.
[23] S. Xiao, P. Balaji, J. Dinan, Q. Zhu, R. Thakur, S. Coghlan, H. Lin, G. Wen, J. Hong, and W.-c. Feng. Transparent accelerator migration in a virtualized GPU environment. Cluster Computing and the Grid, IEEE International Symposium on, 0:124–131, 2012.
dc.identifier.uri  http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64361
dc.description.abstract
Unlike classic video cards, modern Graphics Processing Units (GPUs) are widely used to accelerate general-purpose computation. GPU clusters have become very popular in the field of High Performance Computing (HPC); three of the top five supercomputers utilize GPUs. Migration is a critical capability in such large-scale GPU cluster environments because it supports resource management, load balancing, system maintenance, and fault tolerance.
In this thesis, we propose a live-migration mechanism for GPGPU applications, with two contributions. First, we support preempt-like functionality on GPUs: when migration is triggered, the CPU can proactively terminate the GPU computation instead of waiting for the running task to complete, so the migration can start earlier. Second, we hide data transfer overhead by overlapping computation with transfers: while the GPU is still computing, we pre-copy the portions of the data that have already been completed.
Our experimental results demonstrate that the proposed mechanism quickly releases GPU resources when migration is triggered and also reduces the migration overhead resulting from data transfers.
en
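The abstract's two ideas can be sketched in a minimal, CPU-only Python simulation. This is not the thesis's CUDA implementation; a plain thread stands in for a CUDA stream, a list append stands in for the PCIe/network copy, and all names (`run_job`, `migrate_at`, chunk indices) are illustrative. Each finished chunk is handed off for transfer immediately (overlapping compute with transfer), and a stop flag provides the preempt-like early exit when migration is triggered.

```python
import threading
import queue

def run_job(num_chunks, migrate_at=None):
    """Simulate a chunked GPU job with eager pre-copy and preempt-like stop.

    Each completed chunk is passed to a 'transfer' thread right away,
    so transfers overlap with the computation of later chunks; setting
    the stop flag terminates the remaining computation early.
    """
    stop = threading.Event()
    done_q = queue.Queue()
    transferred = []

    def transfer_worker():
        # Stand-in for the PCIe/network copy of completed chunks.
        while True:
            chunk = done_q.get()
            if chunk is None:          # sentinel: no more chunks
                return
            transferred.append(chunk)

    t = threading.Thread(target=transfer_worker)
    t.start()

    completed = 0
    for i in range(num_chunks):
        if stop.is_set():              # preempt-like early exit
            break
        # ... compute chunk i here ...
        done_q.put(i)                  # pre-copy while later chunks compute
        completed += 1
        if migrate_at is not None and completed == migrate_at:
            stop.set()                 # migration has been triggered

    done_q.put(None)
    t.join()
    return completed, transferred

completed, sent = run_job(8, migrate_at=3)
```

In this sketch, triggering migration after three chunks leaves `completed == 3` and only those chunks transferred; the remaining work would be re-run (or resumed) on the destination node, which is the trade-off the thesis's re-computation analysis addresses.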
dc.description.provenance  Made available in DSpace on 2021-06-16T17:42:50Z (GMT). No. of bitstreams: 1
ntu-101-R99922088-1.pdf: 1837308 bytes, checksum: f75325ee340c82f0b9e7d42d016b386b (MD5)
Previous issue date: 2012
en
dc.description.tableofcontents
Abstract i
1 Introduction 1
2 Related Works 5
2.1 Checkpoint/Restart 5
2.2 Task Migration 7
3 Overview of GPU Design 8
3.1 GPU Architecture 8
3.2 GPGPU Programming Model 9
3.3 CUDA Stream 10
3.4 GPU Virtualization and API Remoting 11
4 GPU Live Migration Design 13
4.1 Migration Challenges and Our Solutions by Using CUDA Streams 13
4.1.1 Wait for Kernel Completion Time 13
4.1.2 Network Performance Overhead 14
4.2 Reduce Wait for Kernel Completion Time 15
4.2.1 Support Preempt-like Functionality 15
4.2.2 Reduce Re-Computing Overhead 16
4.3 Reduce Migration Performance Overhead 17
4.3.1 Overlap Computation with PCIe Transfers and Network Transfers 18
4.3.2 Record Memory Layout Accessed by Blocks 19
4.3.3 Memory Layout Analysis 20
4.4 Add Instruction by Modifying PTX 23
5 Experimental Results 24
5.1 Experimental Setup 24
5.2 Benchmark 24
5.3 Wait for Kernel Time 26
5.4 Total Migration Overhead 28
5.5 Additional Storage Overhead 31
6 Conclusion 32
Bibliography 33
dc.language.iso  en
dc.subject  虛擬化 (Virtualization)  zh_TW
dc.subject  統一計算架構串流 (CUDA Stream)  zh_TW
dc.subject  斷點重啟 (Checkpoint/Restart)  zh_TW
dc.subject  圖形處理器 (GPU)  zh_TW
dc.subject  動態遷移 (Live Migration)  zh_TW
dc.subject  Migration  en
dc.subject  GPU  en
dc.subject  Virtualization  en
dc.subject  Checkpoint/Restart  en
dc.subject  CUDA Stream  en
dc.title  通用圖形處理器程序在圖形處理器叢集上之動態遷移 (Live Migration for GPGPU Applications on GPU Clusters)  zh_TW
dc.title  Live Migration for GPGPU Applications on GPU Cluster  en
dc.type  Thesis
dc.date.schoolyear  100-2
dc.description.degree  Master (碩士)
dc.contributor.oralexamcommittee  陳維超, 洪士灝
dc.subject.keyword  動態遷移, 圖形處理器, 虛擬化, 斷點重啟, 統一計算架構串流  zh_TW
dc.subject.keyword  Migration, GPU, Virtualization, Checkpoint/Restart, CUDA Stream  en
dc.relation.page  35
dc.rights.note  有償授權 (authorized with compensation)
dc.date.accepted  2012-08-14
dc.contributor.author-college  College of Electrical Engineering and Computer Science (電機資訊學院)  zh_TW
dc.contributor.author-dept  Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)  zh_TW
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File: ntu-101-1.pdf (1.79 MB, Adobe PDF), not authorized for public access
Except where otherwise noted for specific items, all items in this repository are protected by copyright, with all rights reserved.
