Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84656
Title: | Large NN Model Support in Multi-GPU System (多GPU系統的大型神經網路訓練) |
Authors: | Shao-Fu Lin (林芍甫) |
Advisor: | Chia-Lin Yang (楊佳玲) |
Keyword: | Large Model Training, GPU, Data Parallelism |
Publication Year : | 2022 |
Degree: | Master |
Abstract: | As deep neural network (DNN) models grow deeper and wider, overcoming the limited GPU memory capacity has become one of the main challenges in training large-scale neural networks. A commonly used solution is to utilize host memory as external storage, swapping tensors in and out of GPU memory. However, the effectiveness of the tensor-swapping mechanism can be impaired in a data-parallel training system due to contention on the shared PCIe channel to the host. In this paper, we propose the first large model support framework that coordinates tensor movements among GPUs so that PCIe channel contention is alleviated. We design two types of coordination mechanisms. The first synchronizes thread execution across GPUs to avoid issuing tensor-swapping commands at the same time. The second interleaves accesses to the shared PCIe channel by selecting disjoint swapped-out tensors for each GPU. The effectiveness of these two methods depends on how often the GPUs need to synchronize on gradients. Experimental results show that, compared to large model support that is oblivious to channel contention, the proposed solution achieves a 15% speedup on average. |
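The abstract's first coordination mechanism — synchronizing GPUs so that no two issue swap commands over the shared PCIe channel at the same time — can be illustrated with a minimal host-side sketch. This is not the thesis's actual framework: the `SharedChannel` class, the thread-per-GPU model, and the single coordination lock are simplifying assumptions standing in for real GPU streams and DMA transfers.

```python
import threading
import time

class SharedChannel:
    """Stand-in for the shared PCIe channel to the host.

    Tracks how many transfers are in flight at once so we can
    verify that coordinated swaps never overlap on the channel.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self.concurrent = 0
        self.max_concurrent = 0

    def transfer(self, gpu_id, tensor, seconds=0.005):
        with self._lock:
            self.concurrent += 1
            self.max_concurrent = max(self.max_concurrent, self.concurrent)
        time.sleep(seconds)  # stand-in for the host<->GPU DMA copy
        with self._lock:
            self.concurrent -= 1

def train_step(gpu_id, channel, swap_token, tensors):
    """One data-parallel worker swapping its tensors out to host memory."""
    for t in tensors:
        # Coordination mechanism 1 (sketch): hold a shared token while
        # issuing a swap, so GPUs stagger their use of the PCIe channel
        # instead of contending on it simultaneously.
        with swap_token:
            channel.transfer(gpu_id, t)

channel = SharedChannel()
swap_token = threading.Lock()  # hypothetical coordination primitive
workers = [
    threading.Thread(
        target=train_step,
        args=(g, channel, swap_token, [f"tensor_{g}_{i}" for i in range(3)]),
    )
    for g in range(2)  # two simulated GPUs sharing one channel
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(channel.max_concurrent)  # 1: no two swaps overlapped on the channel
```

The second mechanism described in the abstract would instead partition the swapped-out tensors into disjoint sets per GPU, so transfers interleave naturally without a global token; the trade-off between the two depends on how frequently gradient synchronization forces the GPUs back into lockstep.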
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84656 |
DOI: | 10.6342/NTU202201995 |
Fulltext Rights: | Authorized (access restricted to campus network) |
Embargo Lift Date: | 2022-09-16 |
Appears in Collections: | Graduate Institute of Networking and Multimedia (資訊網路與多媒體研究所) |
Files in This Item:
File | Size | Format | |
---|---|---|---|
U0001-0308202200204800.pdf (access limited to NTU IP range) | 1.89 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.