Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96374

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 劉邦鋒 | zh_TW |
| dc.contributor.advisor | Pangfeng Liu | en |
| dc.contributor.author | 林采鋒 | zh_TW |
| dc.contributor.author | Cai-Feng Lin | en |
| dc.date.accessioned | 2025-02-13T16:10:55Z | - |
| dc.date.available | 2025-02-14 | - |
| dc.date.copyright | 2025-02-13 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-02-06 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96374 | - |
| dc.description.abstract | 圖神經網路是深度學習中處理結構化數據的重要工具,其中圖由節點與邊組成,用於表示實體及其關係。當圖神經網路呈樹狀結構時會面臨各種挑戰,例如不規則的結構與變化的深度。將其分配與並行執行於多張圖形處理器上極具挑戰性。此外,樹狀結構的相依性要求先處理父節點再處理子節點,這嚴重限制了執行的並行性。
本研究旨在提升樹狀的圖神經網路在多圖形處理器系統上的訓練速度。首先,我們引入一個成本模型來估算多圖形處理器訓練的執行時間。接著,我們證明在該成本模型下,尋找最佳的樹狀數據分配方式是一個NP完全問題。因此,我們提出一種啟發式方法來分配數據,以提高效率並維持訓練品質。此方法首先基於成本模型將數據分配至不同批次,然後在每個批次內將數據分配至多個圖形處理器。我們進一步證明,我們的分配演算法是一個四倍近似演算法,確保其在實務上具有效率。我們實作了該演算法並進行實驗。結果顯示,我們的演算法在訓練速度上獲得顯著提升。當使用兩張圖形處理器時加速比達 1.86,四張圖形處理器達 3.43,而八張圖形處理器達 7.25。 | zh_TW |
| dc.description.abstract | Graph Neural Network (GNN) is an important tool in deep learning for handling structured data, where graphs of nodes and edges represent entities and their relationships. Various challenges arise when a GNN is tree-shaped, with irregular connectivity patterns and varying depths. It is difficult to distribute and process such dynamic structures for parallel execution on multiple GPUs. In addition, tree data dependency demands the processing of parent nodes before their children, severely limiting execution parallelism.
This research aims to improve the training speed of tree-shaped GNNs on multi-GPU systems. First, we introduce a cost model that estimates the running time of training across multiple GPUs. Then, we demonstrate that finding an optimal way to distribute tree-structured data across GPUs is an NP-complete problem under this cost model. We then propose a practical heuristic method for distributing data that improves efficiency while maintaining training quality. The heuristic first assigns data to batches based on our cost model and then assigns the data in each batch to the devices. We also show that our device-assignment algorithm is a 4-approximation algorithm; that is, it guarantees that its cost is at most four times the optimal running time in each training batch, ensuring that it performs effectively in practice. We implement the algorithm and conduct experiments. The results show that our algorithm achieves a significant speedup in training: up to 1.86 with two GPUs, 3.43 with four GPUs, and 7.25 with eight GPUs. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-02-13T16:10:55Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-02-13T16:10:55Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員審定書 i; Acknowledgements ii; 摘要 iv; Abstract v; Contents vii; List of Figures ix; List of Tables x; Chapter 1 Introduction 1; 1.1 Introduction 1; Chapter 2 Related Work 4; 2.1 Graph Neural Network 4; 2.2 Data Parallelism 5; 2.3 Processing Tree-Shaped Data 6; 2.4 Partition Problem 7; Chapter 3 Tree Grouping Problem 9; 3.1 Cost Model 9; 3.2 Problem Formulation 11; 3.3 NP-Completeness 11; Chapter 4 Scheme 13; 4.1 Greedy Grouping Algorithm 13; 4.1.1 Assigning Trees to Batches 13; 4.1.2 Assigning Trees to Devices in a Batch 14; 4.2 Approximation 14; 4.3 Sequential Execution with One Device 19; Chapter 5 Experiment 22; 5.1 Experiment Setup 22; 5.1.1 Implementation Details 22; 5.1.2 Hardware Configuration 22; 5.1.3 Models 23; 5.1.4 Dataset 23; 5.1.5 Baseline 23; 5.2 Cost Model Accuracy 24; 5.3 Analysis of Speedup 27; 5.4 Analysis of Model Accuracy 29; Chapter 6 Conclusion and Future Work 30; References 31 | - |
| dc.language.iso | en | - |
| dc.subject | NP完備 | zh_TW |
| dc.subject | 近似演算法 | zh_TW |
| dc.subject | 分區問題 | zh_TW |
| dc.subject | 圖神經網路 | zh_TW |
| dc.subject | 深度學習 | zh_TW |
| dc.subject | Approximation Algorithm | en |
| dc.subject | Deep Learning | en |
| dc.subject | Graph Neural Network | en |
| dc.subject | Partition Problem | en |
| dc.subject | NP-Completeness | en |
| dc.title | 多圖形處理器環境下訓練樹狀模型的高效能資料分群演算法 | zh_TW |
| dc.title | A Grouping Algorithm for Training Tree-Shaped Models on Multiple GPUs with High Efficiency | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 吳真貞;洪鼎詠 | zh_TW |
| dc.contributor.oralexamcommittee | Jan-Jan Wu;Ding-Yong Hong | en |
| dc.subject.keyword | 深度學習,圖神經網路,分區問題,NP完備,近似演算法, | zh_TW |
| dc.subject.keyword | Deep Learning,Graph Neural Network,Partition Problem,NP-Completeness,Approximation Algorithm, | en |
| dc.relation.page | 33 | - |
| dc.identifier.doi | 10.6342/NTU202500427 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2025-02-06 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| dc.date.embargo-lift | 2025-02-14 | - |
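The abstract above describes a two-stage heuristic: trees are first grouped into batches according to a cost model, and the trees in each batch are then distributed across GPUs so that the per-batch running time is balanced, with the device-assignment step shown to be a 4-approximation. The thesis's actual cost model and algorithm are not reproduced in this record, so the following is only a minimal illustrative sketch under assumed details: `estimate_cost` is a hypothetical node-count cost proxy, and the device assignment uses a standard longest-processing-time-first (LPT) greedy, the classic balancing strategy behind constant-factor guarantees of this kind.

```python
import heapq

def estimate_cost(tree_sizes):
    """Hypothetical cost proxy: assume a tree's running time grows with its node count."""
    return list(tree_sizes)

def assign_to_batches(costs, batch_size):
    """Group tree indices into fixed-size batches, largest estimated cost first."""
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def assign_batch_to_devices(batch, costs, num_gpus):
    """LPT-style greedy: place each tree, largest first, on the currently least-loaded GPU."""
    loads = [(0, gpu) for gpu in range(num_gpus)]  # (accumulated cost, gpu id)
    heapq.heapify(loads)
    assignment = {gpu: [] for gpu in range(num_gpus)}
    for tree in sorted(batch, key=lambda i: costs[i], reverse=True):
        load, gpu = heapq.heappop(loads)        # least-loaded GPU so far
        assignment[gpu].append(tree)
        heapq.heappush(loads, (load + costs[tree], gpu))
    return assignment

if __name__ == "__main__":
    sizes = [120, 35, 87, 64, 250, 18, 99, 143]   # made-up node counts per tree
    costs = estimate_cost(sizes)
    for b, batch in enumerate(assign_to_batches(costs, batch_size=4)):
        plan = assign_batch_to_devices(batch, costs, num_gpus=2)
        summary = ", ".join(f"GPU{g}: trees {trees}" for g, trees in sorted(plan.items()))
        print(f"batch {b} -> {summary}")
```

In the thesis itself, the measured cost model of Chapter 3 would take the place of the node-count proxy, and the 4-approximation device-assignment algorithm of Chapter 4 would take the place of this plain LPT greedy; the sketch is meant only to illustrate the batch-then-device structure of the heuristic.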
Appears in Collections: 資訊工程學系
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-1.pdf | 694.87 kB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.