Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99489

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 郭斯彥 | zh_TW |
| dc.contributor.advisor | Sy-Yen Kuo | en |
| dc.contributor.author | 林柏杰 | zh_TW |
| dc.contributor.author | Po-Chieh Lin | en |
| dc.date.accessioned | 2025-09-10T16:26:46Z | - |
| dc.date.available | 2025-09-11 | - |
| dc.date.copyright | 2025-09-10 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-28 | - |
| dc.identifier.citation | [1] Z. Zhou, S. Wang, Y. Sun, and M. Zeng, "Graph Neural Networks: A Review of Methods and Applications," AI Open, vol. 2, p. 100010, 2021.
[2] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, "Graph neural networks for social recommendation," in The World Wide Web Conference, ser. WWW '19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 417–426. [Online]. Available: https://doi.org/10.1145/3308558.3313488
[3] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph Convolutional Neural Networks for Web-Scale Recommender Systems," in Proc. 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), London, UK, 2018, pp. 974–983.
[4] A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, "Protein interface prediction using graph convolutional networks," Advances in Neural Information Processing Systems, vol. 30, 2017.
[5] V. Fung, J. Zhang, E. Juarez, and B. G. Sumpter, "Benchmarking graph neural networks for materials chemistry," npj Computational Materials, vol. 7, no. 1, p. 84, 2021.
[6] Q. Tan, N. Liu, and X. Hu, "Deep representation learning for social network analysis," Frontiers in Big Data, vol. 2, p. 2, 2019.
[7] K. Huang, J. Zhai, Z. Zheng, Y. Yi, and X. Shen, "Understanding and bridging the gaps in current GNN performance optimizations," in Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPoPP '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 119–132. [Online]. Available: https://doi.org/10.1145/3437801.3441585
[8] Y. Wang, B. Feng, G. Li, S. Li, L. Deng, Y. Xie, and Y. Ding, "GNNAdvisor: An adaptive and efficient runtime system for GNN acceleration on GPUs," in 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). USENIX Association, Jul. 2021, pp. 515–531. [Online]. Available: https://www.usenix.org/conference/osdi21/presentation/wang-yuke
[9] J. Arai, H. Shiokawa, T. Yamamuro, M. Onizuka, and S. Iwamura, "Rabbit order: Just-in-time parallel reordering for fast graph analysis," in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016, pp. 22–31.
[10] Z. Hu, J. Sun, Z. Li, and G. Sun, "PckGNN: Optimizing aggregation operators with packing strategies in graph neural networks," in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024, pp. 2–13.
[11] Z. Jia, S. Lin, R. Ying, J. You, J. Leskovec, and A. Aiken, "Redundancy-free computation for graph neural networks," in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD '20. New York, NY, USA: Association for Computing Machinery, 2020, pp. 997–1005. [Online]. Available: https://doi.org/10.1145/3394486.3403142
[12] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A Comprehensive Survey on Graph Neural Networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 1, pp. 4–24, Jan. 2021.
[13] T. N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," in International Conference on Learning Representations (ICLR), Toulon, France, 2017.
[14] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive Representation Learning on Large Graphs," in Proc. 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 2017, pp. 1024–1034.
[15] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph Attention Networks," in International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 2018.
[16] J. Li, P. He, Y. Li, S. Wu, Y. Wen, and J. Wu, "Efficient Sparse Matrix-Dense Matrix Multiplication on GPUs," IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 8, pp. 1968–1981, Aug. 2020.
[17] P. Fu, Y. Gong, T. Tian, W. Tan, and G. Long, "Graph Processing on GPUs: A Survey," ACM Computing Surveys, vol. 54, no. 12s, pp. 248:1–248:38, Dec. 2022.
[18] NVIDIA, "NVIDIA CUDA C++ Programming Guide," NVIDIA, Santa Clara, CA, USA, Document Version 12.4, Mar. 2024. [Online]. Available: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
[19] N. Merkel, P. Toussing, R. Mayer, and H.-A. Jacobsen, "Can Graph Reordering Speed Up Graph Neural Network Training? An Experimental Study," in Proceedings of the VLDB Endowment (PVLDB), vol. 18, no. 13, pp. 2486–2499, Sep. 2025.
[20] S. Markidis, S. W. D. Chien, E. Laure, I. B. Peng, and J. S. Vetter, "NVIDIA Tensor Core Programmability, Performance & Precision," in Proc. IEEE Int'l Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2018, pp. 522–531. doi:10.1109/IPDPSW.2018.00091
[21] NVIDIA Corporation, "cuBLAS Library User Guide," 2024. [Online]. Available: https://docs.nvidia.com/cuda/cublas/
[22] NVIDIA Corporation, "WMMA API :: CUDA Toolkit Documentation," 2024. [Online]. Available: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma-api
[23] C.-Y. Hsieh, C.-M. Chang, P.-H. Chen, and S.-Y. Kuo, "Accelerating Maximal Biclique Enumeration on GPUs," arXiv preprint arXiv:2401.05039, 2024. [Online]. Available: https://arxiv.org/abs/2401.05039
[24] J. Yang, Y. Peng, and W. Zhang, "(p,q)-biclique Counting and Enumeration for Large Sparse Bipartite Graphs," in Proc. VLDB Endowment, vol. 15, no. 1, pp. 141–153, 2022. [Online]. Available: https://www.vldb.org/pvldb/vol15/p141-yang.pdf
[25] Z. Pan, S. He, X. Li, X. Zhang, R. Wang, and G. Chen, "Efficient Maximal Biclique Enumeration on GPUs," in Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC '23), Denver, CO, USA, Nov. 12–17, 2023, pp. 1–13. doi:10.1145/3581784.3607059
[26] Y. Wang, B. Feng, Z. Wang, G. Huang, and Y. Ding, "TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs," in Proc. 2023 USENIX Annual Technical Conference (ATC), Boston, MA, USA, Jul. 10–12, 2023, pp. 149–160.
[27] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, and J. Leskovec, "Open Graph Benchmark: Datasets for Machine Learning on Graphs," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 22118–22133.
[28] J. Leskovec and A. Krevl, "SNAP Datasets: Stanford Large Network Dataset Collection," Stanford University, June 2014. [Online]. Available: http://snap.stanford.edu/data
[29] M. Fey and J. E. Lenssen, "Fast Graph Representation Learning with PyTorch Geometric," in Proceedings of the ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019. [Online]. Available: https://arxiv.org/abs/1903.02428
[30] NVIDIA Corporation, "Nsight Compute User Guide," NVIDIA Developer Documentation, 2024. [Online]. Available: https://docs.nvidia.com/nsight-compute/index.html | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/99489 | - |
| dc.description.abstract | 圖神經網路(GNN)已廣泛應用於圖結構資料的學習任務,但由於其鄰居聚合階段具備不規則記憶體存取與低運算密度的特性,使得在 GPU 上高效執行仍具挑戰性。雖然現有方法如 binpack(如 PCKGNN)與冗餘消除(如 HAG)在加速GNN 上已取得成果,但前者在 bin 內仍存在重複記憶體存取,後者則需高昂的預處理成本。本研究提出 BinHAG,一種在 bin 層級進行階層式聚合的策略,透過引入虛擬節點(vnode)來消除 bin 中的冗餘計算。實驗結果顯示,BinHAG 在密集圖上相較於先前方法最高可達 31% 的效能提升。 | zh_TW |
| dc.description.abstract | Graph Neural Networks (GNNs) have become essential for learning on graph-structured data, yet their efficiency remains a challenge due to irregular memory access and low arithmetic intensity. While techniques such as bin-packing (e.g., PCKGNN) and redundancy elimination (e.g., HAG) improve GPU performance, each faces limitations: bin-based methods suffer from intra-bin redundancy, and global redundancy reduction incurs high preprocessing costs. In this work, we propose BinHAG, a bin-level hierarchical aggregation strategy that introduces virtual nodes (vnodes) to eliminate redundant memory accesses within dense bins. Experiments show that BinHAG achieves up to a 31% speedup over prior approaches on dense graphs. (An illustrative sketch of the vnode idea follows the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-10T16:26:46Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-09-10T16:26:46Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 誌謝 (Acknowledgements) i
摘要 (Chinese Abstract) ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES vii
Chapter 1 Introduction 1
Chapter 2 Background 4
2.1 GNN 4
2.2 GPU 5
2.2.1 GPU Execution Hierarchy 6
2.2.2 GPU Memory Hierarchy 6
2.2.3 Tensor Cores and WMMA Interface 7
2.3 GNN aggregation acceleration techniques 7
2.4 PCKGNN: Locality-Aware Aggregation Framework 9
2.5 Biclique Structures and HAG Construction 9
Chapter 3 Methodology: BINHAG 11
3.1 System Overview 12
3.2 Preprocessing Details 13
3.2.1 Biclique Mining 15
3.2.2 Vnode Construction 18
3.2.3 Binpack and RemainCSR Assignment 19
3.3 Kernel Implementation 22
3.3.1 Vnode Kernel 22
3.3.2 Binpack Kernel 23
3.3.3 RemainCSR Kernel 23
Chapter 4 Optimization Strategy for the Vnode Kernel 24
4.1 Handling Irregular Vnode Neighborhoods 24
4.2 Tensor Core Acceleration for Extremely Dense Graphs 25
Chapter 5 Experiment 27
5.1 Experiment Setup 27
5.2 Dataset 28
5.3 Results 30
5.3.1 Overall Speedup 30
5.3.2 Kernel Execution Time Comparison 31
5.3.3 Different Vnode Kernel Implementations 33
5.3.4 Reuse Ratio vs. Speedup 34
5.3.5 Scalability Limitations of Global Biclique Mining 36
5.3.6 GPU Occupancy Across Kernels 37
5.3.7 L1 / L2 Cache Hit Rate Comparison 38
5.3.8 Total Global Memory Traffic Comparison 39
Chapter 6 Conclusion 41
REFERENCES 42 | - |
| dc.language.iso | en | - |
| dc.subject | 冗餘消除 | zh_TW |
| dc.subject | 圖神經網路 | zh_TW |
| dc.subject | 分箱 | zh_TW |
| dc.subject | bin-packing | en |
| dc.subject | GNN | en |
| dc.subject | Hierarchical Aggregation | en |
| dc.title | 利用虛擬節點實現GPU上密集圖高效GNN聚合 | zh_TW |
| dc.title | Efficient GNN Aggregation on Dense Graphs via Virtual Nodes on GPU | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 (Master's) | - |
| dc.contributor.oralexamcommittee | 劉智弘;雷欽隆;顏嗣鈞;袁世一 | zh_TW |
| dc.contributor.oralexamcommittee | Chih-Hung Liu;Chin-Laung Lei;Hsu-chun Yen;Shih-Yi Yuan | en |
| dc.subject.keyword | 圖神經網路, 冗餘消除, 分箱 | zh_TW |
| dc.subject.keyword | GNN, Hierarchical Aggregation, bin-packing | en |
| dc.relation.page | 45 | - |
| dc.identifier.doi | 10.6342/NTU202502516 | - |
| dc.rights.note | Authorized (campus-only access) | - |
| dc.date.accepted | 2025-07-30 | - |
| dc.contributor.author-college | College of Electrical Engineering and Computer Science | - |
| dc.contributor.author-dept | Department of Electrical Engineering | - |
| dc.date.embargo-lift | 2030-07-25 | - |
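The abstract above describes BinHAG's core idea: dense bins whose targets share a common neighbor set get a virtual node (vnode) whose partial aggregate is computed once and then reused by every target. As a rough illustration, here is a minimal NumPy sketch of that two-phase, bin-level aggregation. It is a CPU-side toy under assumed sum-aggregation: the node IDs, the `shared`/`private` split, and all values are invented for this example, and the thesis's actual implementation uses dedicated CUDA kernels (vnode, binpack, RemainCSR).

```python
import numpy as np

# Minimal sketch of vnode-based aggregation (illustration only; the
# thesis implements this as CUDA kernels). Toy setup: targets 0-3 all
# share the neighbor set {4, 5, 6}; targets 0 and 2 additionally have
# private neighbors handled separately (RemainCSR path in the thesis).
shared = [4, 5, 6]
private = {0: [7], 1: [], 2: [8], 3: []}
num_nodes, feat_dim = 9, 4

rng = np.random.default_rng(0)
X = rng.random((num_nodes, feat_dim), dtype=np.float32)

# Phase 1: the vnode aggregates the shared neighbors exactly once.
vnode_feat = X[shared].sum(axis=0)

# Phase 2: each target reuses the vnode's partial sum and adds only
# its private remainder (an empty list contributes a zero vector).
out = np.stack([vnode_feat + X[private[t]].sum(axis=0) for t in range(4)])

# Naive sum-aggregation re-reads the shared rows once per target (4x here).
ref = np.stack([X[shared + private[t]].sum(axis=0) for t in range(4)])
assert np.allclose(out, ref, atol=1e-5)
print("vnode aggregation matches naive aggregation")
```

The saving is in memory traffic: the feature rows of the shared neighbors are loaded once for the vnode instead of once per target, which is where the reported speedup on dense graphs comes from.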
Appears in Collections: Department of Electrical Engineering
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (Restricted Access) | 1.96 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
