Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89129
Full metadata record
DC field: value (language)
dc.contributor.advisor: 郭斯彥 (zh_TW)
dc.contributor.advisor: Sy-Yen Kuo (en)
dc.contributor.author: 鄭博修 (zh_TW)
dc.contributor.author: Po-Hsiu Cheng (en)
dc.date.accessioned: 2023-08-16T17:15:09Z
dc.date.available: 2023-11-09
dc.date.copyright: 2023-08-16
dc.date.issued: 2023
dc.date.submitted: 2023-08-01
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89129
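dc.description.abstract: Graphs are a common data structure with wide applications in navigation, speech recognition, and recommendation systems. Breadth-first search (BFS) is a fundamental algorithm for exploring the vertices of a graph and plays a crucial role in obtaining various graph properties. The graphics processing unit (GPU) is a commonly used hardware accelerator with outstanding computing power and storage capacity. Many BFS algorithms have been ported to GPUs to improve performance, for example the parallel BFS (PBFS) algorithm.
This study presents a method for improving the scalability of the traditional PBFS algorithm. It adopts four designs: Decentralized Frontier Queue, Real-time Queue Draining, Two-level Neighbor Visiting, and Atomic Status Array Scanning. These mechanisms alleviate contention on GPUs, reduce memory consumption, and resolve load imbalance, improving the scalability of the PBFS algorithm on GPUs while achieving competitive running speed. This thesis describes the design and evaluation of the method and directions for future improvement.
(zh_TW, translated to English)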
dc.description.abstract: Graphs are a common data structure widely used in navigation, speech recognition, and recommendation systems. Breadth-First Search (BFS) is a fundamental algorithm for graph traversal and plays a crucial role in obtaining various graph properties. The Graphics Processing Unit (GPU) is a commonly used hardware accelerator with remarkable computing power and storage capacity. Many BFS algorithms have been ported to GPUs to improve performance, such as the Parallel BFS (PBFS) algorithm.
This study proposes an approach to improve the scalability of the traditional PBFS algorithm. It comprises Decentralized Frontier Queue, Real-time Queue Draining, Two-level Neighbor Visiting, and Atomic Status Array Scanning. These mechanisms alleviate contention on GPUs, reduce memory consumption, and resolve load imbalance. We achieve competitive execution speed and improve the scalability of the PBFS algorithm on GPUs. This thesis presents the design, the evaluation, and directions for future work.
(en)
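For readers unfamiliar with the terms in the abstract, the following is a minimal sequential sketch of level-synchronous BFS over a graph stored in Compressed Sparse Row (CSR) form, using a single shared frontier queue. It only illustrates the baseline idea of a frontier queue. It is not the thesis's GPU implementation and does not show the Decentralized Frontier Queue. The function name `bfs_levels`, the array names `row_offsets` and `col_indices`, and the example graph are assumptions made for illustration.

```python
def bfs_levels(row_offsets, col_indices, source):
    """Return the BFS level of every vertex (-1 if unreachable).

    The graph is in CSR form: the neighbors of vertex u are
    col_indices[row_offsets[u]:row_offsets[u + 1]].
    """
    num_vertices = len(row_offsets) - 1
    level = [-1] * num_vertices      # status array: -1 means unvisited
    level[source] = 0
    frontier = [source]              # frontier queue for the current level
    depth = 0
    while frontier:
        next_frontier = []           # one shared queue for the next level
        for u in frontier:
            for e in range(row_offsets[u], row_offsets[u + 1]):
                v = col_indices[e]
                if level[v] == -1:   # first visit: record level and enqueue
                    level[v] = depth + 1
                    next_frontier.append(v)
        frontier = next_frontier
        depth += 1
    return level


# Hypothetical directed graph: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
row_offsets = [0, 2, 3, 4, 4, 4]
col_indices = [1, 2, 3, 3]
levels = bfs_levels(row_offsets, col_indices, 0)
```

In a parallel version, many threads would append to the next-level queue at the same time, typically through atomic operations on a shared tail index. That shared update is the kind of contention the abstract refers to.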
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T17:15:09Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2023-08-16T17:15:09Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee
Acknowledgements
Abstract (in Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background
2.1 Graphics Processing Unit
2.1.1 GPU Architecture
2.1.2 Compute Unified Device Architecture
2.1.3 Memory Hierarchy
2.1.4 Atomic Operations
2.1.5 Synchronization
2.2 Graph
2.2.1 Compressed Sparse Row
2.3 Breadth-First Search
Chapter 3 Related Works
3.1 Parallel Breadth-First Search
3.2 Advanced Related Works
3.2.1 First BFS on GPU
3.2.2 Hierarchical Frontier Queue
3.2.3 Virtual Warp-centric BFS
3.2.4 Prefix-sum on Frontier Queue
3.2.5 Enterprise
3.2.6 Gunrock
3.2.7 Specialized Concurrent Queue
3.2.8 XBFS
Chapter 4 Challenges
4.1 Centralized Frontier Queue
4.2 Scalability Issues
4.2.1 Atomic Operations
4.2.2 Queue Size
4.2.3 Scalability Concerns
Chapter 5 Methodology
5.1 Decentralized Frontier Queue
5.2 Real-time Queue Draining
5.3 Implementation
5.3.1 Execution Flow
5.3.2 SA Scanning
5.3.3 Two-level Neighbor Visiting
Chapter 6 Evaluation
6.1 Experimental Configuration
6.2 Scalability
6.3 Performance
6.4 Sub-queue Capacity
Chapter 7 Conclusion
References
dc.language.iso: en
dc.subject: parallel computing (zh_TW: 平行計算)
dc.subject: graphics processing unit (zh_TW: 圖形處理器)
dc.subject: breadth-first search (zh_TW: 廣度優先搜尋)
dc.subject: GPU (en)
dc.subject: parallel computing (en)
dc.subject: breadth-first search (en)
dc.title: 去中心化前沿佇列提升廣度優先搜尋在圖形處理器上的可擴展性 (zh_TW)
dc.title: Improving the Scalability of Breadth-First Search on GPUs via Frontier Queue Decentralization (en)
dc.type: Thesis
dc.date.schoolyear: 111-2
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 雷欽隆; 顏嗣鈞; 陳英一; 林振緯 (zh_TW)
dc.contributor.oralexamcommittee: Chin-Laung Lei; Hsu-chun Yen; Ing-Yi Chen; Jenn-Wei Lin (en)
dc.subject.keyword: graphics processing unit, parallel computing, breadth-first search (zh_TW: 圖形處理器, 平行計算, 廣度優先搜尋)
dc.subject.keyword: GPU, parallel computing, breadth-first search (en)
dc.relation.page: 46
dc.identifier.doi: 10.6342/NTU202302373
dc.rights.note: Not authorized (未授權)
dc.date.accepted: 2023-08-04
dc.contributor.author-college: College of Electrical Engineering and Computer Science (電機資訊學院)
dc.contributor.author-dept: Graduate Institute of Electronics Engineering (電子工程學研究所)
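Appears in collections: Graduate Institute of Electronics Engineering (電子工程學研究所)

Files in this item:
File | Size | Format
ntu-111-2.pdf (not authorized for public access) | 926.58 kB | Adobe PDF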