在圖形處理器上之並行極大二分團列舉

張家鳴; Chia-Ming Chang

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88512

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	郭斯彥	zh_TW
dc.contributor.advisor	Sy-Yen Kuo	en
dc.contributor.author	張家鳴	zh_TW
dc.contributor.author	Chia-Ming Chang	en
dc.date.accessioned	2023-08-15T16:37:56Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-15	-
dc.date.issued	2023	-
dc.date.submitted	2023-07-31	-
dc.identifier.citation	[1] A. Abidi, R. Zhou, L. Chen, and C. Liu. Pivot-based maximal biclique enumeration. In IJCAI, pages 3558–3564, 2020. [2] A. Adinets. Cuda dynamic parallelism api and principles, 2014. Accessed June. 3, 2023. [3] G. Alexe, S. Alexe, Y. Crama, S. Foldes, P. L. Hammer, and B. Simeone. Consensus algorithms for the generation of all maximal bicliques. Discrete Applied Mathematics, 145(1):11–21, 2004. [4] M. Almasri, Y.-H. Chang, I. E. Hajj, R. Nagi, J. Xiong, and W.-m. Hwu. Parallelizing maximal clique enumeration on gpus. arXiv preprint arXiv:2212.01473, 2022. [5] L. Chen, C. Liu, R. Zhou, J. Xu, and J. Li. Efficient maximal biclique enumeration for large sparse bipartite graphs. Proceedings of the VLDB Endowment, 15(8):1559–1571, 2022. [6] A. Das, S.-V. Sanei-Mehri, and S. Tirthapura. Shared-memory parallel maximal clique enumeration. In 2018 IEEE 25th International Conference on High Performance Computing (HiPC), pages 62–71. IEEE, 2018. [7] M. Harris. Using shared memory in cuda c/c++, 2013. Accessed June. 1, 2023. [8] G. Liu, K. Sim, and J. Li. Efficient mining of large maximal bicliques. In Data Warehousing and Knowledge Discovery: 8th International Conference, DaWaK 2006, Krakow, Poland, September 4-8, 2006. Proceedings 8, pages 437–448. Springer, 2006. [9] A. P. Mukherjee and S. Tirthapura. Enumerating maximal bicliques from a large graph using mapreduce. IEEE Transactions on Services Computing, 10(5):771–784, 2016. [10] R. Nataraj and S. Selvan. Parallel mining of large maximal bicliques using order preserving generators. 2009. [11] Nvidia. https://docs.nvidia.com/cuda/archive/9.0/, 2018. Accessed June. 1, 2023. [12] Nvidia. Cuda toolkit, 2023. Accessed June. 3, 2023. [13] R. Peeters. The maximum edge biclique problem is np-complete. Discrete Applied Mathematics, 131(3):651–654, 2003. [14] E. Prisner. Bicliques in graphs i: Bounds on their number. Combinatorica, 20(1):109–117, 2000. [15] X. Tang, A. Pattnaik, H. Jiang, O. Kayiran, A. Jog, S. Pai, M. Ibrahim, M. T. Kandemir, and C. R. Das. Controlled kernel launch for dynamic parallelism in gpus. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pages 649–660. IEEE, 2017. [16] Y. Zhang, C. A. Phillips, G. L. Rogers, E. J. Baker, E. J. Chesler, and M. A. Langston. On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types. BMC bioinformatics, 15:1–18, 2014.	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/88512	-
dc.description.abstract	本論文透過圖形處理器優化極大二分團列舉(Maximal Biclique Enumeration)的執行效能。過往最大二分團列舉之演算法皆採用深度優先搜尋技巧，同時需要大量的遞迴呼叫與集合操作。然而，因為圖形處理器在硬體上的限制，使得深度優先搜尋與遞迴呼叫並不可行，並且圖形處理器上的計憶體限制也使得過往的資料結構，會導致計憶體不足的問題。同時，在解決前述之硬體限制之後，也因為圖形處理器本身高平化度的特性，使得如何平均有效率的將工作量分配給每個執行單元，變成影響效能的最大問題之一。本研究改寫過往之極大二分團列舉演算法，將遞迴呼叫改為迴圈形式，減緩在圖形處理器上呼叫函式所造成的高負擔；同時，也使用新的資料結構，以解決原集合在圖形處理器上不可用之問題，進而減少了宣告集合對記憶體的需求。後續進一步使用了輕量化候選項選擇、工作竊取等技巧，緩解在圖形處理器上執行緒區塊之間工作量不平衡之情形；並使用平行交集計算與雙層平行計算，進一步利用執行緒區塊內之平行度。在提出極大二分團列舉在圖形處理器上之可能性的同時，提高此問題在圖形處理器上之效能。	zh_TW
dc.description.abstract	Maximal biclique is a crucial property in many applications, such as social network analysis and computational biology. By Maximal Biclique Enumeration (MBE), we can list all the maximal biclique in a given graph and thus extract the insight of it. Alone with the growth of graph size, previous studies have introduced different algorithms to solve the MBE problem with parallelism on CPUs to improve the scalability ; however, to the best of author’s knowledge, the problem has not been implemented on Graphic Processing Unit (GPU). In this study, it aims to improve scalability and runtime of MBE through the parallelism provided by GPUs. First, to successfully migrate the MBE algorithm to the GPUs, it proposes a loop-based algorithm to replace the recursion, and then uses a new data structure to solve the memory explosion problem. Lastly, profiting from the parallelism of GPUs, it develops lightweight candidate selection and work-stealing techniques to alleviates the workload imbalance between thread blocks. Furthermore, it leverages parallel intersection computation and two-level parallelism within each thread block to effectively utilize the inherent parallelism of these blocks.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-15T16:37:56Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-15T16:37:56Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Verification Letter from the Oral Examination Committee i Acknowledgements ii 摘要 iv Abstract v Contents vii Chapter 1 Introduction 1 Chapter 2 Background 5 2.1 GPU architecture 5 2.2 GPU memory hierarchy 7 2.3 Compute Unified Device Architecture 7 2.3.1 CUDA synchronization granularity 7 2.3.2 CUDA Dynamic Parallelism 8 2.4 Maximal Biclique Enumeration 9 Chapter 3 Related work 10 3.1 iMBEA 10 3.1.1 Tree Pruning 13 3.1.2 Candidate Selection 13 3.2 ShareMemory parMBE 14 3.2.1 First level recursion tree 14 3.2.2 Memory consumption 16 3.2.3 Parallel Intersection Computation 16 3.3 PMCE on GPUs 17 3.3.1 Compact Representation 17 Chapter 4 Migration the parallel MBE Algorithm to GPUs 18 4.1 Obstacles and Overview of Implementation 18 4.2 Loopbased DFS Control Flow on GPUs 19 4.3 Compact Representation in cuMBE 21 Chapter 5 Optimization of Parallel MBE Algorithm on GPUs 22 5.1 Workload Imbalance 22 5.2 Lightweight Candidate Selection 23 5.2.1 First Level Early Stop 24 5.2.2 Second Level Early Stop 25 5.3 Work Stealing 26 5.4 Parallel Intersection Computation 27 5.5 Two Level Parallelism 28 Chapter 6 Evaluation 30 6.1 Configurations 30 6.1.1 Platforms and Datasets 30 6.1.2 CPU Baselines 32 6.2 Overall Performance 32 6.3 Workload Balance 32 References 35	-
dc.language.iso	en	-
dc.subject	二分團	zh_TW
dc.subject	圖形處理器	zh_TW
dc.subject	平行運算	zh_TW
dc.subject	parallel computing	en
dc.subject	GPU	en
dc.subject	Biclique	en
dc.title	在圖形處理器上之並行極大二分團列舉	zh_TW
dc.title	Parallel Maximal Biclique Enumeration on GPUs	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	碩士	-
dc.contributor.oralexamcommittee	雷欽隆;顏嗣鈞;陳英一;林振緯	zh_TW
dc.contributor.oralexamcommittee	Chin-Laung Lei;Hsu-chun Yen;Ing-Yi Chen;Jenn-Wei Lin	en
dc.subject.keyword	二分團,圖形處理器,平行運算,	zh_TW
dc.subject.keyword	Biclique,GPU,parallel computing,	en
dc.relation.page	38	-
dc.identifier.doi	10.6342/NTU202302235	-
dc.rights.note	同意授權(全球公開)	-
dc.date.accepted	2023-08-02	-
dc.contributor.author-college	電機資訊學院	-
dc.contributor.author-dept	電機工程學系	-
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-111-2.pdf	794.17 kB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。