Effective Parallelization on Particle Code in GPUs for 3D Incompressible Viscous Flow Equations

Cheng-Tao Wu; 吳政道

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	許文翰
dc.contributor.author	Cheng-Tao Wu	en
dc.contributor.author	吳政道	zh_TW
dc.date.accessioned	2021-06-17T09:06:49Z	-
dc.date.available	2019-12-26
dc.date.copyright	2019-12-26
dc.date.issued	2019
dc.date.submitted	2019-12-23
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743	-
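dc.description.abstract	In this thesis, the improved mixed Lagrangian-Eulerian method and the implicit forcing immersed boundary method developed in-house are parallelized with the CUDA programming language and run on multiple GPUs to accelerate the computation. To keep the code efficient, the computational domain is divided into several parts that are distributed among the GPUs, and the parallelization techniques for three key numerical methods are discussed further. These three methods are the high-accuracy combined compact difference scheme, the moving least squares method, and the grid-based conjugate gradient method. Notably, the newly proposed grid-based conjugate gradient method has two main advantages over the conventional compressed sparse row conjugate gradient method: it avoids index arrays, which reduces the amount of data required, and it makes distributing data across multiple GPUs comparatively easy. Comparison with the literature confirms that the GPU-parallelized improved mixed Lagrangian-Eulerian method and implicit forcing immersed boundary method accurately simulate two problems, lid-driven cavity flow and flow past a sphere. In addition, for a problem with 200^3 grid points, running on 4 GPUs achieves a 27-fold speedup over the 12-thread program.	zh_TW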
dc.description.abstract	In this thesis, the in-house improved mixed Lagrangian-Eulerian (IMLE) method and the implicit forcing immersed boundary (IFIB) method are parallelized with the CUDA programming language and executed on multiple GPUs. To keep the code efficient, the computational domain is decomposed and distributed among several GPUs, and the parallelization strategies for three important schemes are detailed: the Cell-Centered Combined Compact Difference (CC-CCD) scheme, the Moving Least Squares (MLS) interpolation scheme, and the grid-based conjugate gradient (CG) solver. Notably, the newly proposed grid-based CG method has two main benefits over the compressed sparse row (CSR) CG method: the grid-based format requires no index array, which reduces the memory requirement, and the matrix is easier to decompose into several domains. The IMLE-IFIB method is used to simulate lid-driven cavity flow and flow past a sphere, and the results are highly consistent with the reference data. Finally, the CUDA-parallelized IMLE-IFIB method reaches up to a 27-fold speedup on 4 GPUs compared with 12-thread CPU performance when solving a problem with 200^3 lattice points.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T09:06:49Z (GMT). No. of bitstreams: 1 ntu-108-R06525062-1.pdf: 5790174 bytes, checksum: 8df33cb5418c3553ab33c36f3ebdb068 (MD5) Previous issue date: 2019	en
dc.description.tableofcontents	Acknowledgements; Abstract (in Chinese); Abstract; Table of contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Literature review; 1.2 Objective; 1.3 Outline; Chapter 2 Parallel Computing Environment; 2.1 GPU hardware architecture; 2.2 CUDA programming language; Chapter 3 GPU Implementation on 3D Incompressible Navier-Stokes Particle code; 3.1 Improved Mixed Lagrangian-Eulerian method; 3.2 Implicit Forcing Immersed Boundary particle code; 3.3 Investigation on PPE solver; 3.3.1 Direct Solver; 3.3.2 Iterative Solver; 3.3.3 Performance Evaluation for Both the Direct and Iterative Solvers; Chapter 4 Parallelization Strategy; 4.1 Domain Decomposition; 4.2 Parallelization on Three Numerical Scheme; 4.2.1 LU Solver with Multiple Right-Hand Side; 4.2.2 Gaussian Elimination Method with Multiple Small Matrices; 4.2.3 Grid Based Conjugate Gradient Method; 4.3 Grid Based BiCGSTAB Method; 4.4 Problem Description; 4.4.1 Lid-Driven Cavity Flow; 4.4.2 Flow Past a Sphere; 4.5 Speedup Performance; 4.5.1 Speedup for Lid-Driven Cavity Flow; 4.5.2 Speedup for flow passed a sphere; Chapter 5 Concluding Remarks; 5.1 Conclusions; 5.2 Future works
dc.language.iso	en
dc.subject	High-performance computing	zh_TW
dc.subject	Improved mixed Lagrangian-Eulerian method	zh_TW
dc.subject	Parallel computing	zh_TW
dc.subject	Navier-Stokes equations	zh_TW
dc.subject	GPU computing	zh_TW
dc.subject	GPU	en
dc.subject	Navier Stokes Equation	en
dc.subject	IMLE	en
dc.subject	HPC	en
dc.subject	Parallel Computing	en
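dc.title	An efficient parallel computing method for solving the three-dimensional incompressible viscous flow equations with a particle method on GPU architecture	zh_TW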
dc.title	Effective Parallelization on Particle Code in GPUs for 3D Incompressible Viscous Flow Equations	en
dc.type	Thesis
dc.date.schoolyear	108-1
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳明志,蔡順?,羅弘岳,柳冠碩
dc.subject.keyword	GPU computing,Navier-Stokes equations,Parallel computing,High-performance computing,Improved mixed Lagrangian-Eulerian method	zh_TW
dc.subject.keyword	GPU,Navier Stokes Equation,Parallel Computing,HPC,IMLE	en
dc.relation.page	61
dc.identifier.doi	10.6342/NTU201904408
dc.rights.note	Paid authorization
dc.date.accepted	2019-12-24
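dc.contributor.author-college	College of Engineering	zh_TW
dc.contributor.author-dept	Graduate Institute of Engineering Science and Ocean Engineering	zh_TW
Appears in Collections:	Department of Engineering Science and Ocean Engineering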
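
The abstract's contrast between a grid-based and a CSR conjugate gradient solver can be illustrated with a small sketch. This is not the thesis's implementation: the record does not give its operator, boundary treatment, or CUDA kernels. The sketch assumes a 7-point Laplacian on an n x n x n grid with zero values outside the domain, uses plain Python in place of CUDA, and every function name below is hypothetical.

```python
# Assumption: 7-point stencil (6 on the diagonal, -1 per in-domain neighbour).
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def flat(i, j, k, n):
    """Flatten grid position (i, j, k) into a 1-D offset."""
    return (i * n + j) * n + k


def inside(i, j, k, n):
    return 0 <= i < n and 0 <= j < n and 0 <= k < n


def apply_grid(x, n):
    """Grid-based y = A x: neighbours follow from the grid position, no index arrays."""
    y = [0.0] * (n ** 3)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                acc = 6.0 * x[flat(i, j, k, n)]
                for di, dj, dk in NEIGHBORS:
                    if inside(i + di, j + dj, k + dk, n):
                        acc -= x[flat(i + di, j + dj, k + dk, n)]
                y[flat(i, j, k, n)] = acc
    return y


def build_csr(n):
    """The same operator stored in CSR: values plus row_ptr and col_idx index arrays."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                col_idx.append(flat(i, j, k, n))
                values.append(6.0)
                for di, dj, dk in NEIGHBORS:
                    if inside(i + di, j + dj, k + dk, n):
                        col_idx.append(flat(i + di, j + dj, k + dk, n))
                        values.append(-1.0)
                row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def apply_csr(row_ptr, col_idx, values, x):
    """CSR y = A x: every multiply needs an indirect lookup through col_idx."""
    return [
        sum(values[p] * x[col_idx[p]] for p in range(row_ptr[r], row_ptr[r + 1]))
        for r in range(len(row_ptr) - 1)
    ]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=500):
    """Textbook CG for a symmetric positive definite operator given as a function."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Calling `conjugate_gradient(lambda v: apply_grid(v, n), b)` solves the system without ever building `row_ptr` or `col_idx`. Because the grid-based operator only needs a position and its neighbours, a slab of the grid plus one layer of neighbouring values is enough to evaluate it, which is one plausible reading of why the abstract calls the multi-GPU decomposition easier.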

Files in This Item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	5.65 MB	Adobe PDF


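Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.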