Effective Parallelization on Particle Code in GPUs for 3D Incompressible Viscous Flow Equations

Cheng-Tao Wu; 吳政道

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743

Title:	Effective Parallelization on Particle Code in GPUs for 3D Incompressible Viscous Flow Equations (在GPU 架構下以粒子法求解三維不可壓縮黏性流方程的高效平行計算方法)
Author:	Cheng-Tao Wu (吳政道)
Advisor:	許文翰
Keywords:	GPU computing, Navier-Stokes equations, parallel computing, high-performance computing (HPC), improved mixed Lagrangian-Eulerian (IMLE) method
Publication year:	2019
Degree:	Master's
Abstract:	This thesis uses CUDA to parallelize the in-house improved mixed Lagrangian-Eulerian (IMLE) method and the implicit forcing immersed boundary (IFIB) method so that they run on multiple GPUs. To keep the code efficient, the computational domain is decomposed into subdomains that are distributed across the GPUs, and the parallelization strategies for three key schemes are discussed in detail: the high-accuracy cell-centered combined compact difference (CC-CCD) scheme, moving least squares (MLS) interpolation, and a grid-based conjugate gradient (CG) solver. The newly proposed grid-based CG method has two main advantages over the conventional compressed sparse row (CSR) CG method: it needs no index arrays, which reduces the memory requirement, and its data are easier to distribute across multiple GPUs. The GPU-parallelized IMLE-IFIB method is used to simulate lid-driven cavity flow and flow past a sphere, and the results agree closely with reference data. Finally, for a problem with 200^3 grid points, the CUDA implementation running on 4 GPUs achieves up to a 27-fold speedup over a 12-thread CPU run.
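The idea behind a grid-based CG solver can be illustrated with a minimal sketch. It is not the thesis code: it assumes a 7-point finite-difference discretization of -∇²u = f on an n×n×n grid with homogeneous Dirichlet boundaries, runs serially in plain Python rather than CUDA, and all function names (`idx`, `apply_laplacian`, `conjugate_gradient`) are hypothetical. The point it shows is that the matrix-vector product is computed directly from grid neighbors, so no CSR row-pointer or column-index arrays are ever stored.

```python
# Sketch only (assumed 7-point stencil, serial Python); not the thesis implementation.

def idx(i, j, k, n):
    """Flatten grid coordinates (i, j, k) into a 1D index."""
    return (i * n + j) * n + k

def apply_laplacian(u, n, h):
    """Return A*u for the 7-point stencil of -Laplacian, read straight from the grid.

    Neighbors outside the grid are treated as zero (homogeneous Dirichlet),
    so no index arrays describing the sparsity pattern are needed.
    """
    out = [0.0] * (n * n * n)
    inv_h2 = 1.0 / (h * h)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c = idx(i, j, k, n)
                s = 6.0 * u[c]
                if i > 0:
                    s -= u[c - n * n]
                if i < n - 1:
                    s -= u[c + n * n]
                if j > 0:
                    s -= u[c - n]
                if j < n - 1:
                    s -= u[c + n]
                if k > 0:
                    s -= u[c - 1]
                if k < n - 1:
                    s -= u[c + 1]
                out[c] = s * inv_h2
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(f, n, h, tol=1e-10, max_iter=500):
    """Standard (unpreconditioned) CG using the matrix-free operator above.

    Returns the solution and the number of iterations performed.
    """
    x = [0.0] * len(f)
    r = list(f)            # residual for the zero initial guess
    p = list(r)
    rs_old = dot(r, r)
    for it in range(max_iter):
        if rs_old ** 0.5 < tol:
            return x, it
        ap = apply_laplacian(p, n, h)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter
```

Because each output value depends only on its six grid neighbors, splitting the grid into slabs along one axis would require exchanging only one boundary layer between neighboring devices, which is one plausible reading of why the abstract calls the grid-based format easier to decompose.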
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743
DOI:	10.6342/NTU201904408
Full-text authorization:	Paid authorization
Appears in collections:	Department of Engineering Science and Ocean Engineering

Files in this item:

File	Size	Format
ntu-108-1.pdf (not authorized for public access)	5.65 MB	Adobe PDF

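Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.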
