NTU Theses and Dissertations Repository
Use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743
Full metadata record
DC field | Value | Language
dc.contributor.advisor | 許文翰
dc.contributor.author | Cheng-Tao Wu | en
dc.contributor.author | 吳政道 | zh_TW
dc.date.accessioned | 2021-06-17T09:06:49Z
dc.date.available | 2019-12-26
dc.date.copyright | 2019-12-26
dc.date.issued | 2019
dc.date.submitted | 2019-12-23
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/74743
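dc.description.abstract | This thesis uses the CUDA programming language to parallelize the improved mixed Lagrangian-Eulerian method and the implicit forcing immersed boundary method, both developed in-house in our laboratory, and runs them on multiple GPUs to accelerate the computation. To keep the code efficient, the computational domain is divided into several layers that are distributed across the GPUs, and the parallelization techniques for three key numerical methods are discussed in detail: the high-order combined compact difference scheme, the moving least squares method, and the grid-based conjugate gradient method. Notably, the newly proposed grid-based conjugate gradient method has two main advantages over the conventional compressed sparse row (CSR) conjugate gradient method: it avoids index arrays, which reduces the memory requirement, and its data are comparatively easy to distribute across multiple GPUs. Comparison with the literature confirms that the GPU-parallelized improved mixed Lagrangian-Eulerian and implicit forcing immersed boundary methods accurately simulate two problems: lid-driven cavity flow and flow past a sphere. Finally, for a problem with 200^3 grid points, the computation on 4 GPUs achieves up to a 27-fold speedup over a 12-thread CPU program. | zh_TW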
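The abstract says the computational domain is split into layers that are distributed over several GPUs. A minimal sketch of such a slab decomposition follows. It assumes the split runs along z, layers are shared out as evenly as possible, and each interior interface carries one ghost (halo) layer, which is enough for a 7-point stencil; the split direction and halo width actually used in the thesis are not stated in this record, and `Slab` and `decompose_slabs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slab:
    """A contiguous block of z-layers owned by one GPU (illustrative only)."""
    device: int   # hypothetical GPU index
    z_begin: int  # first owned layer (inclusive)
    z_end: int    # one past the last owned layer (exclusive)
    halo_lo: int  # ghost layers copied from the neighbour below
    halo_hi: int  # ghost layers copied from the neighbour above


def decompose_slabs(nz: int, n_dev: int, halo: int = 1) -> list[Slab]:
    """Split nz layers as evenly as possible over n_dev devices.

    The first nz % n_dev devices receive one extra layer. Interior
    interfaces get `halo` ghost layers; physical boundaries get none.
    """
    if n_dev < 1 or nz < n_dev:
        raise ValueError("need at least one layer per device")
    base, extra = divmod(nz, n_dev)
    slabs, z = [], 0
    for d in range(n_dev):
        count = base + (1 if d < extra else 0)
        slabs.append(Slab(d, z, z + count,
                          halo if d > 0 else 0,
                          halo if d < n_dev - 1 else 0))
        z += count
    return slabs


if __name__ == "__main__":
    # 200 layers on 4 GPUs with one ghost layer per interior interface.
    for s in decompose_slabs(200, 4):
        print(s)
```

Splitting along a single axis keeps each device's data contiguous when z is the slowest-varying index and limits halo exchange to at most two neighbours per GPU, at the cost of a larger surface-to-volume ratio than a 3D block decomposition once many devices are used.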
dc.description.abstract | In this thesis, the in-house improved mixed Lagrangian-Eulerian (IMLE) method and the implicit forcing immersed boundary (IFIB) method are parallelized with the CUDA programming language for execution on multiple GPUs. To achieve high efficiency, the computational domain is decomposed and distributed across several GPUs, and the parallelization strategies for three key schemes are detailed: the cell-centered combined compact difference (CC-CCD) scheme, the moving least squares (MLS) interpolation scheme, and the grid-based conjugate gradient (CG) solver. Notably, the newly proposed grid-based CG method has two main benefits over the compressed sparse row (CSR) CG method: the grid-based format requires no index arrays, which reduces the memory requirement, and it makes the matrix easier to decompose into several subdomains. The IMLE-IFIB method is applied to the lid-driven cavity flow and the flow past a sphere, and the results agree closely with the reference data. Finally, the CUDA-parallelized IMLE-IFIB method reaches up to a 27-fold speedup on 4 GPUs compared with a 12-thread CPU run when solving a problem with 200^3 lattice points. | en
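To illustrate why a grid-based format needs no index arrays, here is a minimal, unpreconditioned CG sketch for a pressure-Poisson-type system. It assumes a constant-coefficient 7-point negative Laplacian with zero Dirichlet walls on a flattened (i, j, k) grid; this operator, the function names, and the pure-Python loops are illustrative assumptions, not the thesis's solver. Each neighbour is located by index arithmetic (c ± 1, c ± nx, c ± nx·ny), whereas a CSR matrix for the same stencil stores roughly 7N column indices plus N + 1 row pointers. For N = 200^3 = 8×10^6 and 32-bit indices, that is about (7·8×10^6 + 8×10^6 + 1)·4 bytes ≈ 256 MB of index data the grid-based layout avoids.

```python
import math


def apply_neg_laplacian(p, nx, ny, nz, h):
    """Return A p for the 7-point negative Laplacian with zero Dirichlet walls.

    Neighbours are found by index arithmetic on the flattened (i, j, k) grid,
    so no CSR column-index or row-pointer arrays are stored.
    """
    inv_h2 = 1.0 / (h * h)
    plane = nx * ny
    y = [0.0] * (plane * nz)
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                c = i + nx * j + plane * k
                s = 6.0 * p[c]
                if i > 0:
                    s -= p[c - 1]
                if i < nx - 1:
                    s -= p[c + 1]
                if j > 0:
                    s -= p[c - nx]
                if j < ny - 1:
                    s -= p[c + nx]
                if k > 0:
                    s -= p[c - plane]
                if k < nz - 1:
                    s -= p[c + plane]
                y[c] = s * inv_h2
    return y


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grid_cg(b, nx, ny, nz, h, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for A x = b, using only the stencil above.

    Returns (x, iterations). Stops when ||r|| <= tol * ||b||.
    """
    x = [0.0] * len(b)
    r = list(b)  # r = b - A x for the initial guess x = 0
    d = list(r)
    rr = dot(r, r)
    stop = tol * math.sqrt(dot(b, b))
    for it in range(max_iter):
        if math.sqrt(rr) <= stop:
            return x, it
        q = apply_neg_laplacian(d, nx, ny, nz, h)
        alpha = rr / dot(d, q)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    n, h = 6, 1.0 / 7
    # Manufactured solution: build b from a known x, then recover x with CG.
    x_true = [((c * 37) % 11) / 11.0 for c in range(n ** 3)]
    b = apply_neg_laplacian(x_true, n, n, n, h)
    x, iters = grid_cg(b, n, n, n, h)
    print(iters, max(abs(a - t) for a, t in zip(x, x_true)))
```

In a CUDA port, the triple loop in `apply_neg_laplacian` would typically become one thread per grid point and the dot products would become parallel reductions; with the slab layout, only the `c ± nx·ny` neighbours cross GPU boundaries.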
dc.description.provenance | Made available in DSpace on 2021-06-17T09:06:49Z (GMT). No. of bitstreams: 1
ntu-108-R06525062-1.pdf: 5790174 bytes, checksum: 8df33cb5418c3553ab33c36f3ebdb068 (MD5)
Previous issue date: 2019 | en
dc.description.tableofcontents |
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Literature Review
1.2 Objective
1.3 Outline
Chapter 2 Parallel Computing Environment
2.1 GPU Hardware Architecture
2.2 CUDA Programming Language
Chapter 3 GPU Implementation on 3D Incompressible Navier-Stokes Particle Code
3.1 Improved Mixed Lagrangian-Eulerian Method
3.2 Implicit Forcing Immersed Boundary Particle Code
3.3 Investigation of PPE Solvers
3.3.1 Direct Solver
3.3.2 Iterative Solver
3.3.3 Performance Evaluation for Both the Direct and Iterative Solvers
Chapter 4 Parallelization Strategy
4.1 Domain Decomposition
4.2 Parallelization of Three Numerical Schemes
4.2.1 LU Solver with Multiple Right-Hand Sides
4.2.2 Gaussian Elimination Method with Multiple Small Matrices
4.2.3 Grid-Based Conjugate Gradient Method
4.3 Grid-Based BiCGSTAB Method
4.4 Problem Description
4.4.1 Lid-Driven Cavity Flow
4.4.2 Flow Past a Sphere
4.5 Speedup Performance
4.5.1 Speedup for Lid-Driven Cavity Flow
4.5.2 Speedup for Flow Past a Sphere
Chapter 5 Concluding Remarks
5.1 Conclusions
5.2 Future Work
dc.language.iso | en
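dc.subject | High-performance computing (高效能計算) | zh_TW
dc.subject | Improved mixed Lagrangian-Eulerian method (改良之混合拉格朗日-尤拉法) | zh_TW
dc.subject | Parallel computing (平行計算) | zh_TW
dc.subject | Navier-Stokes equations (Navier Stokes 方程組) | zh_TW
dc.subject | GPU computing (顯示卡計算) | zh_TW
dc.subject | GPU | en
dc.subject | Navier-Stokes Equation | en
dc.subject | IMLE | en
dc.subject | HPC | en
dc.subject | Parallel Computing | en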
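dc.title | An Efficient Parallel Computing Method for Solving the 3D Incompressible Viscous Flow Equations with a Particle Method on GPU Architectures (在GPU 架構下以粒子法求解三維不可壓縮黏性流方程的高效平行計算方法) | zh_TW
dc.title | Effective Parallelization on Particle Code in GPUs for 3D Incompressible Viscous Flow Equations | en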
dc.type | Thesis
dc.date.schoolyear | 108-1 (ROC academic year 108, first semester)
dc.description.degree | Master's (碩士)
dc.contributor.oralexamcommittee | 陳明志, 蔡順?, 羅弘岳, 柳冠碩
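dc.subject.keyword | GPU computing (顯示卡計算), Navier-Stokes equations (Navier Stokes 方程組), parallel computing (平行計算), high-performance computing (高效能計算), improved mixed Lagrangian-Eulerian method (改良之混合拉格朗日-尤拉法) | zh_TW
dc.subject.keyword | GPU, Navier Stokes Equation, Parallel Computing, HPC, IMLE | en
dc.relation.page | 61
dc.identifier.doi | 10.6342/NTU201904408
dc.rights.note | Fee-based license (有償授權)
dc.date.accepted | 2019-12-24
dc.contributor.author-college | College of Engineering (工學院) | zh_TW
dc.contributor.author-dept | Graduate Institute of Engineering Science and Ocean Engineering (工程科學及海洋工程學研究所) | zh_TW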
Appears in collections: Department of Engineering Science and Ocean Engineering (工程科學及海洋工程學系)

Files in this item:
File | Size | Format
ntu-108-1.pdf (not authorized for public access) | 5.65 MB | Adobe PDF


Unless otherwise indicated, items in this system are protected by copyright, with all rights reserved.
