Skip navigation

DSpace JSPUI

DSpace preserves and enables easy and open access to all types of digital content including text, images, moving images, mpegs and data sets

Learn More
DSpace logo
English
中文
  • Browse
    • Communities
      & Collections
    • Publication Year
    • Author
    • Title
    • Subject
    • Advisor
  • Search TDR
  • Rights Q&A
    • My Page
    • Receive email
      updates
    • Edit Profile
  1. NTU Theses and Dissertations Repository
  2. 理學院
  3. 數學系
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66064
Title: 使用多子矩陣法結合中央處理器和圖形處理器解決大型稀疏線性系統
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
Authors: Chenhan D. Yu
余承翰
Advisor: 王偉仲(Weichung Wang)
Keyword: 稀疏矩陣,圖形處理器,平行計算,直接法,線性系統,高效能計算,多子矩陣法,對稱正定,
sparse matrix,GPU,parallel computing,direct method,linear system,high performance computing,Multifrontal,symmetric positive definite,
Publication Year : 2012
Degree: 碩士
Abstract: 大型稀疏線性系統長久以來在科學計算和工程領域佔有極為重要的地
位。 在眾多直接解法中,我們特別專注在多子矩陣法的研究。多子矩陣法
利用 分解樹(非對稱問題使用行分解樹)將大型稀疏線性系統分成許多較
小的 全滿矩陣計算,以利於使用中央處理器和圖形顯示器的結合的計算環
境。 我們分析演算法和程式實作層面以探討如何使用一張或多張圖形顯示
卡來 對計算進行加速,並複習及回顧多子矩陣法的發展和演進,從對稱正
定, 對稱不定到非對稱問題。
我們成功的將理想的演算法在對稱正定的問題上實作。此稀疏分解的實
作成果 其計算效能得以接近全滿矩陣的效率。對於對稱不定問題則實作方
式類似於對 稱正定問題,我門預估其效率只會略低於對稱正定問題。但是
此種理想的方法 無法應用在先進的非對稱問題上,因為先進的非對稱解法
為了縮小使用的記憶 體空間和計算量選擇在分解過程中動態的進行行重
排,此種方法能減少分解後 的額外記憶體需求,但是必須犧牲掉兩個次方
三的運算,這種降階的計算對於 圖形顯示卡的計算效能上有極大的影響,
並且增加了中央處理器和圖形顯示卡 之間的傳輸量,進而影響計算效率。
我們在非對稱的時作上盡可能的降低傳輸 所帶來的影響,其中使用諸多的
高效能記憶體等技術。這些技術在對稱正定的 問題上同樣被使用,並切增
添了效能的理論分析。在將總執行時間的模型推導 出後,我們隨即提出方
法試圖尋找最佳的排程以最小化總執行時間。理論和可 計算的誤差上界,
定理,證明,數值時間在文章中都有提供。
最後對於各類型的方法,我們提出適應型的方法以利於在更大型的計算
環境下 其計算能力得以倍增,而在文中提出的分析模型仍能適用。我們以
日前最流行 的塔型伺服器的規格(可以容納四張圖形處理器)上限作為目
標,以避免討論 不同量級的傳輸速度(網路傳輸),在對稱問題上,我們
更提出一種新的方法 以避免圖形顯示卡之間的傳輸以利於整體的效率。
Solving large-scale sparse linear systems is at the heart of various scientific and engineering computations. Among various direct methods, we focus on the multifrontal method in particular. A multifrontal method uses a elimination tree (column elimination tree for unsymmetric problems) to transform a large sparse linear system problem into many smaller dense frontal operations, suitable for hybrid CPU-GPU systems. We analyze the method from both algorithmic and implementation perspectives to see how a GPU or more GPUs can be used to accelerate the computations, and review the multifrontal method. Problems are studied from symmetric positive definite (SPD), symmetric indefinite to unsymmetric cases.
We successfully carry the ideal implementation out SPD multifrontal which provides nearly peak performance as dense BLAS3 routines on GPU, MAGMA’s Cholesky, and the same symmetric property accounts for the similar implementation and performance for symmetric indefinite problems. However, unsymmetric problems can be hard to implement due to the runtime column pivoting which separates 2 BLAS3 routines into several BLAS2 routines; extra communications are also inevitable. In order to handle the communication between CPU and GPU, easily slowing down the performance, several strategies are provided in the article for all kinds of multifrontal problem to reducing the communication and accelerating the process. Further more, we analyze the total execution time of SPD problem, providing nearly optimal workload distribution for hybrid CPU-GPU cooperation.
For all kinds of problem, scalable algorithm are provided to adapt more GPUs, up to 4. By avoiding the analysis of communication of cluster network which is a different scale from PCI-E communication speed, we focus on adapting the optimization strategies on a box server. The extension and analysis of optimization model for the new parallel scheme is quite the same as the single GPU model. New factorization cost and communication cost for multiple GPUs are added to the model, and the performance bound and properties are still hold.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/66064
Fulltext Rights: 有償授權
Appears in Collections:數學系

Files in This Item:
File SizeFormat 
ntu-101-1.pdf
  Restricted Access
663.09 kBAdobe PDF
Show full item record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved