Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63353

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王偉仲(Weichung Wang) | |
| dc.contributor.author | Yaohung Tsai | en |
| dc.contributor.author | 蔡曜鴻 | zh_TW |
| dc.date.accessioned | 2021-06-16T16:36:35Z | - |
| dc.date.available | 2013-11-22 | |
| dc.date.copyright | 2012-11-22 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-10-18 | |
| dc.identifier.citation | [1] Matrix algebra on GPU and multicore architectures. [2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. [3] Christian H. Bischof. Adaptive blocking in the QR factorization. The Journal of Supercomputing, 3(3):193-208, 1989. [4] Takeshi Fukaya, Yusaku Yamamoto, and Shao-Liang Zhang. A dynamic programming approach to optimizing the blocking strategy for the Householder QR decomposition. In CLUSTER, pages 402-410. IEEE, 2008. [5] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996. [6] Intel. Math kernel library. [7] NVIDIA Corporation. NVIDIA CUDA C programming guide, 2012. Version 4.2. [8] Robert Schreiber and Charles Van Loan. A storage-efficient WY representation for products of Householder transformations. SIAM J. Sci. Stat. Comput., 10(1):53-57, January 1989. [9] Vasily Volkov and James Demmel. LU, QR and Cholesky factorizations using vector capabilities of GPUs. Technical Report UCB/EECS-2008-49, EECS Department, University of California, Berkeley, May 2008. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63353 | - |
| dc.description.abstract | 在CPU-GPU的混合系統中,因為MAGMA的QR分解採用的固定區塊大小造成CPU的閒置。為了增進效能,我們提出了一個自動調校區塊大小的方法。首先,將CPU和GPU上的子程式分別建立各自的迴歸模型。再來,我們使用了一個最佳化方法來決定最好的區塊大小。目標函數的設計是針對降低CPU和GPU閒置造成的效能損失。最後,我們提出了數值結果來展示我們的方法得到的效能提升。 | zh_TW |
| dc.description.abstract | In CPU-GPU hybrid systems, the QR factorization in MAGMA leaves the CPU idle because of its fixed block size. To improve the computational efficiency of the MAGMA QR factorization, we propose a dynamic block size auto-tuning scheme for CPU-GPU hybrid systems. Our approach is data-driven. First, we model the CPU and GPU costs in the MAGMA QR factorization with two independent regression models fitted to collected training data. Next, based on these fitted models, we propose a block size optimization scheme that tunes the block size adaptively to minimize a cost objective function. The cost objective function is designed to balance the workloads between the CPU and the GPU based on the performance models. Several numerical results demonstrate the performance gains of the proposed QR factorization algorithm. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T16:36:35Z (GMT). No. of bitstreams: 1 ntu-101-R98221032-1.pdf: 958521 bytes, checksum: da0d0ae8e1129881bb3dcc63efaa4cd9 (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | 1 Introduction; 2 Background; 2.1 Householder QR Factorization; 2.2 Block Householder QR Factorization; 2.3 CPU-GPU Hybrid System; 2.4 Numerical Linear Algebra Packages; 3 QR Factorization with Dynamic Block Size; 3.1 MAGMA's QR Factorization Algorithm; 3.2 Fixed Block Size Algorithm; 3.3 Dynamic Block Size Algorithm; 4 Data-driven Auto-tuning Procedure; 4.1 Complexity Analysis; 4.1.1 Complexity for CPU; 4.1.2 Complexity for GPU; 4.2 Data Collection and Regression on Related Terms; 4.3 Auto-Tuning Procedure for the Block Size; 4.4 Threshold to Switch the Updating Job Back to GPU; 4.5 Including the Variation; 4.6 Performance Result; 5 Optimal Block Sizes; 5.1 Definitions and Assumptions of the Optimization; 5.2 Approximate Optimization of Blocking Strategies; 5.3 Shortest Path Problem; 5.4 Algorithm for Solving the Optimal Blocking Strategy; 5.5 Numerical Results and Discussion; 6 Multiple Models with Real-time Performance Monitoring; 6.1 The Algorithm; 6.2 Performance Result; 7 Conclusion and Future Directions | |
| dc.language.iso | en | |
| dc.subject | 自動調校 | zh_TW |
| dc.subject | QR分解 | zh_TW |
| dc.subject | GPU | zh_TW |
| dc.subject | QR Factorization | en |
| dc.subject | GPU | en |
| dc.subject | Auto Tuning | en |
| dc.title | CPU-GPU混合系統上QR分解的區塊大小調整 | zh_TW |
| dc.title | Tuning Block Size for QR Factorization on CPU-GPU Hybrid Systems | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 101-1 | |
| dc.description.degree | Master's | |
| dc.contributor.coadvisor | 陳瑞彬(Ray-Bing Chen) | |
| dc.contributor.oralexamcommittee | 李哲榮(Che-Rung Lee) | |
| dc.subject.keyword | GPU, QR分解, 自動調校 | zh_TW |
| dc.subject.keyword | GPU, QR Factorization, Auto Tuning | en |
| dc.relation.page | 39 | |
| dc.rights.note | Paid license (有償授權) | |
| dc.date.accepted | 2012-10-18 | |
| dc.contributor.author-college | 理學院 (College of Science) | zh_TW |
| dc.contributor.author-dept | 數學研究所 (Graduate Institute of Mathematics) | zh_TW |
| Appears in collections: | Department of Mathematics | |
Files in this item:
| File | Size | Format | |
|---|---|---|---|
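| ntu-101-1.pdf (not authorized for public access) | 936.06 kB | Adobe PDF | |
Unless their copyright terms are specifically stated, all items in the system are protected by copyright, with all rights reserved.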
