請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2362
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 王偉仲(Weichung Wang) | |
dc.contributor.author | Mu Yang | en |
dc.contributor.author | 楊慕 | zh_TW |
dc.date.accessioned | 2021-05-13T06:39:27Z | - |
dc.date.available | 2017-08-14 | |
dc.date.available | 2021-05-13T06:39:27Z | - |
dc.date.copyright | 2017-08-14 | |
dc.date.issued | 2017 | |
dc.date.submitted | 2017-08-11 | |
dc.identifier.citation | [1] N. Halko, P.-G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM review, vol. 53, no. 2, pp. 217–288, 2011.
[2] V. Rokhlin, A. Szlam, and M. Tygert, “A randomized algorithm for principal component analysis,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1100–1124, 2009. [3] T.-L. Chen, D. D. Chang, S.-Y. Huang, H. Chen, C. Lin, and W. Wang, “Integrating multiple random sketches for singular value decomposition,” arXiv preprint arXiv:1608.08285, 2016. [4] S. Fiori, T. Kaneko, and T. Tanaka, “Mixed maps for learning a Kolmogoroff-Nagumo-type average element on the compact Stiefel manifold,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 4518–4522. [5] Z. Wen and W. Yin, “A feasible method for optimization with orthogonality constraints,” Mathematical Programming, vol. 142, no. 1-2, pp. 397–434, 2013. [6] D. D. Chang, “Theoretical and performance analysis for integrated randomized singular value decomposition,” Master’s thesis, National Taiwan University, 2017, unpublished. [7] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, et al., ScaLAPACK Users’ Guide. SIAM, 1997. [8] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, “Communication-optimalparallel and sequential QR and LU factorizations,” SIAM Journal on Scientific Computing, vol. 34, no. 1, pp. A206–A239, 2012. [9] S. Tomov, J. Dongarra, and M. Baboulin, “Towards dense linear algebra for hybrid GPU accelerated manycore systems,” Parallel Computing, vol. 36, no. 5, pp. 232– 240, 2010. [10] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q.Wu, and Y. Wang, “Intel math kernel library,” in High-Performance Computing on the Intel® Xeon Phi. Springer, 2014, pp. 167–188. [11] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., LAPACK Users’ Guide. SIAM, 1999. [12] C. S. Committee and Others, “ISO/IEC 14882: 2011, standard for programming language C++,” Technical report, 2011. http://www.open-std.org/jtc1/sc22/wg21, Tech. Rep., 2011. [13] P. Canning, W. Cook, W. Hill, W. Olthoff, and J. C. Mitchell, “F-bounded polymorphism for object-oriented programming,” in Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture. ACM, 1989, pp. 273–280. [14] J. O. Coplien, “Curiously recurring template patterns,” C++ Report, vol. 7, no. 2, pp. 24–27, 1995. [15] D. Abrahams and A. Gurtovoy, C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond, Portable Documents. Pearson Education, 2004. [16] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, et al., “Open MPI: Goals, concept, and design of a next generation MPI implementation,” in Recent Advances in Parallel Virtual Machine and Message Passing Interface. Springer, 2004, pp. 97–104. [17] I. T. Todorov, W. Smith, K. Trachenko, and M. T. Dove, “Dl_poly_3: new dimensions in molecular dynamics simulations via massive parallelism,” Journal of Materials Chemistry, vol. 16, no. 20, pp. 1911–1918, 2006. [18] 1000 Genomes Project Consortium and others, “A global reference for human genetic variation,” Nature, vol. 526, no. 7571, p. 68, 2015. [19] E. Anderson, J. Dongarra, and S. Ostrouchov, LAPACK Working Note 41: Installation Guide for LAPACK. University of Tennessee, Computer Science Department, 1992. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2362 | - |
dc.description.abstract | 低秩近似在大數據分析中佔了重要的地位,整合奇異值分解(Integrated Singular Value Decomposition,iSVD)是一種用於計算大矩陣的低秩近似奇異值分解的演算法。iSVD集成了從多個隨機子空間抽樣而獲得的不同的低秩奇異值分解,並達到更高的精準度和更好的穩定性。雖然多個隨機抽樣與合併的過程需要更高的計算成本,但由於這些操作可以平行化,iSVD 仍然可以節省計算時間。我們在多核心計算集群上平行此演算法,並對計算方法及資料結構進行了修改,以增加可擴展性並減少資料傳輸。透過平行化,iSVD 可以找到巨大矩陣的近似奇異值分解,達到相對於矩陣尺寸和機器數量接近線性的可擴展性,並透過使用 GPU 在抽樣的步驟達到四倍的加速。我們用 C++ 實作此演算法,並應用了幾種提高可維護性、可擴展性和可用性的技術。我們在使用混合 CPU-GPU 的超級電腦系統上使用 iSVD 求解一些大規模的應用問題。 | zh_TW |
dc.description.abstract | Low-rank approximation plays an important role in big data analysis. Integrated Singular Value Decomposition (iSVD) is an algorithm for computing low-rank approximate singular value decomposition of large size matrices. The iSVD integrates different low-rank SVDs obtained by multiple random subspace sketches and achieve higher accuracy and better stability. While iSVD takes higher computational costs due to multiple random sketches and the integration process, these operations can be parallelized to save computational time. We parallelize iSVD for multicore clusters, and modify the algorithms and data structures to increase the scalability and reduce communication. With parallelization, iSVD can find the approximate SVD of matrices with huge size, and achieve near-linear scalability with respect to the matrix size and the number of machines, and gained further 4X faster timing performance on sketching by using GPU. We implement the algorithms in C++, with several techniques for high maintainability, extensibility, and usability. The iSVD is applied on some huge size application using hybrid CPU-GPU supercomputer systems. | en |
dc.description.provenance | Made available in DSpace on 2021-05-13T06:39:27Z (GMT). No. of bitstreams: 1 ntu-106-R04246003-1.pdf: 2224917 bytes, checksum: 93c9c1778d5412e5ff76e3b6428150fc (MD5) Previous issue date: 2017 | en |
dc.description.tableofcontents | 口試委員會審定書 iii
誌謝 v Acknowledgements vii 摘要 ix Abstract xi 1 Introduction 1 2 Algorithms 5 2.1 Integrated Singular Value Decomposition 5 2.2 Stage I: Sketching 6 2.3 Stage II: Orthogonalization 7 2.4 Stage III: Integration 8 2.5 Stage IV: Postprocessing 11 3 Improvements of Integration 15 3.1 Optimization Problem 15 3.2 Improvements of Kolmogorov-Nagumo Integration 17 3.3 Improvements of Wen-Yin Integration 19 3.4 Integration Using the Gramian of Q 22 4 Parallelism 27 4.1 Naïve Parallelism 27 4.2 Block-Row Parallelism for Integration 28 4.3 Block-Row Parallelism for Gramian Integration 32 4.4 Block-Row Parallelism for Sketching, Orthogonalization, and Postprocessing 34 4.5 Block-Column Parallelism 38 4.6 GPU Acceleration 41 5 Comparison of Algorithms 43 6 Implementation 45 6.1 C++ Implementation 45 6.1.1 Object Oriented Programming 45 6.1.2 C++11 Standard 46 6.1.3 Template Meta-Programming 46 6.1.4 Curiously Recurring Template Pattern 46 6.1.5 Data Storage and Linear Algebra Wrappers 47 6.2 Parallelism 47 6.2.1 MPI 47 6.2.2 OpenMP 47 6.2.3 GPU 47 6.3 Tools 48 6.3.1 CMake 48 6.3.2 Google Test 48 6.4 Package Structure 49 7 Numerical Results 51 7.1 Comparison of Parallelisms 51 7.2 Scalability of Integration Algorithms 53 7.3 Implementation and Big Data Applications 55 7.3.1 Comparison with MATLAB 56 7.3.2 Facebook 100k 56 7.3.3 1000 Genomes Project 57 8 Conclusion 59 Bibliography 61 A Derivations 65 A.1 Derivation of Hierarchical Reduction Integration 65 A.2 Derivation of Tall Skinny QR 66 B Complexity 69 B.1 Canonical Methods 70 B.1.1 Y Orthogonalization 71 B.2 Sequential Algorithms 72 B.2.1 Stage I: Sketching 72 B.2.2 Stage II: Orthogonalization 72 B.2.3 Stage III: Integration 73 B.2.4 Stage IV: Postprocessing 76 B.3 Parallel Algorithms 77 B.3.1 Stage I: Sketching (Parallelism) 77 B.3.2 Stage II: Orthogonalization (Parallelism) 78 B.3.3 Stage III: Integration (Parallelism) 79 B.3.4 Stage IV: Postprocessing (Parallelism) 84 | |
dc.language.iso | en | |
dc.title | 利用高度平行之演算法整合多重隨機奇異值分解並應用於巨量資料分析 | zh_TW |
dc.title | Highly Scalable Parallelism of Integrated Randomized Singular Value Decomposition with Big Data Applications | en |
dc.type | Thesis | |
dc.date.schoolyear | 105-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 陳定立(Ting-Li Chen),陳素雲(Su-Yun Huang) | |
dc.subject.keyword | 奇異值分解,平行演算法,分散式演算法,隨機演算法,圖形處理器,大數據分析, | zh_TW |
dc.subject.keyword | Singular value decomposition,Parallel algorithms,Distributed algorithms,Randomized algorithms,Graphics processing units,Big data analysis, | en |
dc.relation.page | 88 | |
dc.identifier.doi | 10.6342/NTU201702960 | |
dc.rights.note | 同意授權(全球公開) | |
dc.date.accepted | 2017-08-11 | |
dc.contributor.author-college | 理學院 | zh_TW |
dc.contributor.author-dept | 應用數學科學研究所 | zh_TW |
顯示於系所單位: | 應用數學科學研究所 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-106-1.pdf | 2.17 MB | Adobe PDF | 檢視/開啟 |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。