利用高度平行之演算法整合多重隨機奇異值分解並應用於巨量資料分析

Mu Yang; 楊慕

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2362

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	王偉仲(Weichung Wang)
dc.contributor.author	Mu Yang	en
dc.contributor.author	楊慕	zh_TW
dc.date.accessioned	2021-05-13T06:39:27Z	-
dc.date.available	2017-08-14
dc.date.available	2021-05-13T06:39:27Z	-
dc.date.copyright	2017-08-14
dc.date.issued	2017
dc.date.submitted	2017-08-11
dc.identifier.citation	[1] N. Halko, P.-G. Martinsson, and J. A. Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions,” SIAM review, vol. 53, no. 2, pp. 217–288, 2011. [2] V. Rokhlin, A. Szlam, and M. Tygert, “A randomized algorithm for principal component analysis,” SIAM Journal on Matrix Analysis and Applications, vol. 31, no. 3, pp. 1100–1124, 2009. [3] T.-L. Chen, D. D. Chang, S.-Y. Huang, H. Chen, C. Lin, and W. Wang, “Integrating multiple random sketches for singular value decomposition,” arXiv preprint arXiv:1608.08285, 2016. [4] S. Fiori, T. Kaneko, and T. Tanaka, “Mixed maps for learning a Kolmogoroff-Nagumo-type average element on the compact Stiefel manifold,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 4518–4522. [5] Z. Wen and W. Yin, “A feasible method for optimization with orthogonality constraints,” Mathematical Programming, vol. 142, no. 1-2, pp. 397–434, 2013. [6] D. D. Chang, “Theoretical and performance analysis for integrated randomized singular value decomposition,” Master’s thesis, National Taiwan University, 2017, unpublished. [7] L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, et al., ScaLAPACK Users’ Guide. SIAM, 1997. [8] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, “Communication-optimalparallel and sequential QR and LU factorizations,” SIAM Journal on Scientific Computing, vol. 34, no. 1, pp. A206–A239, 2012. [9] S. Tomov, J. Dongarra, and M. Baboulin, “Towards dense linear algebra for hybrid GPU accelerated manycore systems,” Parallel Computing, vol. 36, no. 5, pp. 232– 240, 2010. [10] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q.Wu, and Y. Wang, “Intel math kernel library,” in High-Performance Computing on the Intel® Xeon Phi. Springer, 2014, pp. 167–188. [11] E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, et al., LAPACK Users’ Guide. SIAM, 1999. [12] C. S. Committee and Others, “ISO/IEC 14882: 2011, standard for programming language C++,” Technical report, 2011. http://www.open-std.org/jtc1/sc22/wg21, Tech. Rep., 2011. [13] P. Canning, W. Cook, W. Hill, W. Olthoff, and J. C. Mitchell, “F-bounded polymorphism for object-oriented programming,” in Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture. ACM, 1989, pp. 273–280. [14] J. O. Coplien, “Curiously recurring template patterns,” C++ Report, vol. 7, no. 2, pp. 24–27, 1995. [15] D. Abrahams and A. Gurtovoy, C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond, Portable Documents. Pearson Education, 2004. [16] E. Gabriel, G. E. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. M. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, et al., “Open MPI: Goals, concept, and design of a next generation MPI implementation,” in Recent Advances in Parallel Virtual Machine and Message Passing Interface. Springer, 2004, pp. 97–104. [17] I. T. Todorov, W. Smith, K. Trachenko, and M. T. Dove, “Dl_poly_3: new dimensions in molecular dynamics simulations via massive parallelism,” Journal of Materials Chemistry, vol. 16, no. 20, pp. 1911–1918, 2006. [18] 1000 Genomes Project Consortium and others, “A global reference for human genetic variation,” Nature, vol. 526, no. 7571, p. 68, 2015. [19] E. Anderson, J. Dongarra, and S. Ostrouchov, LAPACK Working Note 41: Installation Guide for LAPACK. University of Tennessee, Computer Science Department, 1992.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/2362	-
dc.description.abstract	低秩近似在大數據分析中佔了重要的地位，整合奇異值分解（Integrated Singular Value Decomposition，iSVD）是一種用於計算大矩陣的低秩近似奇異值分解的演算法。iSVD集成了從多個隨機子空間抽樣而獲得的不同的低秩奇異值分解，並達到更高的精準度和更好的穩定性。雖然多個隨機抽樣與合併的過程需要更高的計算成本，但由於這些操作可以平行化，iSVD 仍然可以節省計算時間。我們在多核心計算集群上平行此演算法，並對計算方法及資料結構進行了修改，以增加可擴展性並減少資料傳輸。透過平行化，iSVD 可以找到巨大矩陣的近似奇異值分解，達到相對於矩陣尺寸和機器數量接近線性的可擴展性，並透過使用 GPU 在抽樣的步驟達到四倍的加速。我們用 C++ 實作此演算法，並應用了幾種提高可維護性、可擴展性和可用性的技術。我們在使用混合 CPU-GPU 的超級電腦系統上使用 iSVD 求解一些大規模的應用問題。	zh_TW
dc.description.abstract	Low-rank approximation plays an important role in big data analysis. Integrated Singular Value Decomposition (iSVD) is an algorithm for computing low-rank approximate singular value decomposition of large size matrices. The iSVD integrates different low-rank SVDs obtained by multiple random subspace sketches and achieve higher accuracy and better stability. While iSVD takes higher computational costs due to multiple random sketches and the integration process, these operations can be parallelized to save computational time. We parallelize iSVD for multicore clusters, and modify the algorithms and data structures to increase the scalability and reduce communication. With parallelization, iSVD can find the approximate SVD of matrices with huge size, and achieve near-linear scalability with respect to the matrix size and the number of machines, and gained further 4X faster timing performance on sketching by using GPU. We implement the algorithms in C++, with several techniques for high maintainability, extensibility, and usability. The iSVD is applied on some huge size application using hybrid CPU-GPU supercomputer systems.	en
dc.description.provenance	Made available in DSpace on 2021-05-13T06:39:27Z (GMT). No. of bitstreams: 1 ntu-106-R04246003-1.pdf: 2224917 bytes, checksum: 93c9c1778d5412e5ff76e3b6428150fc (MD5) Previous issue date: 2017	en
dc.description.tableofcontents	口試委員會審定書 iii 誌謝 v Acknowledgements vii 摘要 ix Abstract xi 1 Introduction 1 2 Algorithms 5 2.1 Integrated Singular Value Decomposition 5 2.2 Stage I: Sketching 6 2.3 Stage II: Orthogonalization 7 2.4 Stage III: Integration 8 2.5 Stage IV: Postprocessing 11 3 Improvements of Integration 15 3.1 Optimization Problem 15 3.2 Improvements of Kolmogorov-Nagumo Integration 17 3.3 Improvements of Wen-Yin Integration 19 3.4 Integration Using the Gramian of Q 22 4 Parallelism 27 4.1 Naïve Parallelism 27 4.2 Block-Row Parallelism for Integration 28 4.3 Block-Row Parallelism for Gramian Integration 32 4.4 Block-Row Parallelism for Sketching, Orthogonalization, and Postprocessing 34 4.5 Block-Column Parallelism 38 4.6 GPU Acceleration 41 5 Comparison of Algorithms 43 6 Implementation 45 6.1 C++ Implementation 45 6.1.1 Object Oriented Programming 45 6.1.2 C++11 Standard 46 6.1.3 Template Meta-Programming 46 6.1.4 Curiously Recurring Template Pattern 46 6.1.5 Data Storage and Linear Algebra Wrappers 47 6.2 Parallelism 47 6.2.1 MPI 47 6.2.2 OpenMP 47 6.2.3 GPU 47 6.3 Tools 48 6.3.1 CMake 48 6.3.2 Google Test 48 6.4 Package Structure 49 7 Numerical Results 51 7.1 Comparison of Parallelisms 51 7.2 Scalability of Integration Algorithms 53 7.3 Implementation and Big Data Applications 55 7.3.1 Comparison with MATLAB 56 7.3.2 Facebook 100k 56 7.3.3 1000 Genomes Project 57 8 Conclusion 59 Bibliography 61 A Derivations 65 A.1 Derivation of Hierarchical Reduction Integration 65 A.2 Derivation of Tall Skinny QR 66 B Complexity 69 B.1 Canonical Methods 70 B.1.1 Y Orthogonalization 71 B.2 Sequential Algorithms 72 B.2.1 Stage I: Sketching 72 B.2.2 Stage II: Orthogonalization 72 B.2.3 Stage III: Integration 73 B.2.4 Stage IV: Postprocessing 76 B.3 Parallel Algorithms 77 B.3.1 Stage I: Sketching (Parallelism) 77 B.3.2 Stage II: Orthogonalization (Parallelism) 78 B.3.3 Stage III: Integration (Parallelism) 79 B.3.4 Stage IV: Postprocessing (Parallelism) 84
dc.language.iso	en
dc.subject	大數據分析	zh_TW
dc.subject	奇異值分解	zh_TW
dc.subject	平行演算法	zh_TW
dc.subject	分散式演算法	zh_TW
dc.subject	隨機演算法	zh_TW
dc.subject	圖形處理器	zh_TW
dc.subject	Singular value decomposition	en
dc.subject	Big data analysis	en
dc.subject	Graphics processing units	en
dc.subject	Randomized algorithms	en
dc.subject	Distributed algorithms	en
dc.subject	Parallel algorithms	en
dc.title	利用高度平行之演算法整合多重隨機奇異值分解並應用於巨量資料分析	zh_TW
dc.title	Highly Scalable Parallelism of Integrated Randomized Singular Value Decomposition with Big Data Applications	en
dc.type	Thesis
dc.date.schoolyear	105-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	陳定立(Ting-Li Chen),陳素雲(Su-Yun Huang)
dc.subject.keyword	奇異值分解,平行演算法,分散式演算法,隨機演算法,圖形處理器,大數據分析,	zh_TW
dc.subject.keyword	Singular value decomposition,Parallel algorithms,Distributed algorithms,Randomized algorithms,Graphics processing units,Big data analysis,	en
dc.relation.page	88
dc.identifier.doi	10.6342/NTU201702960
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2017-08-11
dc.contributor.author-college	理學院	zh_TW
dc.contributor.author-dept	應用數學科學研究所	zh_TW
顯示於系所單位：	應用數學科學研究所

文件中的檔案：

檔案	大小	格式
ntu-106-1.pdf	2.17 MB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。