在叢集伺服器上高效能多維奇異值分解及基於閉曲線積分的特徵值分解

Yu-Hsiang Tsai; 蔡宇翔

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/1358

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	王偉仲(Weichung Wang)
dc.contributor.author	Yu-Hsiang Tsai	en
dc.contributor.author	蔡宇翔	zh_TW
dc.date.accessioned	2021-05-12T09:37:07Z	-
dc.date.available	2019-08-18
dc.date.available	2021-05-12T09:37:07Z	-
dc.date.copyright	2018-08-18
dc.date.issued	2018
dc.date.submitted	2018-08-15
dc.identifier.citation	[1] B. W. Bader and T. G. Kolda. Algorithm 862: Matlab tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software (TOMS), 32(4):635–653, 2006. [2] L. De Lathauwer, B. De Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253– 1278, 2000. [3] J. Demmel, L. Grigori, M. Hoemmen, and J. Langou. Communication-optimal parallel and sequential qr and lu factorizations. SIAM Journal on Scientific Computing, 34(1):A206–A239, 2012. [4] J. W. Demmel. Applied numerical linear algebra, volume 56. Siam, 1997. [5] E. Di Napoli, E. Polizzi, and Y. Saad. Efficient estimation of eigenvalue counts in  an interval. Numerical Linear Algebra with Applications, 23(4):674–692, 2016. [6] Y. Futamura, H. Tadano, and T. Sakurai. Parallel stochastic estimation method of  eigenvalue distribution. JSIAM Letters, 2:127–130, 2010. [7] G. H. Golub and C. F. Van Loan. Matrix computations, volume 3. JHU Press, 2012. [8] M. L. Hammock, A. Chortos, B. C.-K. Tee, J. B.-H. Tok, and Z. Bao. 25th anniversary article: the evolution of electronic skin (e-skin): a brief history, design considerations, and recent progress. Advanced materials, 25(42):5997–6038, 2013. [9] T. Hoshi. ELSES matrix library. http://www.elses.jp/matrix/. [10] T. Hoshi, H. Imachi, K. Kumahata, M. Terai, K. Miyamoto, K. Minami, and F. Shoji. Extremely scalable algorithm for 108-atom quantum material simulation on the full system of the k computer. In Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2016 7th Workshop on, pages 33–40. IEEE, 2016. [11] T. Hoshi, H. Imachi, S. Yokoyama, and T. Kaji. Numerical linear algebraic problems in large-scale massively parallel electronic state calculations on the k computer. In International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing, 2015. [12] T. Ikegami and T. Sakurai. Contour integral eigensolver for non-hermitian systems: a rayleigh-ritz-type approach. Taiwanese Journal of Mathematics, pages 825–837, 2010. [13] T. Ikegami, T. Sakurai, and U. Nagashima. A filter diagonalization for generalized eigenvalue problems based on the sakurai–sugiura projection method. Journal of Computational and Applied Mathematics, 233(8):1927–1936, 2010. [14] H. Imachi. Numerical Methods for Large-scale Quantum Material Simulations.PhD thesis, Tottori University, 2017. [15] H. Imachi and T. Hoshi. Hybrid numerical solvers for massively parallel eigenvalue computations and their benchmark with electronic structure calculations. Journal of Information Processing, 24(1):164–172, 2016. [16] A. Imakura, L. Du, and T. Sakurai. Accuracy analysis on the rayleigh-ritz type of the contour integral based eigensolver for solving generalized eigenvalue problems. Technical report, Citeseer, 2014. [17] A. Kapteyn, H. Neudecker, and T. Wansbeek. An approach ton-mode components analysis. Psychometrika, 51(2):269–275, 1986. [18] A. Kerr, D. Campbell, and M. Richards. Qr decomposition on gpus. In Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, pages 71–78. ACM, 2009. [19] J. Kestyn, E. Polizzi, and P. T. Peter Tang. Feast eigensolver for non-hermitian problems. SIAM Journal on Scientific Computing, 38(5):S772–S799, 2016. [20] E. Kofidis and P. A. Regalia. On the best rank-1 approximation of higher-order super-symmetric tensors. SIAM Journal on Matrix Analysis and Applications, 23(3):863– 884, 2002. [21] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009. [22] P. M. Kroonenberg and J. De Leeuw. Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika, 45(1):69–97, 1980. [23] J. Levin. Three-mode factor analysis. Psychological Bulletin, 64(6):442, 1965. [24] Y. Maeda, Y. Futamura, and T. Sakurai. Stochastic estimation method of eigenvalue density for nonlinear eigenvalue problem on the complex plane. JSIAM letters, 3:61– 64, 2011. [25] H. Matsui, A. S. Mishchenko, T. Hasegawa, et al. Distribution of localized states from fine analysis of electron spin resonance spectra in organic transistors. Physical review letters, 104(5):056602, 2010. [26] P. T. Peter Tang and E. Polizzi. Feast as a subspace iteration eigensolver accelerated by approximate spectral projection. SIAM Journal on Matrix Analysis and Applications, 35(2):354–390, 2014. [27] E. Polizzi. Density-matrix-based algorithm for solving eigenvalue problems. Physical Review B, 79(11):115112, 2009. [28] Y. Saad. Numerical Methods for Large Eigenvalue Problems: Revised Edition. SIAM, 2011. [29] T. Sakurai, H. Tadano, et al. Cirr: a rayleigh-ritz type method with contour integral for generalized eigenvalue problems. Hokkaido mathematical journal, 36(4):745– 757, 2007. [30] O. Schenk and K. Gärtner. Solving unsymmetric sparse systems of linear equations with pardiso. Future Generation Computer Systems, 20(3):475–487, 2004. [31] A. Stathopoulos. Locking issues for finding a large number of eigenvectors of hermitian matrices. Technical report, Tech Report WM-CS-2005-09, Computer Science, The College of William & Mary, 2005. [32] K. Tanaka, H. Imachi, T. Fukumoto, T. Fukaya, Y. Yamamoto, and T. Hoshi. Eigenkernel-a middleware for parallel generalized eigenvalue solvers to attain high scalability and usability. arXiv preprint arXiv:1806.00741, 2018. [33] L. R. Tucker. Implications of factor analysis of three-way matrices for measurement of change. Problems in measuring change, 15:122–137, 1963. [34] L. R. Tucker. The extension of factor analysis to three-dimensional matrices. Contributions to mathematical psychology, 110119, 1964. [35] L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31(3):279–311, 1966. [36] N. Vannieuwenhoven, R. Vandebril, and K. Meerbergen. A new truncation strategy for the higher-order singular value decomposition. SIAM Journal on Scientific Computing, 34(2):A1027–A1052, 2012. [37] M. A. O. Vasilescu and D. Terzopoulos. Multilinear analysis of image ensembles: Tensorfaces. In European Conference on Computer Vision, pages 447–460. Springer, 2002. [38] E. Wang, Q. Zhang, B. Shen, G. Zhang, X. Lu, Q. Wu, and Y. Wang. Intel math kernel library. In High-Performance Computing on the Intel® Xeon PhiTM, pages 167–188. Springer, 2014. [39] Y. Xi and Y. Saad. Computing partial spectra with least-squares rational filters. SIAM Journal on Scientific Computing, 38(5):A3020–A3045, 2016.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/handle/123456789/1358	-
dc.description.abstract	我們改進了'多維奇異值分解'及'基於閉曲線積分的特徵值分解'。在'多維奇異值分解'中，我們使用不同的方式實作兩個演算法，並提升解決巨量資料的能力。隨著資料越來越多樣，找到方法去快速且有效得壓縮和分析資料中的多角關係對於高效能計算中是非常重要的。我們實做了兩個已經存在的演算法：多維奇異值分解及逐步降維多維奇異值分解將多維矩陣分解，利用 GPU 來加速他們是非常困難的，因為我們幾乎無法把資料一次放進 GPU 的記憶體當中。所以我們利用 QR 和 Gram 來協助我們縮減問題的大小，而我們實作 QR 和 Gram 是將資料分成一部份一部份來解決，這樣不但能讓 GPU 有能力處理，還能利用計算的部分遮蓋掉資料搬移的時間，最後我們相對於原先利用 cuda 函式庫時做的能加速 163.21 倍，在未來希望能夠將這個方法使用在實際問題上。在'基於閉曲線積分的特徵值分解'中，我們藉由基於閉曲線積分的特徵值分解，展示了一個分治法的處理方式去解決區域中有太多的特徵對的問題，並應用它在有機材料模擬。在許多應用上，解特徵值分解扮演了重要的角色，這些矩陣通常都是稀疏的大矩陣，但通常實際上我們只需要一小部分的特徵對，現在有 FEAST 或 CIRR 能夠幫忙解決這類的問題，然而當區域中有許多特徵對時，會變得非常慢，所以必須將問題切成許多小問題。決定分區雖然困難但非常重要，要讓每個小問題都能被輕鬆解決，另外當有些特徵對收料的比較早時，原先的方法還是會花時間繼續迭代他們。我們介紹了兩種分區方式：藉由預測特徵對的數量來分解、藉由先備知識來分解。我們提升解決區域中有過多的特徵對問題，且藉由恰當的分區，分治法會比較原先的還快，也利用凍結技巧去讓程式不要花費時間在已經收斂的特徵對上。之後想設計一個方法能夠讓各個分區解決的時間差不多。	zh_TW
dc.description.abstract	We implement, accelerate, and improve 'Higher-Order Singular Value Decomposition' and 'Contour Integral based Eigen Decomposition'. In 'Higher-Order Singular Value Decomposition', we implemented two methods with different strategies and improved the ability to solve the large tensor problem. With the explosion of big data, finding ways of compressing and analyzing large data sets with the multi-way relationship - i.e., tensors - quickly and efficiently have become critical in High-Performance Computing. We implement two existed methods which are Higher-Order Singular Value Decomposition and Sequential Truncated Higher-Order Singular Value Decomposition to achieve Tucker Decomposition. Implementing them with GPU is very difficult because we usually can not store the whole tensor into GPU memory. We use QR method and Gram method to reduce the problem size to make its size allowed by GPU memory. We also implemented QR method and Gram by part-by-part. It can help us to solve the large data problem and use computing to cover data transferring. Finally, We achieve 163.21x speedup over a CUDA library-based solution. In the future, we want to apply it to the real application. In 'Contour Integral based Eigen Decomposition', we proposed a divide-and-conquer flow to solving the certain eigenpairs in the specific region containing many eigenpairs with eigensolver based on contour integral with the locking technique, and use it to solve the generalized eigenvalue problem from the organic material simulation. Solving eigenvalue problems is an essential part of many applications. Those matrices are often large and sparse, but the eigenpairs only are required in the region of interest. Several solvers can solve the eigenpairs in the selected region such as FEAST and CIRR. When there are many eigenpairs in the selected region, the performance is slow, so the partition of the region is needed. Deciding the partition is very difficult but critical such that solving each sub-region should be efficient. When some eigenvector is converged early, the solver still spends time on them. We introduce the two partition method, uniform dividing by the estimated eigenvalue number and dividing by domain acknowledgment. We increase the eigensolver ability to solve the region containing many eigenpairs and get better performance with the proper partition. We also use the locking technique to avoid spending the time on converged eigenpairs. In the future, we would like to design an automatic flow to generate the partition whose sub-region spends almost the same executing time.	en
dc.description.provenance	Made available in DSpace on 2021-05-12T09:37:07Z (GMT). No. of bitstreams: 1 ntu-107-R05246003-1.pdf: 5632435 bytes, checksum: ba16d36d7b63e3c760d4e92dd479b936 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	誌謝 v Acknowledgements vii 摘要 ix Abstract xi I Higher-Order Singular Value Decomposition 1 1 Introduction 3 2 Preliminary 5 2.1 Tensor Formula............................... 5 2.2 Tucker Decomposition ........................... 7 3 Methods and Results 9 3.1 Algorithm.................................. 10 3.2 cuSOLVER SVD .............................. 11 3.3 QR method ................................. 11 3.4 Householder QR .............................. 13 3.5 Modified Block QR............................. 14 3.6 Tall Skinny QR ............................... 16 3.7 Gram Method................................ 18 3.8 Error..................................... 19 4 Implementation 21 4.1 Transpose.................................. 21 4.2 CUDA Stream................................ 22 4.3 Multiple GPUs ............................... 23 5 Conclusion 25 II Contour Integral based Eigenvalue Decomposition 27 6 Introduction 29 7 Preliminary 33 7.1 Eigensolver based on Contour Integral................... 33 7.2 Quadrature rules............................... 34 8 Theorem 37 8.1 Deflation .................................. 37 8.2 Theorem................................... 38 8.3 Cases .................................... 39 9 Algorithm 41 9.1 Estimation.................................. 41 9.2 FEAST ................................... 41 9.3 LockingTechnique ............................. 42 9.4 FEAST with locking on general propose.................. 43 9.5 ProcessingFlow............................... 43 10 Implementation 45 10.1 Linear Solver ................................ 45 10.2 Data Structure................................ 46 10.3 Partition List ................................ 46 11 Partition 49 11.1 Partitionand Estimation .......................... 49 11.2 Conquering Partition ............................ 50 12 Results 53 12.1 Application ................................. 53 12.2 Environment ................................ 54 12.3 MKL FEAST vs FEAST vs FEAST with locking . . . . . . . . . . . . . 54 12.4 Locking Effect ............................... 56 12.5 Dividing Partition.............................. 57 12.6 Conquering Partition ............................ 57 13 Conclusion 59 Bibliography 61
dc.language.iso	en
dc.subject	巨量資料	zh_TW
dc.subject	多維陣列	zh_TW
dc.subject	多維奇異值分解	zh_TW
dc.subject	特徵值問題	zh_TW
dc.subject	凍結	zh_TW
dc.subject	閉曲線積分	zh_TW
dc.subject	加速	zh_TW
dc.subject	Locking	en
dc.subject	Big Data	en
dc.subject	Acceralating	en
dc.subject	Contour Integral	en
dc.subject	Tensor	en
dc.subject	Higher-Order Singular Value Decomposition	en
dc.subject	Generalized Eigenvalue Problem	en
dc.title	在叢集伺服器上高效能多維奇異值分解及基於閉曲線積分的特徵值分解	zh_TW
dc.title	Efficient Higher-Order Singular Value Decomposition and Contour Integral Based Eigen Decomposition on CPU/GPU Cluster	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	黃聰明,陳素雲
dc.subject.keyword	多維陣列,多維奇異值分解,特徵值問題,凍結,閉曲線積分,加速,巨量資料,	zh_TW
dc.subject.keyword	Tensor,Higher-Order Singular Value Decomposition,Generalized Eigenvalue Problem,Locking,Contour Integral,Acceralating,Big Data,	en
dc.relation.page	65
dc.identifier.doi	10.6342/NTU201803532
dc.rights.note	同意授權(全球公開)
dc.date.accepted	2018-08-15
dc.contributor.author-college	理學院	zh_TW
dc.contributor.author-dept	應用數學科學研究所	zh_TW
顯示於系所單位：	應用數學科學研究所

文件中的檔案：

檔案	大小	格式
ntu-107-1.pdf	5.5 MB	Adobe PDF	檢視/開啟

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。