於異質性平台上之高度平行化序列模式探勘

Yu-Heng Hsieh; 謝友恆

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	陳銘憲
dc.contributor.author	Yu-Heng Hsieh	en
dc.contributor.author	謝友恆	zh_TW
dc.date.accessioned	2021-06-17T06:34:38Z	-
dc.date.available	2021-08-18
dc.date.copyright	2018-08-18
dc.date.issued	2018
dc.date.submitted	2018-08-16
dc.identifier.citation	[1] R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993. [2] R. Agrawal and R. Srikant. Quest synthetic data generator. IBM Almaden Research Center, 1994. [3] W. Andrzejewski and R. Wrembel. Gpu-wah: Applying gpus to compressing bitmap indexes with word aligned hybrid. In DEXA, 2010. [4] C. Antunes and A. Oliveira. Sequential pattern mining algorithms: trade-offs between speed and memory, 2004. [5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In KDD, 2002. [6] K. Beedkar and R. Gemulla. Lash: Large-scale sequence mining with hierarchies. In SIGMOD, 2015. [7] P. Bertoldi, M. Avgerinou, and L. Castellazzi. Trends in data centre energy consumption under the european code of conduct for data centre energy efficiency. 2017. [8] G. Buehrer, R. L. de Oliveira, D. Fuhry, and S. Parthasarathy. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In ICDE. IEEE, 2015. [9] P. Cellier, T. Charnois, M. Plantevit, C. Rigotti, B. Crémilleux, O. Gandrillon, J. Kléma, and J.-L. Manguin. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts. J Biomed Semantics, 2015. [10] C.-C. Chen, H.-H. Shuai, and M.-S. Chen. Distributed and scalable sequential pattern mining through stream processing. KAIS, 2017. [11] S. Cucerzan. Large-scale named entity disambiguation based on wikipedia data. In EMNLP-CoNLL, 2007. [12] P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas. Fast vertical mining of sequential patterns using co-occurrence information. In PAKDD, 2014. [13] P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C.-W. Wu, V. S. Tseng, et al. Spmf: a java open-source pattern mining library. JMLR, 2014. [14] P. Fournier-Viger, C.-W. Wu, A. Gomariz, and V. S. Tseng. Vmsp: Efficient vertical mining of maximal sequential patterns. In AI 2018, pages 83–94, 2014. [15] P. Fournier-Viger, C.-W. Wu, and V. S. Tseng. Mining maximal sequential patterns without candidate maintenance. In ADMA, 2013. [16] D. Fradkin and F. Mörchen. Mining sequential patterns for classification. KAIS, 2015. [17] F. Fumarola, P. F. Lanotte, M. Ceci, and D. Malerba. Clofast: closed sequential pattern mining using sparse and vertical id-lists. KAIS, 2016. [18] J. Ge, Y. Xia, and J. Wang. Mining uncertain sequential patterns in iterative mapreduce. In PAKDD, 2015. [19] A. Gomariz, M. Campos, R. Marin, and B. Goethals. Clasp: An efficient algorithm for mining frequent closed sequences. In PAKDD, 2013. [20] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In KDD, 2000. [21] J. Han, J. Pei, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In ICDE, 2001. [22] M. Harris. Optimizing cuda. SC07, 2007. [23] B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD, 2008. [24] N. Höft, H. Schulz, and S. Behnke. Fast semantic segmentation of rgb-d scenes with gpu-accelerated deep neural networks. In KI, 2014. [25] S. Hong and H. Kim. An integrated gpu power and performance model. In SIGARCH, 2010. [26] K. Hryniów. Parallel pattern mining-application of gsp algorithm for graphics processing units. In ICCC, 2012. [27] K. Hryńiow. Parallel pattern mining on graphics processing units. In ICCC, 2013. [28] A. Jakkam and C. Busso. A multimodal analysis of synchrony during dyadic interaction using a metric based on sequential pattern mining. In ICASSP, 2016. [29] Y.-Y. Jo, S.-W. Kim, and D.-H. Bae. Efficient sparse matrix multiplication on gpu for large social network analysis. In CIKM, 2015. [30] J. Koomey. Growth in data center electricity use 2005 to 2010. A report by Analytical Press, completed at the request of The New York Times, 2011. [31] J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi. Gpuwattch: enabling energy optimizations in gpgpus. In SIGARCH, 2013. [32] Y. Li, C.-Y. Chow, K. Deng, M. Yuan, J. Zeng, J.-D. Zhang, Q. Yang, and Z.-L. Zhang. Sampling big trajectory data. In CIKM, 2015. [33] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. Nvidia tesla: A unified graphics and computing architecture. IEEE micro, 2008. [34] H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, 2016. [35] C. Nvidia. Programming guide, 2010. [36] S. S. L. Oskouei, H. Golestani, M. Kachuee, M. Hashemi, H. Mohammadzade, and S. Ghiasi. Gpu-based acceleration of deep convolutional neural networks on mobile platforms. ACMMM, 2016. [37] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, 1996. [38] J. Wang and J. Han. Bide: Efficient mining of frequent closed sequences. In ICDE, 2004. [39] Q. Wang, V. S. Sheng, and X. Wu. Keyphrase extraction with sequential pattern mining. In AAAI, 2017. [40] X. Wang, J. Wang, T. Wang, H. Li, and D. Yang. Parallel sequential pattern mining by transaction decomposition. In ICNC-FSKD, 2010. [41] L. Weng, F. Menczer, and Y.-Y. Ahn. Virality prediction and community structure in social networks. arXiv preprint arXiv:1306.0158, 2013. [42] A. P. Wright, A. T. Wright, A. B. McCoy, and D. F. Sittig. The use of sequential pattern mining to predict next prescribed medications. J Biomed Inform, 2015. [43] K. Wu, E. J. Otoo, and A. Shoshani. An efficient compression scheme for bitmap indices. 2004. [44] M. J. Zaki. Spade: An efficient algorithm for mining frequent sequences. MACH LEARN, 2001. [45] F. Zhang, Y. Zhang, and J. D. Bakos. Accelerating frequent itemset mining on graphics processing units. J Supercomput, 2013.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309	-
dc.description.abstract	序列模式探勘可以被應用在多個領域上，例如疾病偵測及股票價格預測。雖然已經有許多演算法與加速方法被提出來解決序列模式探勘的問題，但考量支持度計算簡單又重複的特性，我們認為由中央處理器以及圖形處理器組成的異質性平台相較於傳統由中央處理器為主的機器更適合用來探勘序列模式。因此，本篇論文提出於異質性平台上之高度平行序列模式探勘演算法 (PASTA)，嘗試結合圖形處理器以及中央處理器的優點以加速序列模式探勘。我們提出的PASTA採用垂直位圖資料表現方式，並用優化的前沿展開演算法探索搜尋空間。此外，我們也將演算法流程管線化以確保中央處理器與圖形處理器能同時工作而進一步提升演算法效率並設計記憶體交換機制以緩解圖形處理器上記憶體有限的問題。最後，我們比較並分析PASTA在不同資料集上與過去最好演算法的表現差異。實驗結果顯示PASTA無論在合成資料或真實資料上皆較過去最好的演算法好數十至數百倍。	zh_TW
dc.description.abstract	Sequential pattern mining can be applied to various fields such as disease prediction and stock analysis. Many different algorithms have been proposed for sequential pattern mining, together with acceleration methods. In this thesis, we argue that a heterogeneous platform with CPU and GPU is much more suitable for sequential pattern mining than traditional CPU-based approaches since the support counting process is inherently succinct and repetitive. Therefore, we propose the PArallel SequenTial pAttern mining algorithm, referred to as PASTA, to accelerate sequential pattern mining by combining the merits of CPU and GPU computing. Explicitly, PASTA adopts the vertical bitmap representation of database with an optimized expansion strategy to effectively explore and reduce the search space. In addition, a pipeline strategy is proposed to ensure that both CPU and CPU on the heterogeneous platform operate concurrently to fully utilize the computing power of the platform. Furthermore, we develop a dynamic swapping scheme to mitigate the limited memory problem of the GPU hardware without decreasing the performance. Finally, a comprehensive experiment and a sensitivity test are conducted to analyze PASTA with different baselines. The experiments show that PASTA outperforms the state-of-the-art algorithms by orders of magnitude on both synthetic and real datasets.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T06:34:38Z (GMT). No. of bitstreams: 1 ntu-107-R05921040-1.pdf: 3094495 bytes, checksum: 819258a43b31a37ed4bfbe05e6de5a24 (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	致謝 i 中文摘要 ii Abstract iii Contents iv List of Figures vi List of Tables viii 1 Introduction 1 2 Preliminary 6 2.1 Sequential Pattern Mining 6 2.2 GPGPU 6 2.3 Related Work 8 2.4 SPAM 10 2.5 Frontier Expansion Strategy 12 3 PArallel SequenTial pAttern mining (PASTA) 13 3.1 Overview 14 3.2 Detailed Framework Design 15 3.3 Pipeline Strategy 21 3.4 Complexity Analysis 21 4 Experimental Results 25 4.1 Performance of acceleration 25 4.2 Performance analysis with different dataset parameters 26 4.3 Experiments on GPU parameters 29 4.4 Effect of pipeline strategy and swapping mechanism 30 5 Trade-off between memory limitation and performance/energy consumption 33 5.1 Relationship between the memory limitation and the size of swapped memory 34 5.2 Relationship between size of swapped memory and algorithm performance 35 5.3 The effect of swapping mechanism on energy consumption 37 6 Future Work 42 6.1 Bitmap Compressing 42 6.2 Optimal Tree Traversal 43 6.3 Hierarchical Sequential Pattern Mining 43 7 Conclusion 45 Bibliography 46
dc.language.iso	en
dc.subject	高效能計算	zh_TW
dc.subject	資料探勘	zh_TW
dc.subject	平行化計算	zh_TW
dc.subject	異質性平台	zh_TW
dc.subject	通用圖形處理器	zh_TW
dc.subject	序列模式探勘	zh_TW
dc.subject	High Performance Computing	en
dc.subject	Sequential Pattern Mining	en
dc.subject	GPGPU	en
dc.subject	Heterogeneous Platform	en
dc.subject	Parallel Computing	en
dc.subject	Data Mining	en
dc.title	於異質性平台上之高度平行化序列模式探勘	zh_TW
dc.title	Highly Parallel Sequential Pattern Mining on a Heterogeneous Platform	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	廖弘源,楊得年,帥宏翰,陳怡伶
dc.subject.keyword	資料探勘,序列模式探勘,通用圖形處理器,異質性平台,平行化計算,高效能計算,	zh_TW
dc.subject.keyword	Data Mining,Sequential Pattern Mining,GPGPU,Heterogeneous Platform,Parallel Computing,High Performance Computing,	en
dc.relation.page	50
dc.identifier.doi	10.6342/NTU201803546
dc.rights.note	有償授權
dc.date.accepted	2018-08-16
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	電機工程學研究所	zh_TW
顯示於系所單位：	電機工程學系

文件中的檔案：

檔案	大小	格式
ntu-107-1.pdf 未授權公開取用	3.02 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。