Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 電機工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor陳銘憲
dc.contributor.authorYu-Heng Hsiehen
dc.contributor.author謝友恆zh_TW
dc.date.accessioned2021-06-17T06:34:38Z-
dc.date.available2021-08-18
dc.date.copyright2018-08-18
dc.date.issued2018
dc.date.submitted2018-08-16
dc.identifier.citation[1] R. Agrawal, T. Imieliński, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.
[2] R. Agrawal and R. Srikant. Quest synthetic data generator. IBM Almaden Research Center, 1994.
[3] W. Andrzejewski and R. Wrembel. Gpu-wah: Applying gpus to compressing bitmap indexes with word aligned hybrid. In DEXA, 2010.
[4] C. Antunes and A. Oliveira. Sequential pattern mining algorithms: trade-offs between speed and memory, 2004.
[5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In KDD, 2002.
[6] K. Beedkar and R. Gemulla. Lash: Large-scale sequence mining with hierarchies. In SIGMOD, 2015.
[7] P. Bertoldi, M. Avgerinou, and L. Castellazzi. Trends in data centre energy consumption under the european code of conduct for data centre energy efficiency. 2017.
[8] G. Buehrer, R. L. de Oliveira, D. Fuhry, and S. Parthasarathy. Towards a parameter-free and parallel itemset mining algorithm in linearithmic time. In ICDE. IEEE, 2015.
[9] P. Cellier, T. Charnois, M. Plantevit, C. Rigotti, B. Crémilleux, O. Gandrillon, J. Kléma, and J.-L. Manguin. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts. J Biomed Semantics, 2015.
[10] C.-C. Chen, H.-H. Shuai, and M.-S. Chen. Distributed and scalable sequential pattern mining through stream processing. KAIS, 2017.
[11] S. Cucerzan. Large-scale named entity disambiguation based on wikipedia data. In EMNLP-CoNLL, 2007.
[12] P. Fournier-Viger, A. Gomariz, M. Campos, and R. Thomas. Fast vertical mining of sequential patterns using co-occurrence information. In PAKDD, 2014.
[13] P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C.-W. Wu, V. S. Tseng, et al. Spmf: a java open-source pattern mining library. JMLR, 2014.
[14] P. Fournier-Viger, C.-W. Wu, A. Gomariz, and V. S. Tseng. Vmsp: Efficient vertical mining of maximal sequential patterns. In AI 2018, pages 83–94, 2014.
[15] P. Fournier-Viger, C.-W. Wu, and V. S. Tseng. Mining maximal sequential patterns without candidate maintenance. In ADMA, 2013.
[16] D. Fradkin and F. Mörchen. Mining sequential patterns for classification. KAIS, 2015.
[17] F. Fumarola, P. F. Lanotte, M. Ceci, and D. Malerba. Clofast: closed sequential pattern mining using sparse and vertical id-lists. KAIS, 2016.
[18] J. Ge, Y. Xia, and J. Wang. Mining uncertain sequential patterns in iterative mapreduce. In PAKDD, 2015.
[19] A. Gomariz, M. Campos, R. Marin, and B. Goethals. Clasp: An efficient algorithm for mining frequent closed sequences. In PAKDD, 2013.
[20] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. Freespan: frequent pattern-projected sequential pattern mining. In KDD, 2000.
[21] J. Han, J. Pei, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth. In ICDE, 2001.
[22] M. Harris. Optimizing cuda. SC07, 2007.
[23] B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD, 2008.
[24] N. Höft, H. Schulz, and S. Behnke. Fast semantic segmentation of rgb-d scenes with gpu-accelerated deep neural networks. In KI, 2014.
[25] S. Hong and H. Kim. An integrated gpu power and performance model. In SIGARCH, 2010.
[26] K. Hryniów. Parallel pattern mining-application of gsp algorithm for graphics processing units. In ICCC, 2012.
[27] K. Hryńiow. Parallel pattern mining on graphics processing units. In ICCC, 2013.
[28] A. Jakkam and C. Busso. A multimodal analysis of synchrony during dyadic interaction using a metric based on sequential pattern mining. In ICASSP, 2016.
[29] Y.-Y. Jo, S.-W. Kim, and D.-H. Bae. Efficient sparse matrix multiplication on gpu for large social network analysis. In CIKM, 2015.
[30] J. Koomey. Growth in data center electricity use 2005 to 2010. A report by Analytical Press, completed at the request of The New York Times, 2011.
[31] J. Leng, T. Hetherington, A. ElTantawy, S. Gilani, N. S. Kim, T. M. Aamodt, and V. J. Reddi. Gpuwattch: enabling energy optimizations in gpgpus. In SIGARCH, 2013.
[32] Y. Li, C.-Y. Chow, K. Deng, M. Yuan, J. Zeng, J.-D. Zhang, Q. Yang, and Z.-L. Zhang. Sampling big trajectory data. In CIKM, 2015.
[33] E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. Nvidia tesla: A unified graphics and computing architecture. IEEE micro, 2008.
[34] H. Liu, H. H. Huang, and Y. Hu. ibfs: Concurrent breadth-first search on gpus. In SIGMOD, 2016.
[35] C. Nvidia. Programming guide, 2010.
[36] S. S. L. Oskouei, H. Golestani, M. Kachuee, M. Hashemi, H. Mohammadzade, and S. Ghiasi. Gpu-based acceleration of deep convolutional neural networks on mobile platforms. ACMMM, 2016.
[37] R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, 1996.
[38] J. Wang and J. Han. Bide: Efficient mining of frequent closed sequences. In ICDE, 2004.
[39] Q. Wang, V. S. Sheng, and X. Wu. Keyphrase extraction with sequential pattern mining. In AAAI, 2017.
[40] X. Wang, J. Wang, T. Wang, H. Li, and D. Yang. Parallel sequential pattern mining by transaction decomposition. In ICNC-FSKD, 2010.
[41] L. Weng, F. Menczer, and Y.-Y. Ahn. Virality prediction and community structure in social networks. arXiv preprint arXiv:1306.0158, 2013.
[42] A. P. Wright, A. T. Wright, A. B. McCoy, and D. F. Sittig. The use of sequential pattern mining to predict next prescribed medications. J Biomed Inform, 2015.
[43] K. Wu, E. J. Otoo, and A. Shoshani. An efficient compression scheme for bitmap indices. 2004.
[44] M. J. Zaki. Spade: An efficient algorithm for mining frequent sequences. MACH LEARN, 2001.
[45] F. Zhang, Y. Zhang, and J. D. Bakos. Accelerating frequent itemset mining on graphics processing units. J Supercomput, 2013.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309-
dc.description.abstract序列模式探勘可以被應用在多個領域上,例如疾病偵測及股票價格預測。雖然已經有許多演算法與加速方法被提出來解決序列模式探勘的問題,但考量支持度計算簡單又重複的特性,我們認為由中央處理器以及圖形處理器組成的異質性平台相較於傳統由中央處理器為主的機器更適合用來探勘序列模式。因此,本篇論文提出於異質性平台上之高度平行序列模式探勘演算法 (PASTA),嘗試結合圖形處理器以及中央處理器的優點以加速序列模式探勘。我們提出的PASTA採用垂直位圖資料表現方式,並用優化的前沿展開演算法探索搜尋空間。此外,我們也將演算法流程管線化以確保中央處理器與圖形處理器能同時工作而進一步提升演算法效率並設計記憶體交換機制以緩解圖形處理器上記憶體有限的問題。最後,我們比較並分析PASTA在不同資料集上與過去最好演算法的表現差異。實驗結果顯示PASTA無論在合成資料或真實資料上皆較過去最好的演算法好數十至數百倍。zh_TW
dc.description.abstractSequential pattern mining can be applied to various fields such as disease prediction and stock analysis.
Many different algorithms have been proposed for sequential pattern mining, together with acceleration methods. In this thesis, we argue that a heterogeneous platform with CPU and GPU is much more suitable for sequential pattern mining than traditional CPU-based approaches since the support counting process is inherently succinct and repetitive.
Therefore, we propose the PArallel SequenTial pAttern mining algorithm, referred to as PASTA, to accelerate sequential pattern mining by combining the merits of CPU and GPU computing.
Explicitly, PASTA adopts the vertical bitmap representation of database with an optimized expansion strategy to effectively explore and reduce the search space. In addition, a pipeline strategy is proposed to ensure that both CPU and CPU on the heterogeneous platform operate concurrently to fully utilize the computing power of the platform. Furthermore, we develop a dynamic swapping scheme to mitigate the limited memory problem of the GPU hardware without decreasing the performance.
Finally, a comprehensive experiment and a sensitivity test are conducted to analyze PASTA with different baselines. The experiments show that PASTA outperforms the state-of-the-art algorithms by orders of magnitude on both synthetic and real datasets.
en
dc.description.provenanceMade available in DSpace on 2021-06-17T06:34:38Z (GMT). No. of bitstreams: 1
ntu-107-R05921040-1.pdf: 3094495 bytes, checksum: 819258a43b31a37ed4bfbe05e6de5a24 (MD5)
Previous issue date: 2018
en
dc.description.tableofcontents致謝 i
中文摘要 ii
Abstract iii
Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
2 Preliminary 6
2.1 Sequential Pattern Mining 6
2.2 GPGPU 6
2.3 Related Work 8
2.4 SPAM 10
2.5 Frontier Expansion Strategy 12
3 PArallel SequenTial pAttern mining (PASTA) 13
3.1 Overview 14
3.2 Detailed Framework Design 15
3.3 Pipeline Strategy 21
3.4 Complexity Analysis 21
4 Experimental Results 25
4.1 Performance of acceleration 25
4.2 Performance analysis with different dataset parameters 26
4.3 Experiments on GPU parameters 29
4.4 Effect of pipeline strategy and swapping mechanism 30
5 Trade-off between memory limitation and performance/energy consumption 33
5.1 Relationship between the memory limitation and the size of swapped memory 34
5.2 Relationship between size of swapped memory and algorithm performance 35
5.3 The effect of swapping mechanism on energy consumption 37
6 Future Work 42
6.1 Bitmap Compressing 42
6.2 Optimal Tree Traversal 43
6.3 Hierarchical Sequential Pattern Mining 43
7 Conclusion 45
Bibliography 46
dc.language.isoen
dc.subject高效能計算zh_TW
dc.subject資料探勘zh_TW
dc.subject平行化計算zh_TW
dc.subject異質性平台zh_TW
dc.subject通用圖形處理器zh_TW
dc.subject序列模式探勘zh_TW
dc.subjectHigh Performance Computingen
dc.subjectSequential Pattern Miningen
dc.subjectGPGPUen
dc.subjectHeterogeneous Platformen
dc.subjectParallel Computingen
dc.subjectData Miningen
dc.title於異質性平台上之高度平行化序列模式探勘zh_TW
dc.titleHighly Parallel Sequential Pattern Mining on a Heterogeneous Platformen
dc.typeThesis
dc.date.schoolyear106-2
dc.description.degree碩士
dc.contributor.oralexamcommittee廖弘源,楊得年,帥宏翰,陳怡伶
dc.subject.keyword資料探勘,序列模式探勘,通用圖形處理器,異質性平台,平行化計算,高效能計算,zh_TW
dc.subject.keywordData Mining,Sequential Pattern Mining,GPGPU,Heterogeneous Platform,Parallel Computing,High Performance Computing,en
dc.relation.page50
dc.identifier.doi10.6342/NTU201803546
dc.rights.note有償授權
dc.date.accepted2018-08-16
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept電機工程學研究所zh_TW
顯示於系所單位:電機工程學系

文件中的檔案:
檔案 大小格式 
ntu-107-1.pdf
  未授權公開取用
3.02 MBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved