Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309| Title: | 於異質性平台上之高度平行化序列模式探勘 Highly Parallel Sequential Pattern Mining on a Heterogeneous Platform |
| Authors: | Yu-Heng Hsieh 謝友恆 |
| Advisor: | 陳銘憲 |
| Keyword: | 資料探勘,序列模式探勘,通用圖形處理器,異質性平台,平行化計算,高效能計算, Data Mining,Sequential Pattern Mining,GPGPU,Heterogeneous Platform,Parallel Computing,High Performance Computing, |
| Publication Year : | 2018 |
| Degree: | 碩士 |
| Abstract: | 序列模式探勘可以被應用在多個領域上,例如疾病偵測及股票價格預測。雖然已經有許多演算法與加速方法被提出來解決序列模式探勘的問題,但考量支持度計算簡單又重複的特性,我們認為由中央處理器以及圖形處理器組成的異質性平台相較於傳統由中央處理器為主的機器更適合用來探勘序列模式。因此,本篇論文提出於異質性平台上之高度平行序列模式探勘演算法 (PASTA),嘗試結合圖形處理器以及中央處理器的優點以加速序列模式探勘。我們提出的PASTA採用垂直位圖資料表現方式,並用優化的前沿展開演算法探索搜尋空間。此外,我們也將演算法流程管線化以確保中央處理器與圖形處理器能同時工作而進一步提升演算法效率並設計記憶體交換機制以緩解圖形處理器上記憶體有限的問題。最後,我們比較並分析PASTA在不同資料集上與過去最好演算法的表現差異。實驗結果顯示PASTA無論在合成資料或真實資料上皆較過去最好的演算法好數十至數百倍。 Sequential pattern mining can be applied to various fields such as disease prediction and stock analysis. Many different algorithms have been proposed for sequential pattern mining, together with acceleration methods. In this thesis, we argue that a heterogeneous platform with CPU and GPU is much more suitable for sequential pattern mining than traditional CPU-based approaches since the support counting process is inherently succinct and repetitive. Therefore, we propose the PArallel SequenTial pAttern mining algorithm, referred to as PASTA, to accelerate sequential pattern mining by combining the merits of CPU and GPU computing. Explicitly, PASTA adopts the vertical bitmap representation of database with an optimized expansion strategy to effectively explore and reduce the search space. In addition, a pipeline strategy is proposed to ensure that both CPU and CPU on the heterogeneous platform operate concurrently to fully utilize the computing power of the platform. Furthermore, we develop a dynamic swapping scheme to mitigate the limited memory problem of the GPU hardware without decreasing the performance. Finally, a comprehensive experiment and a sensitivity test are conducted to analyze PASTA with different baselines. The experiments show that PASTA outperforms the state-of-the-art algorithms by orders of magnitude on both synthetic and real datasets. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309 |
| DOI: | 10.6342/NTU201803546 |
| Fulltext Rights: | 有償授權 |
| Appears in Collections: | 電機工程學系 |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-107-1.pdf Restricted Access | 3.02 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
