請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309
標題: | 於異質性平台上之高度平行化序列模式探勘 Highly Parallel Sequential Pattern Mining on a Heterogeneous Platform |
作者: | Yu-Heng Hsieh 謝友恆 |
指導教授: | 陳銘憲 |
關鍵字: | 資料探勘,序列模式探勘,通用圖形處理器,異質性平台,平行化計算,高效能計算, Data Mining,Sequential Pattern Mining,GPGPU,Heterogeneous Platform,Parallel Computing,High Performance Computing, |
出版年 : | 2018 |
學位: | 碩士 |
摘要: | 序列模式探勘可以被應用在多個領域上,例如疾病偵測及股票價格預測。雖然已經有許多演算法與加速方法被提出來解決序列模式探勘的問題,但考量支持度計算簡單又重複的特性,我們認為由中央處理器以及圖形處理器組成的異質性平台相較於傳統由中央處理器為主的機器更適合用來探勘序列模式。因此,本篇論文提出於異質性平台上之高度平行序列模式探勘演算法 (PASTA),嘗試結合圖形處理器以及中央處理器的優點以加速序列模式探勘。我們提出的PASTA採用垂直位圖資料表現方式,並用優化的前沿展開演算法探索搜尋空間。此外,我們也將演算法流程管線化以確保中央處理器與圖形處理器能同時工作而進一步提升演算法效率並設計記憶體交換機制以緩解圖形處理器上記憶體有限的問題。最後,我們比較並分析PASTA在不同資料集上與過去最好演算法的表現差異。實驗結果顯示PASTA無論在合成資料或真實資料上皆較過去最好的演算法好數十至數百倍。 Sequential pattern mining can be applied to various fields such as disease prediction and stock analysis. Many different algorithms have been proposed for sequential pattern mining, together with acceleration methods. In this thesis, we argue that a heterogeneous platform with CPU and GPU is much more suitable for sequential pattern mining than traditional CPU-based approaches since the support counting process is inherently succinct and repetitive. Therefore, we propose the PArallel SequenTial pAttern mining algorithm, referred to as PASTA, to accelerate sequential pattern mining by combining the merits of CPU and GPU computing. Explicitly, PASTA adopts the vertical bitmap representation of database with an optimized expansion strategy to effectively explore and reduce the search space. In addition, a pipeline strategy is proposed to ensure that both CPU and CPU on the heterogeneous platform operate concurrently to fully utilize the computing power of the platform. Furthermore, we develop a dynamic swapping scheme to mitigate the limited memory problem of the GPU hardware without decreasing the performance. Finally, a comprehensive experiment and a sensitivity test are conducted to analyze PASTA with different baselines. The experiments show that PASTA outperforms the state-of-the-art algorithms by orders of magnitude on both synthetic and real datasets. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/72309 |
DOI: | 10.6342/NTU201803546 |
全文授權: | 有償授權 |
顯示於系所單位: | 電機工程學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-107-1.pdf 目前未授權公開取用 | 3.02 MB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。