請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8755
標題: | 時間序列資料庫之封閉性樣式探勘 Mining Closed Patterns in Time-Series Databases |
作者: | Huei-Wen Wu 吳惠雯 |
指導教授: | 李瑞庭 |
關鍵字: | 資料探勘,封閉性樣式,時間序列資料庫, Data mining,Closed pattern,Time-series database, |
出版年 : | 2010 |
學位: | 博士 |
摘要: | 近年來,探勘封閉性樣式已成為知識探索領域中重要的研究議題,其主要目的在於找出隱藏在大量資料中具有代表性的樣式。本論文針對如何從時間序列資料庫中探勘封閉性樣式,提出了三個有效率的演算法,分別為CMP (Closed Multi-sequence Patterns mining)、 CFP (Closed Flexible Patterns mining)與CNP (multi-resolution Closed Numerical Patterns mining)。
CMP演算法主要著重於多時間序列資料庫中樣式的分析,而CFP演算法可從時間序列資料庫中探勘具「彈性間隔」的封閉性樣式。CMP與CFP演算法先將時間序列轉換成符號序列,然後再探勘封閉性樣式。將時間序列轉換成符號序列可減少雜訊並簡化探勘程序,然而可能導致樣式遺失或差異很大的序列卻支持相同樣式的問題。 為避免將時間序列轉換到符號序列所造成的問題,CNP演算法可從多時間序列資料庫中直接找出具有代表性的樣式。此外,CNP演算法加入多層解析度 (multi-resolution)的概念以找出封閉性樣式,可讓使用者以不同的觀點來檢視資料。 上述三個演算法皆以深度優先探勘的方式,並配合投影資料庫以減少搜尋空間,除了加速探勘的過程外,透過有效的修剪策略及檢查封閉性的機制,可避免產生不必要的候選樣式。 本研究結果顯示CMP演算法比改良式Apriori以及BIDE演算法快了數十倍; CFP演算法比改良式Apriori演算法更有效率; CNP演算法執行速度亦較改良式A-Close演算法快。 Closed pattern mining is a critical research issue in the area of knowledge discovery and data mining with the aim of discovering interesting patterns hidden in a large amount of data. In this dissertation, we propose three algorithms, called CMP (Closed Multi-sequence Patterns mining), CFP (Closed Flexible Patterns mining), and CNP (multi-resolution Closed Numerical Patterns mining) to solve various issues extended from the problem of mining closed patterns. The CMP algorithm is designed to find closed patterns in a multi-sequence time-series database. The CFP algorithm is developed to solve the problem of mining closed flexible patterns in a time-series database. Both the CMP and CFP algorithms involve a transformation of time-series sequences into symbolic sequences in the first phase. Although analyzing on symbolic sequences is ideal to reduce the effect of noises and ease the mining process, these approaches may lead to pattern lost and the sequences supporting the same pattern may look quite different. To overcome the problem raised in symbolic sequence analysis, the CNP algorithm is proposed to mine closed patterns without any transformation from time-series sequences to symbolic sequences. The method also employs the Haar wavelet transform to discover patterns in the multiple resolutions in order to provide different perspectives on datasets. All the proposed algorithms have employed the concept of projected databases to localize the pattern extension that leads to a significant runtime improvement. Moreover, effective closure checking schemes and pruning strategies are devised respectively in each of the proposed algorithms to avoid generating redundant candidates. The experimental results show that the CMP algorithm significantly outperforms the modified Apriori and BIDE algorithms. The CFP algorithm achieves better performance than the modified Apriori algorithm in all cases. And, the CNP algorithm has demonstrated a significant runtime improvement in comparison to the modified A-Close algorithm. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/8755 |
全文授權: | 同意授權(全球公開) |
顯示於系所單位: | 資訊管理學系 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-99-1.pdf | 1.02 MB | Adobe PDF | 檢視/開啟 |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。