Mining Multi-resolution Frequent Patterns in Time-series Databases (時間序列資料庫中多重解析度頻繁樣式之資料探勘)

Huei-Ping Tzeng; 曾惠萍

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43733

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	李瑞庭(J. T. Lee)
dc.contributor.author	Huei-Ping Tzeng	en
dc.contributor.author	曾惠萍	zh_TW
dc.date.accessioned	2021-06-15T02:27:10Z	-
dc.date.available	2012-08-19
dc.date.copyright	2009-08-19
dc.date.issued	2009
dc.date.submitted	2009-08-17
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/43733	-
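dc.description.abstract	In recent years, time-series data have been adopted rapidly and widely across many domains, such as financial data analysis, network traffic analysis, and scientific data processing. Finding frequent patterns at different resolutions in time-series databases can help scientists and financial analysts judge trends and obtain valuable information. In this thesis, we therefore propose an efficient mining algorithm called "MFP-Miner", which finds frequent patterns at different resolutions in time-series databases. The proposed algorithm consists of three phases. First, we transform the database from high resolution to low resolution. Next, we find all frequent patterns of length 1 in the transformed database and build their projected databases. Finally, we use a frequent pattern tree to recursively generate all frequent patterns in a depth-first manner, and enumerate all frequent patterns in the high-resolution database. During mining, MFP-Miner uses the projected databases to compute support and applies effective pruning strategies to remove unnecessary candidate patterns, so it can efficiently find all frequent patterns at different resolutions in time-series databases. Experimental results show that, on both synthetic and real data, the proposed method is more efficient and more scalable than a modified Apriori algorithm.	zh_TW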
dc.description.abstract	Over the last decade, time-series data have been generated at an unprecedented rate in almost every application domain, e.g., financial data analysis, network traffic analysis, and scientific data processing. Mining multi-resolution frequent patterns in time-series databases can help scientists and financial analysts analyze trends in the data and obtain valuable information. In this thesis, we therefore propose an efficient algorithm, MFP-Miner (Mining Frequent Patterns Miner), to mine multi-resolution frequent patterns in time-series databases. The proposed method consists of three phases. First, we transform the original database into a low-resolution database. Second, we find the frequent 1-patterns in the transformed database and construct a projected database for each frequent 1-pattern found. Third, we recursively generate frequent patterns with a frequent pattern tree in a depth-first manner and enumerate all frequent patterns in the original database. Because MFP-Miner employs projected databases to localize support counting and pattern mining, and uses effective pruning strategies to remove unnecessary candidates during the mining process, it can efficiently mine all multi-resolution frequent patterns in time-series databases. The experimental results show that the proposed method is more efficient and scalable than a modified Apriori algorithm.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T02:27:10Z (GMT). No. of bitstreams: 1 ntu-98-R96725016-1.pdf: 558130 bytes, checksum: e4a7a4500f695650ad21f5f4aaf8360a (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	Table of Contents i List of Figures iii List of Tables v Chapter 1 Introduction 1 Chapter 2 Preliminaries and Problem Definitions 8 Chapter 3 The Proposed Method 10 3.1 Frequent pattern tree 10 3.2 Sequence transformation 11 3.3 Generating frequent 1-patterns 12 3.4 Mining frequent k-patterns 13 3.5 The pruning strategies 14 3.6 Restoring frequent patterns in the original database 16 3.7 The proposed algorithm 17 3.8 An example 20 Chapter 4 Performance Evaluation 23 4.1 Synthetic datasets 23 4.2 Performance evaluation on synthetic datasets 24 4.3 Performance evaluation on real datasets 28 Chapter 5 Conclusions and Future Work 34 References 36
dc.language.iso	en
dc.subject	frequent patterns (頻繁性樣式)	zh_TW
dc.subject	data mining (資料探勘)	zh_TW
dc.subject	time series database (時間序列資料庫)	zh_TW
dc.subject	data mining	en
dc.subject	frequent patterns	en
dc.subject	time series database	en
dc.title	時間序列資料庫中多重解析度頻繁樣式之資料探勘	zh_TW
dc.title	Mining Multi-resolution Frequent Patterns in Time-series Databases	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	陳彥良(Yen-Liang Chen),劉敦仁(Duen-Ren Liu)
dc.subject.keyword	data mining (資料探勘), time series database (時間序列資料庫), frequent patterns (頻繁性樣式)	zh_TW
dc.subject.keyword	data mining, time series database, frequent patterns	en
dc.relation.page	38
dc.rights.note	Paid license
dc.date.accepted	2009-08-17
dc.contributor.author-college	College of Management	zh_TW
dc.contributor.author-dept	Graduate Institute of Information Management	zh_TW
Appears in Collections:	Department of Information Management
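
Illustrative sketch (not taken from the thesis): the abstract outlines a resolution transform followed by depth-first pattern growth over projected databases. The Python below is a minimal sketch of that general idea under stated assumptions. The abstract does not say which transform, symbol alphabet, or pattern definition MFP-Miner uses, so the window averaging, the breakpoints, the contiguous-pattern definition, and every function name here are hypothetical choices made for illustration. The final step of restoring patterns in the original high-resolution database is not covered.

```python
from collections import defaultdict


def to_low_resolution(series, window):
    """Assumed transform: average each non-overlapping window of values."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]


def discretize(values, breakpoints, symbols="abc"):
    """Assumed symbolization: a value's symbol index is the number of
    breakpoints it reaches, so len(symbols) must be len(breakpoints) + 1."""
    return "".join(symbols[sum(v >= b for b in breakpoints)] for v in values)


def mine_frequent_patterns(sequences, min_support):
    """Depth-first pattern growth over contiguous patterns.

    A projection is a list of (sequence id, end position) occurrences, and
    support is the number of distinct sequences that contain the pattern.
    """
    frequent = {}

    def support(projection):
        return len({sid for sid, _ in projection})

    def grow(pattern, projection):
        frequent[pattern] = support(projection)
        # Extend only from recorded occurrences (the projected database).
        extensions = defaultdict(list)
        for sid, end in projection:
            nxt = end + 1
            if nxt < len(sequences[sid]):
                extensions[sequences[sid][nxt]].append((sid, nxt))
        for symbol, proj in sorted(extensions.items()):
            if support(proj) >= min_support:  # prune infrequent candidates
                grow(pattern + symbol, proj)

    # Frequent 1-patterns and their projections seed the search.
    first = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, symbol in enumerate(seq):
            first[symbol].append((sid, pos))
    for symbol, proj in sorted(first.items()):
        if support(proj) >= min_support:
            grow(symbol, proj)
    return frequent


raw = [[1, 1, 5, 5, 9, 9], [1, 1, 5, 5, 1, 1], [9, 9, 5, 5, 9, 9]]
low = [discretize(to_low_resolution(s, 2), (3, 7)) for s in raw]
print(low)                              # ['abc', 'aba', 'cbc']
print(mine_frequent_patterns(low, 2))
```

Because an extension can only occur where its prefix occurs, a prefix that fails the support threshold is never grown, which is the kind of pruning the abstract alludes to. The toy data and threshold are made up for the example.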

Files in this item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	545.05 kB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.