有效率的跨交易關聯規則探勘演算法

Chen-Sheng Wang; 王春笙

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27512

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	李瑞庭(Anthony J.T. Lee)
dc.contributor.author	Chen-Sheng Wang	en
dc.contributor.author	王春笙	zh_TW
dc.date.accessioned	2021-06-12T18:07:50Z	-
dc.date.available	2008-01-02
dc.date.copyright	2008-01-02
dc.date.issued	2007
dc.date.submitted	2007-12-19
dc.identifier.citation	[1] R.C. Agarwal, C.C. Aggarwal, V.V.V. Prasad, A tree projection algorithm for generation of frequent itemsets, Journal of Parallel and Distributed Computing 61 (3) (2001) 350-371. [2] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, An interval classifier for database mining applications, in: Proceedings of the International Conference on Very Large Data Bases, 1992, pp. 560-573. [3] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1993, pp. 207–216. [4] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of International Conference on Very Large Data Bases, 1994, pp. 487–499. [5] C. Apte, B. Liu, E. P.D. Pednault, and P. Smyth, Business applications of data mining, Communications of the ACM 45(8) (2002) 49–53. [6] R.J. Bayardo, Efficiently mining long patterns from databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1998, pp. 85–93. [7] B. Berendt, M. Spiliopoulou, Analysis of navigation behaviour in web sites integrating multiple information systems, The VLDB Journal 9 (2000) 56-75. [8] S. Brin, R. Motwani, J. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1997, pp. 255-264. [9] G. Chen, Q. Wei, Fuzzy association rules and the extended mining algorithms, Information Sciences 147 (1–4) (2002) 201–228. [10] M.S. Chen, J. Han, P. Yu, Data mining: an overview from a database perspective, IEEE Transactions on Knowledge and Data Engineering 8 (6) (1996) 866-883. [11] M.S. Chen, J.S Park, and P.S. Yu, Efficient data mining for path traversal patterns, IEEE Transactions on Knowledge and Data Engineering 10(2) (1998) 209-221. [12] Y.L. Chen, S.S. Chen, P.Y. Hsu, Mining hybrid sequential patterns and sequential rules, Information Systems 27 (5) (2002) 345-362. [13] Y.L. Chen, J.M. Chen, C.W. Tung, A data mining approach for retail knowledge discovery with consideration of the effect of shelf-space adjacency on sales, Decision Support Systems 42 (3) (2006) 1503-1520. [14] Y.L. Chen, W.H. Hsu, Y.H. Lee, TASC: Two-attribute-set clustering through decision tree construction, European Journal of Operational Research 174 (2) (2006) 930-944. [15] Y.L. Chen, H.L. Hu, An overlapping cluster algorithm to provide non-exhaustive clustering, European Journal of Operational Research 173 (3) (2006) 762-780. [16] Y.L. Chen, Y.H. Hu, Constraint-based sequential pattern mining: The consideration of recency and compactness, Decision Support Systems 42 (2) (2006) 1203-1215. [17] Y.L. Chen, T.C.K. Huang, A new approach for discovering fuzzy quantitative sequential patterns in sequence databases, Fuzzy Sets and Systems 157 (12) (2006) 1641-1661. [18] Y.L. Chen, C.C. Shen, Mining generalized knowledge from ordered data through attribute-oriented induction techniques, European Journal of Operational Research 166 (1) (2005) 221-245. [19] Y.L. Chen, K. Tang, R.J. Shen, Y.H. Hu, Market basket analysis in a multiple store environment, Decision Support Systems 40 (2) (2005) 339-354. [20] Y.L. Chen, C.C. Yu, Mining sequential patterns from multidimensional sequence data, IEEE Transactions on Knowledge and Data Engineering 17(1) (2005) 136-140. [21] D.K.Y. Chiu, A.K.C. Wong, Multiple pattern associations for interpreting structural and functional characteristics of biomolecules, Information Sciences 167 (1–4) (2004) 23–39. [22] U. Fayyad, R. Uthurusamy, Data mining and knowledge discovery in databases, Communications of the ACM 39(11) (1996) 24-25. [23] L. Feng, T.S. Dillon, J. Liu, Inter-transactional association rules for multi-dimensional contests for prediction and their application to studying meteorological data, Data and Knowledge Engineering 37 (1) (2001) 85–115. [24] L. Feng, J.X. Yu, H. Lu, J. Han, A template model for multidimensional inter-transactional association rules, The VLDB Journal 11 (2) (2002) 153–175. [25] V. Ganti, J. Gehrke, R. Ranakrishnan, Mining very large databases, IEEE Computer 32 (8) (1999) 38-45. [26] F. Geerts, B. Goethals, J. Bussche, A tight upper bound on the number of candidate patterns, in: Proceedings of International Conference on Data Mining, 2001, pp. 155–162. [27] P. Giudici, R. Castelo, Association models for web mining, Data Mining and Knowledge Discovery 5 (2001) 183-196. [28] G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, in: Proceedings of the IEEE ICDM Workshop in Frequent Itemset Mining Implementations, 2003, pp. 123–132. [29] T. Hamrouni, S. BenYahia, Y. Slimani, Avoiding the itemset closure computation “pitfall”, in: Proceedings of the 3rd International Conference on Concept Lattices and their Applications, 2005, pp. 46-59. [30] J. Han, Y. Fu, Discovery of multiple-level association rules from large databases, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 420-431. [31] J. Han, M. Kamber, Data mining: Concepts and Techniques, Morgan Kaufmann, San Fransisco, 2001. [32] J. Han, M. Kamber, Data mining: Concepts and Techniques Second Edition, Morgan Kaufmann, San Fransisco, 2006. [33] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the ACM-SIGMOD International Conference on Management of Data, 2000, pp. 1-12. [34] J. Han, J. Wang, Y. Lu, P. Tzvetkov, Mining top-k frequent closed patterns without minimum support, in: Proceedings of the 2nd International Conference on Data Mining, 2002, pp. 211-218. [35] P.Y. Hsu, Y.L. Chen, C.C. Ling, Algorithms for mining association rules in bag databases, Information Sciences 166 (1–4) (2004) 31–47. [36] W.H. Hsu, J.A. Jao, Y.L. Chen, Discovering conjecturable rules through tree-based clustering analysis, Expert Systems with Applications 29 (3) (2005) 493-505. [37] Y.H. Hu, Y.L. Chen, Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism, Decision Support Systems 42 (1) (2006) 1-24. [38] K.Y. Huang, C.H. Chang, K.Z. Lin, ClosedPROWL: Efficient mining of closed frequent continuities by projected window list technology, in: Proceedings of the 5th SIAM International Conference on Data Mining, 2005. [39] R. Kohavi, C. Broadley, B. Frasca, L. Mason, Z. Zheng, KDD-cup 2000 organizers’ report: peeling the onion, SIGKDD Explorations 2 (2) (2000) 86–98. [40] A.J.T. Lee, R.W. Hong, W.M. Ko, W.K. Tsao, H.H. Lin, Mining spatial association rules in image databases, Information Sciences 177 (7) (2007) 1593–1608. [41] A.J.T. Lee, W.C. Lin, C.S. Wang, Mining association rules with multi-dimensional constraints, The Journal of Systems and Software 79 (1) (2006) 79–92. [42] A.J.T. Lee, C.S. Wang, An efficient algorithm for mining frequent inter-transaction patterns, Information Sciences 177 (17) (2007) 3453–3476. [43] A.J.T. Lee, Y.T. Wang, Efficient data mining for calling path patterns in GSM networks, Information Systems 28 (8) (2003) 929–948. [44] S.J. Lee, K. Siau, A review of data mining techniques, Industrial Management & Data Systems 101(1) (2001) 41-46. [45] Q. Li, L. Feng, A. Wong, From intra-transaction to generalized inter-transaction: landscaping multidimensional contexts in association rule mining, Information Sciences 172 (3–4) (2005) 361–395. [46] T.S. Lim, W.Y. Loh, Y.S. Shih, A comparison of prediction accuracy complexity and training time of thirty-three old and new classification algorithms, Machine Learning 40(3) (2000) 203-228. [47] D. Littau, D. Boley, Streaming data reduction using low-memory factored representations, Information Sciences 176 (14) (2006) 2016–2041. [48] D.R. Liu, C. Hsu, Project-based knowledge maps: combining project mining and xml-enabled topic maps, Internet Research 14 (3) (2004) 254-266. [49] D.R. Liu, C.K. Ke, Knowledge support for problem-solving in a production process: A hybrid of knowledge discovery and case-based reasoning, Expert Systems with Applications 33(1) (2007) 147-161. [50] D.R. Liu, C.K. Ke, J.Y. Lee, C.F. Lee, Knowledge maps for composite e-services: A mining-based system platform coupling with recommendations, Expert Systems with Applications 34 (1) (2008) 700-716. [51] D.R. Liu, Y.Y. Shih, Integrating AHP and data mining for product recommendation based on customer lifetime value, Information & Management 42 (3) (2005) 387-400. [52] D.R. Liu, Y.Y. Shih, Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences, Journal of Systems and Software 77 (2) (2005) 181-191. [53] D.R. Liu, I.C. Wu, K.S. Yang, Task-based k-support system: disseminating and sharing task-relevant knowledge, Expert Systems with Applications 29 (2) (2005) 408-423. [54] H. Lu, J. Han, L. Feng, Stock movement prediction and n-dimensional inter-transaction association rules, in: Proceedings of ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge, 1998, pp. 1–7. [55] H. Lu, L. Feng, J. Han, Beyond intratransaction association analysis: mining multidimensional inter-transaction association rules, ACM Transactions on Information Systems 18 (4) (2000) 423–454. [56] B. Lucchese, S. Orlando, R. Perego, Fast and memory efficient mining of frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 18 (1) (2006) 21-36. [57] H.D.K. Moonestinghe, S. Fodeh, P. N. Tan, Frequent closed itemset mining using prefix graphs with an efficient flow-based pruning strategy, in: Proceedings of the 6th International Conference on Data Mining, 2006, pp. 426-435. [58] J.S. Park, M.S. Chen, P.S. Yu, An effective hash-based algorithm for mining association rules, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1995, pp. 175-186. [59] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24 (1) 1999 25-46. [60] N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proceedings of the 5th International Conference on Database Theory, 1999, pp. 398-416. [61] J. Pei, J. Han, R. Mao, Closet: An efficient algorithm for mining frequent closed itemsets, in: Proceedings of the 5th ACM-SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, pp. 11-20. [62] J. Pei, J. Han, L.V.S. Lakshmanan, Mining frequent itemsets with convertible constraints, in: Proceedings of IEEE International Conference on Data Engineering, 2001, pp. 433–332. [63] A. Savasere, E. Omiecinski, S. Navathe, An efficient algorithm for mining association rules in large databases, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 432–443. [64] Li. Shen, Hong Shen, Ling Cheng, New algorithms for efficient mining of association rules, Information Sciences 118 (1–4) (1999) 251–268. [65] M. Shen, D.R. Liu, Discovering role-relevant process-views for disseminating process knowledge, Expert Systems with Applications 26 (3) (2004) 301-310. [66] P. Shenoy, J.R. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, D. Shah, Turbo-charging vertical mining of large databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 2000, pp. 22–33. [67] N.G. Singh, S.R. Singh, A.K. Mahanta, CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets, in: Proceedings of the 5th International Conference on Data Mining, 2005, pp. 633-636. [68] R. Srikant, R. Agrawal, Mining generalized association rules, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 407-419. [69] G. Sudipto, R. Rajeev, and S. Kyuseok, ROCK: A robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345-366. [70] H. Toivonen, Sampling large databases for association rules, in: Proceedings of the International Conference on Very Large Data Bases, 1996, pp. 134-145. [71] Y.J. Tsay, J.Y. Chiang, An efficient cluster and decomposition algorithm for mining association rules, Information Sciences 160 (1–4) (2004) 161–171. [72] A.K.H. Tung, H. Lu, J. Han, L. Feng, Breaking the barrier of transactions: mining inter-transaction association rules, in: Proceedings of ACM International Conference on Knowledge and Data Discovery, 1999, pp. 423–454. [73] A.K.H. Tung, H. Lu, J. Han, L. Feng, Efficient mining of intertransaction association rules, IEEE Transactions on Knowledge and Data Engineering 15 (1) (2003) 43–56. [74] T. Uno, T. Asai, Y. Uchida, H. Arimura, An efficient algorithm for enumerating closed patterns in transaction databases, in: Proceedings of the 7th International Conference on Discovery Science, 2004, pp. 16-31. [75] C.Y. Wang, S.S. Tseng, T.P. Hong, Flexible online association rule mining based on multidimensional pattern relations, Information Sciences 176 (12) (2006) 1752–1780. [76] J. Wang, J. Han, J. Pei, Closet+: Searching for the best strategies for mining frequent closed itemsets, in: Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 236-245. [77] S.B. Yahia, T. Hamrouni, E.M. Nguifo, Frequent closed itemset based algorithms: a thorough structural and analytical survey, ACM SIGKDD Explorations Newsletter 8 (1) (2006) 93-104. [78] X. Yin, J. Han, CPAR: Classification based on predictive association rules, in: Proceedings of 2003 SIAM International Conference on Data Mining, 2003, pp. 331-335. [79] M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering, 12 (3) (2000) 372-390. [80] M.J. Zaki, Generating non-redundant association rules, in: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 34-43. [81] M.J. Zaki, K. Gouda, Fast vertical mining using diffsets, in: Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 326-335. [82] M.J. Zaki, C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering 17 (4) (2005) 462-478. [83] Yahoo, Finance, Available from: http://yahoo.com.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27512	-
dc.description.abstract	在本論文中，我們針對探勘跨交易關聯規則，提出了二個有效率的演算法，稱為ITP-Miner (Inter-Transaction Patterns Miner)以及CITP-Miner (Closed Inter-Transaction Patterns Miner)。我們對這二個演算法設計了三種資料結構： ID-pair儲存跨交易樣式探勘所需的資訊； ITP-tree能夠列舉並找出所有的頻繁跨交易樣式；以及CITP-tree能夠列舉並找出所有的封閉式頻繁跨交易樣式。 ITP-Miner演算法使用ID-pair以及ITP-tree以探勘所有頻繁跨交易樣式。由於使用了ITP-tree，ITP-Miner只需讀取資料庫一遍並且將合併，修剪，及支持度的計算限定於少量的ID-pair。實驗結果顯示ITP-Miner演算法比FITI演算法快了數十倍。 CITP-Miner演算法使用ID-pair以及CITP-tree以探勘所有封閉式頻繁跨交易樣式。由於使用了CITP-tree，CITP-Miner能夠使用有效的修剪策略以避免耗時的候選樣式產生以及重覆的支持度計算。實驗結果顯示CITP-Miner演算法比FITI以及ClosedPROWL演算法快了數十倍。另外，我們比較ITP-Miner以及CITP-Miner演算法，由於CITP-Miner使用有效的修剪策略找出封閉式頻繁跨交易樣式，而封閉式頻繁跨交易樣式的數量通常少於ITP-Miner所找出所有的頻繁跨交易樣式，所以實驗結果顯示在大部份的情況下，CITP-Miner演算法執行時間較ITP-Miner演算法快，然而卻會消秏較多的記憶體。	zh_TW
dc.description.abstract	In this dissertation, we propose two efficient algorithms, called ITP-Miner (Inter-Transaction Patterns Miner) and CITP-Miner (Closed Inter-Transaction Patterns Miner), for mining inter-transaction association rules. We devise three data structures for both algorithms: an ID-pair stores the information used to find inter-transaction patterns; an ITP-tree enumerates and discovers frequent inter-transaction patterns; and a CITP-tree enumerates and discovers closed frequent inter-transaction patterns. The ITP-Miner algorithm uses the ID-pairs and the ITP-tree to mine all frequent inter-transaction patterns. By using the ITP-tree, the ITP-Miner requires only one database scan and can localize joining, pruning, and support counting to a small number of ID-pairs. The experiment results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude. The CITP-Miner algorithm uses the ID-pairs and the CITP-tree to mine all closed frequent inter-transaction patterns. By using the CITP-tree, the CITP-Miner can embed effective pruning strategies to avoid costly candidate generation and repeated support counting. The experiment results show that the CITP-Miner algorithm outperforms the FITI and ClosedPROWL algorithms by one order of magnitude. We also compare the ITP-Miner and CITP-Miner algorithms. Since the CITP-Miner uses effective pruning strategies for mining all closed frequent inter-transaction patterns, and the number of patterns mined by the CITP-Miner may be much less than that mined by the ITP-Miner, the experiment results show that in most of the cases, the CITP-Miner algorithm outperforms the ITP-Miner algorithm in terms of execution time, but it consumes more amount of main memory.	en
dc.description.provenance	Made available in DSpace on 2021-06-12T18:07:50Z (GMT). No. of bitstreams: 1 ntu-96-D91725001-1.pdf: 611853 bytes, checksum: db9593f2137f5405090a832e0152a2d7 (MD5) Previous issue date: 2007	en
dc.description.tableofcontents	Table of Contents i List of Figures iii List of Tables v Chapter 1 Introduction 1 1.1 Motivations 4 1.2 Contributions 6 1.3 Dissertation Layout 7 Chapter 2 Background and Literature Survey 8 2.1 Association Rule Mining 8 2.2 Inter-transaction Association Rule Mining 10 2.3 Closed Pattern Mining 11 Chapter 3 Preliminaries and Problem Definitions 14 Chapter 4 Mining Frequent Inter-transaction Patterns 18 4.1 The ID-pair and ITP-tree 18 4.2 Main Concept 20 4.3 The ITP-Miner Algorithm 24 4.4 Memory Management of the ITP-Miner 27 4.5 Complexity Analysis of the ITP-Miner 28 4.6 Correctness of the ITP-Miner Algorithm 29 4.7 An Example 30 Chapter 5 Mining Closed Frequent Inter-transaction Patterns 34 5.1. CITP-tree and Joinable Class 34 5.2. Join Operation 35 5.3. Pruning Strategies 36 5.4. The CITP-Miner Algorithm 38 5.5. Memory Management of the CITP-Miner 41 5.6 Complexity Analysis of the CITP-Miner 42 5.7 Correctness of the CTIP-Miner Algorithm 43 5.8. An Example 44 Chapter 6 Performance Evaluation 47 6.1 Complexity Analysis of the Comparing Algorithms 47 6.2 Generation of Synthetic Datasets 49 6.3 Descriptions of Real Datasets 50 6.4 Descriptions of “Worst case” Datasets 51 6.5 Performance Analyses of the ITP-Miner and CITP-Miner Algorithms 51 6.5.1 Experiments on the synthetic data 53 6.5.2 Experiments on the real data 60 6.5.3 Experiments on the “worst case” data 63 6.5.4 Experiments on the disk-based algorithms 64 Chapter 7 Conclusions and Future Work 67 References 72
dc.language.iso	en
dc.subject	資料探勘	zh_TW
dc.subject	關聯規則	zh_TW
dc.subject	封閉項目集	zh_TW
dc.subject	跨交易關聯規則	zh_TW
dc.subject	Data mining	en
dc.subject	Association rules	en
dc.subject	Inter-transaction association rules	en
dc.subject	Closed itemset	en
dc.title	有效率的跨交易關聯規則探勘演算法	zh_TW
dc.title	Efficient Mining Algorithms for Inter-transaction Association Rules	en
dc.type	Thesis
dc.date.schoolyear	96-1
dc.description.degree	博士
dc.contributor.oralexamcommittee	陳炳宇,歐陽彥正,陳彥良,劉敦仁
dc.subject.keyword	關聯規則,封閉項目集,資料探勘,跨交易關聯規則,	zh_TW
dc.subject.keyword	Association rules,Closed itemset,Data mining,Inter-transaction association rules,	en
dc.relation.page	79
dc.rights.note	有償授權
dc.date.accepted	2007-12-20
dc.contributor.author-college	管理學院	zh_TW
dc.contributor.author-dept	資訊管理學研究所	zh_TW
顯示於系所單位：	資訊管理學系

文件中的檔案：

檔案	大小	格式
ntu-96-1.pdf 未授權公開取用	597.51 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。