cdeNDI:一個有效率的頻繁項目集探勘演算法

Chien-Ming Huang; 黃健銘

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	李秀惠(Hsiu-hui Lee)
dc.contributor.author	Chien-Ming Huang	en
dc.contributor.author	黃健銘	zh_TW
dc.date.accessioned	2021-06-13T16:27:52Z	-
dc.date.available	2005-07-20
dc.date.copyright	2005-07-20
dc.date.issued	2005
dc.date.submitted	2005-07-14
dc.identifier.citation	[AAP 2000] R. Agrawal, Charu Aggarwal, and V.V.V. Prasad. Depth First Generation of Long Patterns. In Intl. Conference on Knowledge Discovery and Data Mining, August 2000. [AIS 1993] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, ACM SIGMOD Conf. Management of Data, May 1993. [AS 1994] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules,” Proc. 20th Very Large Data Base Conf., Sept. 1994. [Bayardo 1998] R. J. Bayardo: Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June1998. [BCG 2001] D.Burdick, M.Calimlim, J.Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In 17th Intl. Conf. on Data Engineering, April 2001. [BMU 1997] S. Brin, R. Motwani, J.D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In ACM SIGMOD Conf. Management of Data, 1997 [CG 2002] T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In 6th European Conf. on Principles of Data Mining and Knowledge Discovery, Spring 2002. [DS 1999] B. Dunkel and N. Soparkar. Data organization and access for e_cient data mining. In 15th IEEE Intl. Conf. on Data Engineering, March 1999. [Fredkin 1960] E. Fredkin. Trie memory. In Communications of the ACM. Sept 1960 [Goethals 2004] B. Goethals. Memory issues in frequent itemset mining. In ACM Symposium on Applied Computing, 2004. [HK 2001] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco, CA, 2001. [HPY 2000] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. Management of Data, May 2000. [LK 1998] D-I. Lin and Z. M. Kedem. Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set. In 6th Intl. Conf. Extending Database Technology, March 1998. [LPWH 2002] J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. In ACM SIGKDD, July 2002. [OPPS 2002] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In IEEE intl. Conf. on Data Mining, 2002. [PBTL 1999] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Intl. Conf. on Database Theory, January 1999. [PCY 1995] J.S. Park, M.-S. Chen, and P.S. Yu. An effective hash based algorithm for mining association rules. In ACM SIGMOD Conf. Management of Data, 1995. [PHLN 2001] J. Pei, J. Hart, H. Lu, S. Nishio, S. Tang, and D. Yang, H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, in Intl. Conf. on Data Mining, San Jose, CA, Nov. 2001. [PHM 2000] J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000. [Shenoy 2000] P. Shenoy, et al. Turbo-charging vertical mining of large databases. In Intl. Conf. Management of Data, May 2000. [Zaki 2000a] M. J. Zaki. Generating non-redundant association rules. In Intl. Conf. Knowledge Discovery and Data Mining, August 2000. [Zaki 2000b] M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, May-June 2000 [Zaki 2002] M. J. Zaki and C.-J. Hsiao. ChARM: An efficient algorithm for closed itemset mining. In 2nd SIAM Intl. Conf. on Data Mining, April 2002. [ZG 2003] M. J. Zaki and K. Gouda. Fast Vertical Mining Using Diffsets. In ACM SIGKDD, August 24-27, 2003. [DS 1] http://www.ics.uci.edu/~mlearn/MLRepository.html [DS 2] http://www.almaden.ibm.com/software/quest/Resources/ index.shtml [SW 1] http://miles.cnuce.cnr.it/~palmeri/datam/DCI/ [SW 2] http://fuzzy.cs.uni-magdeburg.de/~borgelt/fpgrowth.html
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201	-
dc.description.abstract	頻繁項目集的探勘，也就是從大型資料庫中找出頻繁項目集。這是許多其他問題的根本和基礎，像是關連規則、循序規則、分類和許多其他的課題。在過去十年來，這個問題已經有了很大的進展。許多的演算法或改進現有演算法都不斷的被提出。然而，當我們降低最低支持度或是當我們遇到的資料庫是高度關連的時候，頻繁項目集的數目可能會極大。因此，如何應付密集資料庫仍然是一各具挑戰性的課題。在這篇論文裡，我們提出cdeNDI這一種新演算法。這是以Eclat這個演算法為基礎，將closed itemsets和non-derivable itemsets的觀念結合。在我們演算法一開始，就像是Eclat這個演算法額外加上deduction rules的計算。當我們發現derivable itemsets後，我們便改用deduction rules來計算這個項目集的超集合的支持度。這使的我們在長頻繁項目集的尋找上會更為快速。而為了讓deduction rules能夠在depth-first搜尋中運作，我們將搜尋的順序改為逆的，並且採用靜態遞減排序法。為了要提供快速的頻繁項目集支持度搜尋，我們使用closed trie來儲存已發現的頻繁項目集的支持度。Closed trie不但提供高效率的搜尋，而且也壓縮了儲存的空間。這麼一來，我們可以在密集、高度關連的資料中有好表現。而以現實生活和綜合性資料庫來做實驗的結果，也顯示我們在現實生活資料庫中比其他演算法有更好的效果。	zh_TW
dc.description.abstract	Frequent itemsets mining, which finds frequent itemsets from a large database, is a fundamental and essential for many problems, such association rules, sequential patterns, classification, and many more. In the past decade, tremendous progress has been made in this topic. New algorithms and enhancements on existing algorithms are continuously proposed. However, once we lower down the minimum support or when the dataset that we need to handle is highly correlated, the number of frequent itemsets can be extremenly huge. Therefore, how to cope with dense database is still a challenging topic. In our thesis, we propose a new algorithm, called cdeNDI, that based on Eclat combine closed itemsets and non-derivable itemsets. In the beginning of our algorithm, the process runs as Eclat with computation of deduction rules. When we find derivable itemsets, we use deduction rules to count the support of following supersets instead. This makes the generation of long frequent itemsets faster. In order to make deduction rules work in depth-first search, we adapt the traversal to reverse order and branches ordering to static descending ordering. To provide quick lookup for supports of frequent itemsets, we use closed trie to store the supports of itemsets that have been found. Closed trie not only shows efficient lookup speed, but also makes the storage space compacted. In this way, we can achieve good performance in dense, highly correlated datasets. The experiments on a number of real-world and synthetic databases show that cdeNDI outperform other algorithms in real-world databases.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T16:27:52Z (GMT). No. of bitstreams: 1 ntu-94-R92922099-1.pdf: 702702 bytes, checksum: 74878ea2548a8b9c2abed0e335cfda9d (MD5) Previous issue date: 2005	en
dc.description.tableofcontents	中文摘要 I Abstract II Chapter 1. Introduction 1 1.1 Motivation 1 1.2 Related Works 2 1.3 Research Objects 4 1.4 Organization of this thesis 5 Chapter 2. Background 6 2.1 Problem statement 6 2.2 Apriori 9 2.3 Eclat 12 2.3.1. Equivalence classes 12 2.3.2. Vertical Format 15 2.3 Closed Frequent Itemsets 18 2.4 Non-Derivable Itemsets 21 2.4.1. Deduction Rules 21 2.4.2. Non-Derivable Itemsets 23 Chapter 3. Algorithm Design & Implementation 26 3.1 Reverse-traversal 27 3.2 Trie 29 3.3 Static Ordering 32 3.4 Pseudo-Code Description 32 Chapter 4. Experimental Results 37 Chapter 5. Conclusions and Future Works 41 5.1 Conclusions 41 5.2 Future Works 41 References 42
dc.language.iso	en
dc.subject	資料探勘	zh_TW
dc.subject	頻繁項目集	zh_TW
dc.subject	frequent itemsets	en
dc.subject	data mining	en
dc.title	cdeNDI:一個有效率的頻繁項目集探勘演算法	zh_TW
dc.title	cdeNDI:a Efficient Algorithm for Mining Frequent Itemsets	en
dc.type	Thesis
dc.date.schoolyear	93-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	林一鵬(I-Peng Lin),廖純中(Churn-Jung Liau)
dc.subject.keyword	頻繁項目集,資料探勘,	zh_TW
dc.subject.keyword	frequent itemsets,data mining,	en
dc.relation.page	44
dc.rights.note	有償授權
dc.date.accepted	2005-07-14
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-94-1.pdf 未授權公開取用	686.23 kB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。