cdeNDI:一個有效率的頻繁項目集探勘演算法

Chien-Ming Huang; 黃健銘

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201

標題:	cdeNDI:一個有效率的頻繁項目集探勘演算法 cdeNDI:a Efficient Algorithm for Mining Frequent Itemsets
作者:	Chien-Ming Huang 黃健銘
指導教授:	李秀惠(Hsiu-hui Lee)
關鍵字:	頻繁項目集,資料探勘, frequent itemsets,data mining,
出版年 :	2005
學位:	碩士
摘要:	頻繁項目集的探勘，也就是從大型資料庫中找出頻繁項目集。這是許多其他問題的根本和基礎，像是關連規則、循序規則、分類和許多其他的課題。在過去十年來，這個問題已經有了很大的進展。許多的演算法或改進現有演算法都不斷的被提出。然而，當我們降低最低支持度或是當我們遇到的資料庫是高度關連的時候，頻繁項目集的數目可能會極大。因此，如何應付密集資料庫仍然是一各具挑戰性的課題。在這篇論文裡，我們提出cdeNDI這一種新演算法。這是以Eclat這個演算法為基礎，將closed itemsets和non-derivable itemsets的觀念結合。在我們演算法一開始，就像是Eclat這個演算法額外加上deduction rules的計算。當我們發現derivable itemsets後，我們便改用deduction rules來計算這個項目集的超集合的支持度。這使的我們在長頻繁項目集的尋找上會更為快速。而為了讓deduction rules能夠在depth-first搜尋中運作，我們將搜尋的順序改為逆的，並且採用靜態遞減排序法。為了要提供快速的頻繁項目集支持度搜尋，我們使用closed trie來儲存已發現的頻繁項目集的支持度。Closed trie不但提供高效率的搜尋，而且也壓縮了儲存的空間。這麼一來，我們可以在密集、高度關連的資料中有好表現。而以現實生活和綜合性資料庫來做實驗的結果，也顯示我們在現實生活資料庫中比其他演算法有更好的效果。 Frequent itemsets mining, which finds frequent itemsets from a large database, is a fundamental and essential for many problems, such association rules, sequential patterns, classification, and many more. In the past decade, tremendous progress has been made in this topic. New algorithms and enhancements on existing algorithms are continuously proposed. However, once we lower down the minimum support or when the dataset that we need to handle is highly correlated, the number of frequent itemsets can be extremenly huge. Therefore, how to cope with dense database is still a challenging topic. In our thesis, we propose a new algorithm, called cdeNDI, that based on Eclat combine closed itemsets and non-derivable itemsets. In the beginning of our algorithm, the process runs as Eclat with computation of deduction rules. When we find derivable itemsets, we use deduction rules to count the support of following supersets instead. This makes the generation of long frequent itemsets faster. In order to make deduction rules work in depth-first search, we adapt the traversal to reverse order and branches ordering to static descending ordering. To provide quick lookup for supports of frequent itemsets, we use closed trie to store the supports of itemsets that have been found. Closed trie not only shows efficient lookup speed, but also makes the storage space compacted. In this way, we can achieve good performance in dense, highly correlated datasets. The experiments on a number of real-world and synthetic databases show that cdeNDI outperform other algorithms in real-world databases.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201
全文授權:	有償授權
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-94-1.pdf 未授權公開取用	686.23 kB	Adobe PDF

顯示文件完整紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料（如：文字、圖片、PDF）並使其易於取用。