Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 資訊工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor李秀惠(Hsiu-hui Lee)
dc.contributor.authorChien-Ming Huangen
dc.contributor.author黃健銘zh_TW
dc.date.accessioned2021-06-13T16:27:52Z-
dc.date.available2005-07-20
dc.date.copyright2005-07-20
dc.date.issued2005
dc.date.submitted2005-07-14
dc.identifier.citation[AAP 2000] R. Agrawal, Charu Aggarwal, and V.V.V. Prasad. Depth First Generation of Long Patterns. In Intl. Conference on Knowledge Discovery and Data Mining, August 2000.
[AIS 1993] R. Agrawal, T. Imielinski, and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, ACM SIGMOD Conf. Management of Data, May 1993.
[AS 1994] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules,” Proc. 20th Very Large Data Base Conf., Sept. 1994.
[Bayardo 1998] R. J. Bayardo: Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June1998.
[BCG 2001] D.Burdick, M.Calimlim, J.Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In 17th Intl. Conf. on Data Engineering, April 2001.
[BMU 1997] S. Brin, R. Motwani, J.D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In ACM SIGMOD Conf. Management of Data, 1997
[CG 2002] T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In 6th European Conf. on Principles of Data Mining and Knowledge Discovery, Spring 2002.
[DS 1999] B. Dunkel and N. Soparkar. Data organization and access for e_cient data mining. In 15th IEEE Intl. Conf. on Data Engineering, March 1999.
[Fredkin 1960] E. Fredkin. Trie memory. In Communications of the ACM. Sept 1960
[Goethals 2004] B. Goethals. Memory issues in frequent itemset mining. In ACM Symposium on Applied Computing, 2004.
[HK 2001] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[HPY 2000] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. Management of Data, May 2000.
[LK 1998] D-I. Lin and Z. M. Kedem. Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set. In 6th Intl. Conf. Extending Database Technology, March 1998.
[LPWH 2002] J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. In ACM SIGKDD, July 2002.
[OPPS 2002] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. Adaptive and resource-aware mining of frequent sets. In IEEE intl. Conf. on Data Mining, 2002.
[PBTL 1999] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In 7th Intl. Conf. on Database Theory, January 1999.
[PCY 1995] J.S. Park, M.-S. Chen, and P.S. Yu. An effective hash based algorithm for mining association rules. In ACM SIGMOD Conf. Management of Data, 1995.
[PHLN 2001] J. Pei, J. Hart, H. Lu, S. Nishio, S. Tang, and D. Yang, H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases, in Intl. Conf. on Data Mining, San Jose, CA, Nov. 2001.
[PHM 2000] J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.
[Shenoy 2000] P. Shenoy, et al. Turbo-charging vertical mining of large databases. In Intl. Conf. Management of Data, May 2000.
[Zaki 2000a] M. J. Zaki. Generating non-redundant association rules. In Intl. Conf. Knowledge Discovery and Data Mining, August 2000.
[Zaki 2000b] M. J. Zaki. Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, May-June 2000
[Zaki 2002] M. J. Zaki and C.-J. Hsiao. ChARM: An efficient algorithm for closed itemset mining. In 2nd SIAM Intl. Conf. on Data Mining, April 2002.
[ZG 2003] M. J. Zaki and K. Gouda. Fast Vertical Mining Using Diffsets. In ACM SIGKDD, August 24-27, 2003.
[DS 1] http://www.ics.uci.edu/~mlearn/MLRepository.html
[DS 2] http://www.almaden.ibm.com/software/quest/Resources/
index.shtml
[SW 1] http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
[SW 2] http://fuzzy.cs.uni-magdeburg.de/~borgelt/fpgrowth.html
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/38201-
dc.description.abstract頻繁項目集的探勘,也就是從大型資料庫中找出頻繁項目集。這是許多其他問題的根本和基礎,像是關連規則、循序規則、分類和許多其他的課題。
在過去十年來,這個問題已經有了很大的進展。許多的演算法或改進現有演算法都不斷的被提出。然而,當我們降低最低支持度或是當我們遇到的資料庫是高度關連的時候,頻繁項目集的數目可能會極大。因此,如何應付密集資料庫仍然是一各具挑戰性的課題。
在這篇論文裡,我們提出cdeNDI這一種新演算法。這是以Eclat這個演算法為基礎,將closed itemsets和non-derivable itemsets的觀念結合。在我們演算法一開始,就像是Eclat這個演算法額外加上deduction rules的計算。當我們發現derivable itemsets後,我們便改用deduction rules來計算這個項目集的超集合的支持度。這使的我們在長頻繁項目集的尋找上會更為快速。而為了讓deduction rules能夠在depth-first搜尋中運作,我們將搜尋的順序改為逆的,並且採用靜態遞減排序法。為了要提供快速的頻繁項目集支持度搜尋,我們使用closed trie來儲存已發現的頻繁項目集的支持度。Closed trie不但提供高效率的搜尋,而且也壓縮了儲存的空間。這麼一來,我們可以在密集、高度關連的資料中有好表現。而以現實生活和綜合性資料庫來做實驗的結果,也顯示我們在現實生活資料庫中比其他演算法有更好的效果。
zh_TW
dc.description.abstractFrequent itemsets mining, which finds frequent itemsets from a large database, is a fundamental and essential for many problems, such association rules, sequential patterns, classification, and many more.
In the past decade, tremendous progress has been made in this topic. New algorithms and enhancements on existing algorithms are continuously proposed. However, once we lower down the minimum support or when the dataset that we need to handle is highly correlated, the number of frequent itemsets can be extremenly huge. Therefore, how to cope with dense database is still a challenging topic.

In our thesis, we propose a new algorithm, called cdeNDI, that based on Eclat combine closed itemsets and non-derivable itemsets. In the beginning of our algorithm, the process runs as Eclat with computation of deduction rules. When we find derivable itemsets, we use deduction rules to count the support of following supersets instead. This makes the generation of long frequent itemsets faster. In order to make deduction rules work in depth-first search, we adapt the traversal to reverse order and branches ordering to static descending ordering. To provide quick lookup for supports of frequent itemsets, we use closed trie to store the supports of itemsets that have been found. Closed trie not only shows efficient lookup speed, but also makes the storage space compacted. In this way, we can achieve good performance in dense, highly correlated datasets. The experiments on a number of real-world and synthetic databases show that cdeNDI outperform other algorithms in real-world databases.
en
dc.description.provenanceMade available in DSpace on 2021-06-13T16:27:52Z (GMT). No. of bitstreams: 1
ntu-94-R92922099-1.pdf: 702702 bytes, checksum: 74878ea2548a8b9c2abed0e335cfda9d (MD5)
Previous issue date: 2005
en
dc.description.tableofcontents中文摘要 I
Abstract II
Chapter 1. Introduction 1
1.1 Motivation 1
1.2 Related Works 2
1.3 Research Objects 4
1.4 Organization of this thesis 5
Chapter 2. Background 6
2.1 Problem statement 6
2.2 Apriori 9
2.3 Eclat 12
2.3.1. Equivalence classes 12
2.3.2. Vertical Format 15
2.3 Closed Frequent Itemsets 18
2.4 Non-Derivable Itemsets 21
2.4.1. Deduction Rules 21
2.4.2. Non-Derivable Itemsets 23
Chapter 3. Algorithm Design & Implementation 26
3.1 Reverse-traversal 27
3.2 Trie 29
3.3 Static Ordering 32
3.4 Pseudo-Code Description 32
Chapter 4. Experimental Results 37
Chapter 5. Conclusions and Future Works 41
5.1 Conclusions 41
5.2 Future Works 41
References 42
dc.language.isoen
dc.subject資料探勘zh_TW
dc.subject頻繁項目集zh_TW
dc.subjectfrequent itemsetsen
dc.subjectdata miningen
dc.titlecdeNDI:一個有效率的頻繁項目集探勘演算法zh_TW
dc.titlecdeNDI:a Efficient Algorithm for Mining Frequent Itemsetsen
dc.typeThesis
dc.date.schoolyear93-2
dc.description.degree碩士
dc.contributor.oralexamcommittee林一鵬(I-Peng Lin),廖純中(Churn-Jung Liau)
dc.subject.keyword頻繁項目集,資料探勘,zh_TW
dc.subject.keywordfrequent itemsets,data mining,en
dc.relation.page44
dc.rights.note有償授權
dc.date.accepted2005-07-14
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept資訊工程學研究所zh_TW
顯示於系所單位:資訊工程學系

文件中的檔案:
檔案 大小格式 
ntu-94-1.pdf
  未授權公開取用
686.23 kBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved