Skip navigation

DSpace

機構典藏 DSpace 系統致力於保存各式數位資料(如:文字、圖片、PDF)並使其易於取用。

點此認識 DSpace
DSpace logo
English
中文
  • 瀏覽論文
    • 校院系所
    • 出版年
    • 作者
    • 標題
    • 關鍵字
    • 指導教授
  • 搜尋 TDR
  • 授權 Q&A
    • 我的頁面
    • 接受 E-mail 通知
    • 編輯個人資料
  1. NTU Theses and Dissertations Repository
  2. 電機資訊學院
  3. 資訊工程學系
請用此 Handle URI 來引用此文件: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61321
完整後設資料紀錄
DC 欄位值語言
dc.contributor.advisor陳健輝(Gen-Huey Chen)
dc.contributor.authorTing-Yao Changen
dc.contributor.author張庭耀zh_TW
dc.date.accessioned2021-06-16T13:01:02Z-
dc.date.available2014-08-17
dc.date.copyright2013-08-17
dc.date.issued2013
dc.date.submitted2013-08-07
dc.identifier.citation[1] R. Agrawal, T. Imieli, and A. Swami, 'Mining association rules between sets of items in large databases,' Proc. the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., USA, 1993.
[2] R. Agrawal and R. Srikant, 'Fast algorithms for mining association rules,' Proc. the 20th International Conference on Very Large Data Bases, VLDB, pp. 487- 499, 1994.
[3] Y. Liu, W.-K. Liao, and A. Choudhary, 'A fast high utility itemsets mining algorithm,' Proc. the 1st International Workshop on Utility-based Data Mining, Chicago, Illinois, 2005.
[4] C. W. Wu, B.-E. Shie, V. S. Tseng, and P. S. Yu, 'Mining top-k high utility itemsets,' Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012.
[5] M. Liu and J. Qu, 'Mining high utility itemsets without candidate generation,' Proc. the 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, 2012.
[6] J. Han, J. Pei, Y. Yin, and R. Mao, 'Mining frequent patterns without candidate generation: a frequent-pattern tree approach,' Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004.
[7] L. Shen, H. Shen, P. Prithard, and R. Topor, 'Finding the N largest itemsets,' Proc. the International Conference on Data Mining, pp. 211-222, 1998.
[8] A. W.-C. Fu, R. W.-W. Kwong, and J. Tang, 'Mining N-most interesting itemsets,' Proc. the 12th International Symposium on Foundations of Intelligent Systems, 2000.
[9] H. Jiawei, W. Jianyong, L. Ying, and P. Tzvetkov, 'Mining top-k frequent closed patterns without minimum support,' Proc. the IEEE International Conference on Data Mining, pp. 211-218, 2002.
[10] Y. Hirate, E. Iwahashi, and H. Yamana, 'TF2P-growth: an efficient algorithm for mining frequent patterns without any thresholds,' Proc. the IEEE ICDM Workshop on Alternative Techniques for Data Mining and Knowledge Discovery, 2004.
[11] Y.-L. Cheung and A. W.-C. Fu, 'Mining frequent itemsets without support threshold: with and without item constraints,' IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 1052-1069, 2004.
[12] W. Jianyong, H. Jiawei, L. Ying, and P. Tzvetkov, 'TFP: an efficient algorithm for mining top-k frequent closed itemsets,' IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 652-663, 2005.
[13] S.-C. Ngan, T. Lam, R. C.-W. Wong, and A. W.-C. Fu, 'Mining N-most interesting itemsets without support threshold by the COFI-tree,' International Journal of Business Intelligence and Data Mining, vol. 1, pp. 88-106, 2005.
[14] T. Quang, S. Oyanagi, and K. Yamazaki, 'ExMiner: an efficient algorithm for mining top-k frequent patterns,' Advanced Data Mining and Applications, vol. 4093, pp. 436-447, 2006.
[15] K.-T. Chuang, J.-L. Huang, and M.-S. Chen, 'Mining top-k frequent patterns in the presence of the memory constraint,' The VLDB Journal, vol. 17, pp. 1321- 1344, 2008.
[16] C. F. Ahmed, S. K. Tanbeer, B.-S. Jeong, and Y.-K. Lee, 'Efficient tree structures for high utility pattern mining in incremental databases,' IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1708-1721, 2009.
[17] V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, 'UP-Growth: an efficient algorithm for high utility itemset mining,' Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 2010.
[18] V. S. Tseng, B.-E. Shie, C.-W. Wu, and P. S. Yu, 'Efficient algorithms for mining high utility itemsets from transactional databases,' IEEE Transactions on Knowledge and Data Engineering, vol. PP, pp. 1-14, 2012.
[19] R. Rymon, 'Search through systematic set enumeration,' Proc. the International Conference Principles of Knowledge Representation and Reasoning, pp. 539- 550, 1992.
[20] J. Pisharath, Y. Liu, B. Ozisikyilmaz, R. Narayanan, W. K. Liao, A. Choudhary and G. Memik, NU-MineBench version 2.0 dataset and technical report, http:// cucis.ece.northwestern.edu/projects/DMS/MineBench.html.
[21] Frequent itemset mining implementations repository, http://fimi.ua.ac.be/.
[22] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, 'Discovering frequent closed itemsets for association rules,' Database Theory - ICDT’99, Springer, pp. 398- 416, 1999.
[23] M. J. Zaki and C.-J. Hsiao, 'CHARM: an efficient algorithm for closed itemset mining,' Proc. the 2nd SIAM International Conference on Data Mining, pp. 457- 473, 2002.
[24] C. Borgelt, X. Yang, R. Nogales-Cadenas, P. Carmona-Saez, and A. Pascual- Montano, 'Finding closed frequent item sets by intersecting transactions,' Proc. the 14th International Conference on Extending Database Technology, Uppsala, Sweden, 2011.
[25] J. Wang, J. Han, and J. Pei, 'CLOSET+: searching for the best strategies for mining frequent closed itemsets,' Proc. the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., 2003.
[26] J. Pei, J. Han, and R. Mao, 'CLOSET: an efficient algorithm for mining frequent closed itemsets,' Proc. the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, 2000.
[27] D. Burdick, M. Calimlim, and J. Gehrke, 'MAFIA: a maximal frequent itemset algorithm for transactional databases,' Proc. the 17th International Conference on Data Engineering, pp. 443-452, 2001.
[28] R. Agrawal and R. Srikant, 'Mining sequential patterns,' Proc. the 11th International Conference on Data Engineering, pp. 3-14, 1995.
[29] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, 'FreeSpan: frequent pattern-projected sequential pattern mining,' Proc. the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA, 2000.
[30] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, 'Mining sequential patterns by pattern-growth: the prefixspan approach,' IEEE Transactions on Knowledge and Data Engineering , vol. 16, pp. 1424-1440, 2004.
[31] J. Wang, J. Han, and C. Li, 'Frequent closed sequence mining without candidate maintenance,' IEEE Transactions on Knowledge and Data Engineering, vol. 19, pp. 1042-1056, 2007.
[32] C.-H. Lee, C.-R. Lin, and M.-S. Chen, 'Sliding-window filtering: an efficient algorithm for incremental mining,' Proc. the 10th International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, 2001.
[33] C.-Y. Chang, M.-S. Chen, and C.-H. Lee, 'Mining general temporal association rules for items with different exhibition periods,' Proc. the IEEE International Conference on Data Mining, pp. 59-66, 2002.
[34] C.-H. Lee, M.-S. Chen, and C.-R. Lin, 'Progressive partition miner: an efficient algorithm for mining general temporal association rules,' IEEE Transactions on Knowledge and Data Engineering, vol. 15, pp. 1004-1017, 2003.
[35] J.-W. Huang, B.-R. Dai, and M.-S. Chen, 'Twain: two-end association miner with precise frequent exhibition periods,' ACM Transactions on Knowledge Discovery from Data, vol. 1, 2007.
[36] X. Liu, J. Guan, and P. Hu, 'Mining frequent closed itemsets from a landmark window over online data streams,' Computers & Mathematics with Applications, vol. 57, pp. 927-936, 2009.
[37] H.-F. Li and S.-Y. Lee, 'Mining frequent itemsets over data streams using efficient window sliding techniques,' Expert Systems with Applications, vol. 36, pp. 1466-1477, 2009.
[38] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, 'Sliding window-based frequent pattern mining over data streams,' Information sciences, vol. 179, pp. 3843-3865, 2009.
[39] T. F. Gharib, H. Nassar, M. Taha, and A. Abraham, 'An efficient algorithm for incremental mining of temporal association rules,' Data & Knowledge Engineering, vol. 69, pp. 800-815, 2010.
[40] G. S. Manku and R. Motwani, 'Approximate frequency counts over data streams,' Proc. the 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002.
[41] H.-F. Li, C.-C. Ho, and S.-Y. Lee, 'Incremental updates of closed frequent itemsets over continuous data streams,' Expert Systems with Applications, vol. 36, pp. 2451-2458, 2009.
[42] B.-E. Shie, V. S. Tseng, and P. S. Yu, 'Online mining of temporal maximal utility itemsets from data streams,' Proc. the ACM Symposium on Applied Computing, Sierre, Switzerland, 2010.
[43] H. Yao, H. J. Hamilton, and C. J. Butz, “A foundational approach to mining itemset utilities from databases,” Proc. the 4th SIAM International Conference on Data Mining, Florida, USA, 2004.
dc.identifier.urihttp://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61321-
dc.description.abstract在資料探勘中,探勘前k高效益物品集是一項新興的議題,其中k代表的是使用者想要的高效益物品集的個數。在探勘前k高效益物品集這一議題之前,已有探勘高效益物品集的議題,而探勘高效益物品集則是發掘那些效益值高於使用者定義的效益門檻值的物品集。但對於使用者來說,設定一個適當的效益門檻值是一個困難的問題,如果門檻值設定得太低,太多的高效益物品集會被產生,而這可能導致探勘的演算法變得效率低下,或更甚至耗盡記憶體。另一方面,如果門檻值設定得太高,則可能找不到高效益物品集。
因此,探勘前k高效益物品集更適合那些想要找出特定個數的高效益物品集的使用者。為了探勘前k高效益物品集,現有的演算法首先會先高估物品集的效益值去產生候選物品集,緊接著再計算候選物品集的真正效益值去獲得真正的前k高效益物品集。但現有的演算法會有產生太多候選物品集的問題,而大部分的候選物品集在經過計算真正的效益值之後,會發現它們並非為前k高效益物品集。
在本篇論文中,我們提出了一個能快速探勘前k高效益物品集的演算法,其名稱為TKU+。通過避免產生候選物品集的方式,TKU+可以有效的探勘前k高效益物品集。在不同的資料集上,我們將TKU+和目前最先進的演算法相比,實驗結果顯示TKU+不論在執行時間及記憶體消耗上,均優於目前最先進的演算法。
zh_TW
dc.description.abstractMining top-k high utility itemsets is an emerging topic in data mining, where k is the desired number of high utility itemsets to be mined. Although several studies have been carried out on mining high utility itemsets, which refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold min_util. But setting an appropriate minimum utility threshold is a difficult problem for users. If min_util is set too low, too many high utility itemsets will be generated, which may cause the mining algorithms to become inefficient or even run out of memory. On the other hand, if min_util is set too high, no high utility itemset will be found.
Therefore, mining top¬-k high utility itemsets is more suitable for users who want to find a certain number of high utility itemsets. To identify top-k high utility itemsets, the existing algorithms first generate candidate itemsets by overestimating their utilities, and subsequently compute the exact utilities of these candidates. These algorithms incur the problem that a very large number of candidates are generated, but most of the candidates are found out to be not top-k high utility itemsets after their exact utilities are computed.
In this thesis, we propose a fast algorithm, named TKU+, for mining top-k high utility itemsets. By avoiding the costly generation and utility computation of numerous candidate itemsets, TKU+ can efficiently mine top-k high utility itemsets. We compared TKU+ with the state-of-the-art algorithms on various datasets, and experimental results show that TKU+ outperforms these algorithms in terms of both running time and memory consumption.
en
dc.description.provenanceMade available in DSpace on 2021-06-16T13:01:02Z (GMT). No. of bitstreams: 1
ntu-102-R00922102-1.pdf: 1576091 bytes, checksum: 27d4098042cbbefc98589f11df1e9fdc (MD5)
Previous issue date: 2013
en
dc.description.tableofcontents誌謝 i
中文摘要 ii
ABSTRACT iii
CONTENTS iv
LIST OF FIGURES vi
LIST OF TABLES viii
Chapter 1 Introduction 1
Chapter 2 Background 6
2.1 Definitions and notations 6
2.2 Related work 9
2.2.1 Frequent itemset mining 9
2.2.2 Top-k frequent itemset mining 9
2.2.3 High utility itemset mining 10
2.2.4 Top-k high utility itemset mining 12
2.3 Research issues 12
Chapter 3 Utility-List 14
3.1 Utility-lists of 1-itemsets 14
3.2 Utility-lists of 2-itemsets 16
3.3 Utility-lists of k-itemsets (k≥3) 19
Chapter 4 A Fast Mining Algorithm 22
4.1 Preprocessing 22
4.2 Description of algorithm 24
4.2.1 Search space 24
4.2.2 A pruning method used in HUI-Miner 25
4.2.3 A new method for raising threshold during mining process 27
4.2.4 Implementation 28
Chapter 5 Experiments 30
5.1 Environments 30
5.2 Experimental results 31
5.2.1 Experimental results for different datasets 31
5.2.1.1 Sparse datasets 32
5.2.1.2 Dense datasets 32
5.2.2 More experimental results 33
5.3 Discussion 41
Chapter 6 Conclusion 46
References 48
Appendix A 54
Appendix B 64
dc.language.isoen
dc.subject資料探勘zh_TW
dc.subject前k物品集探勘zh_TW
dc.subject高效益物品集zh_TW
dc.subject效益探勘zh_TW
dc.subjectdata miningen
dc.subjectutility miningen
dc.subjecthigh utility itemseten
dc.subjecttop-k itemset miningen
dc.title一個能快速探勘前k高效益物品集的演算法zh_TW
dc.titleA Fast Algorithm for Mining Top-k High Utility Itemsetsen
dc.typeThesis
dc.date.schoolyear101-2
dc.description.degree碩士
dc.contributor.oralexamcommittee林宣華(Shian-Hua Lin),周信宏(Hsin-Hung Chou)
dc.subject.keyword資料探勘,效益探勘,高效益物品集,前k物品集探勘,zh_TW
dc.subject.keyworddata mining,utility mining,high utility itemset,top-k itemset mining,en
dc.relation.page66
dc.rights.note有償授權
dc.date.accepted2013-08-08
dc.contributor.author-college電機資訊學院zh_TW
dc.contributor.author-dept資訊工程學研究所zh_TW
顯示於系所單位:資訊工程學系

文件中的檔案:
檔案 大小格式 
ntu-102-1.pdf
  未授權公開取用
1.54 MBAdobe PDF
顯示文件簡單紀錄


系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。

社群連結
聯絡資訊
10617臺北市大安區羅斯福路四段1號
No.1 Sec.4, Roosevelt Rd., Taipei, Taiwan, R.O.C. 106
Tel: (02)33662353
Email: ntuetds@ntu.edu.tw
意見箱
相關連結
館藏目錄
國內圖書館整合查詢 MetaCat
臺大學術典藏 NTU Scholars
臺大圖書館數位典藏館
本站聲明
© NTU Library All Rights Reserved