一個能快速探勘前k高效益物品集的演算法

Ting-Yao Chang; 張庭耀

請用此 Handle URI 來引用此文件： http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61321

完整後設資料紀錄

DC 欄位	值	語言
dc.contributor.advisor	陳健輝(Gen-Huey Chen)
dc.contributor.author	Ting-Yao Chang	en
dc.contributor.author	張庭耀	zh_TW
dc.date.accessioned	2021-06-16T13:01:02Z	-
dc.date.available	2014-08-17
dc.date.copyright	2013-08-17
dc.date.issued	2013
dc.date.submitted	2013-08-07
dc.identifier.citation	[1] R. Agrawal, T. Imieli, and A. Swami, 'Mining association rules between sets of items in large databases,' Proc. the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., USA, 1993. [2] R. Agrawal and R. Srikant, 'Fast algorithms for mining association rules,' Proc. the 20th International Conference on Very Large Data Bases, VLDB, pp. 487- 499, 1994. [3] Y. Liu, W.-K. Liao, and A. Choudhary, 'A fast high utility itemsets mining algorithm,' Proc. the 1st International Workshop on Utility-based Data Mining, Chicago, Illinois, 2005. [4] C. W. Wu, B.-E. Shie, V. S. Tseng, and P. S. Yu, 'Mining top-k high utility itemsets,' Proc. the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 2012. [5] M. Liu and J. Qu, 'Mining high utility itemsets without candidate generation,' Proc. the 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, USA, 2012. [6] J. Han, J. Pei, Y. Yin, and R. Mao, 'Mining frequent patterns without candidate generation: a frequent-pattern tree approach,' Data Mining and Knowledge Discovery, vol. 8, pp. 53-87, 2004. [7] L. Shen, H. Shen, P. Prithard, and R. Topor, 'Finding the N largest itemsets,' Proc. the International Conference on Data Mining, pp. 211-222, 1998. [8] A. W.-C. Fu, R. W.-W. Kwong, and J. Tang, 'Mining N-most interesting itemsets,' Proc. the 12th International Symposium on Foundations of Intelligent Systems, 2000. [9] H. Jiawei, W. Jianyong, L. Ying, and P. Tzvetkov, 'Mining top-k frequent closed patterns without minimum support,' Proc. the IEEE International Conference on Data Mining, pp. 211-218, 2002. [10] Y. Hirate, E. Iwahashi, and H. Yamana, 'TF2P-growth: an efficient algorithm for mining frequent patterns without any thresholds,' Proc. the IEEE ICDM Workshop on Alternative Techniques for Data Mining and Knowledge Discovery, 2004. [11] Y.-L. Cheung and A. W.-C. Fu, 'Mining frequent itemsets without support threshold: with and without item constraints,' IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 1052-1069, 2004. [12] W. Jianyong, H. Jiawei, L. Ying, and P. Tzvetkov, 'TFP: an efficient algorithm for mining top-k frequent closed itemsets,' IEEE Transactions on Knowledge and Data Engineering, vol. 17, pp. 652-663, 2005. [13] S.-C. Ngan, T. Lam, R. C.-W. Wong, and A. W.-C. Fu, 'Mining N-most interesting itemsets without support threshold by the COFI-tree,' International Journal of Business Intelligence and Data Mining, vol. 1, pp. 88-106, 2005. [14] T. Quang, S. Oyanagi, and K. Yamazaki, 'ExMiner: an efficient algorithm for mining top-k frequent patterns,' Advanced Data Mining and Applications, vol. 4093, pp. 436-447, 2006. [15] K.-T. Chuang, J.-L. Huang, and M.-S. Chen, 'Mining top-k frequent patterns in the presence of the memory constraint,' The VLDB Journal, vol. 17, pp. 1321- 1344, 2008. [16] C. F. Ahmed, S. K. Tanbeer, B.-S. Jeong, and Y.-K. Lee, 'Efficient tree structures for high utility pattern mining in incremental databases,' IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1708-1721, 2009. [17] V. S. Tseng, C.-W. Wu, B.-E. Shie, and P. S. Yu, 'UP-Growth: an efficient algorithm for high utility itemset mining,' Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 2010. [18] V. S. Tseng, B.-E. Shie, C.-W. Wu, and P. S. Yu, 'Efficient algorithms for mining high utility itemsets from transactional databases,' IEEE Transactions on Knowledge and Data Engineering, vol. PP, pp. 1-14, 2012. [19] R. Rymon, 'Search through systematic set enumeration,' Proc. the International Conference Principles of Knowledge Representation and Reasoning, pp. 539- 550, 1992. [20] J. Pisharath, Y. Liu, B. Ozisikyilmaz, R. Narayanan, W. K. Liao, A. Choudhary and G. Memik, NU-MineBench version 2.0 dataset and technical report, http:// cucis.ece.northwestern.edu/projects/DMS/MineBench.html. [21] Frequent itemset mining implementations repository, http://fimi.ua.ac.be/. [22] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, 'Discovering frequent closed itemsets for association rules,' Database Theory - ICDT’99, Springer, pp. 398- 416, 1999. [23] M. J. Zaki and C.-J. Hsiao, 'CHARM: an efficient algorithm for closed itemset mining,' Proc. the 2nd SIAM International Conference on Data Mining, pp. 457- 473, 2002. [24] C. Borgelt, X. Yang, R. Nogales-Cadenas, P. Carmona-Saez, and A. Pascual- Montano, 'Finding closed frequent item sets by intersecting transactions,' Proc. the 14th International Conference on Extending Database Technology, Uppsala, Sweden, 2011. [25] J. Wang, J. Han, and J. Pei, 'CLOSET+: searching for the best strategies for mining frequent closed itemsets,' Proc. the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, D.C., 2003. [26] J. Pei, J. Han, and R. Mao, 'CLOSET: an efficient algorithm for mining frequent closed itemsets,' Proc. the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21-30, 2000. [27] D. Burdick, M. Calimlim, and J. Gehrke, 'MAFIA: a maximal frequent itemset algorithm for transactional databases,' Proc. the 17th International Conference on Data Engineering, pp. 443-452, 2001. [28] R. Agrawal and R. Srikant, 'Mining sequential patterns,' Proc. the 11th International Conference on Data Engineering, pp. 3-14, 1995. [29] J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu, 'FreeSpan: frequent pattern-projected sequential pattern mining,' Proc. the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA, 2000. [30] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, 'Mining sequential patterns by pattern-growth: the prefixspan approach,' IEEE Transactions on Knowledge and Data Engineering , vol. 16, pp. 1424-1440, 2004. [31] J. Wang, J. Han, and C. Li, 'Frequent closed sequence mining without candidate maintenance,' IEEE Transactions on Knowledge and Data Engineering, vol. 19, pp. 1042-1056, 2007. [32] C.-H. Lee, C.-R. Lin, and M.-S. Chen, 'Sliding-window filtering: an efficient algorithm for incremental mining,' Proc. the 10th International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, 2001. [33] C.-Y. Chang, M.-S. Chen, and C.-H. Lee, 'Mining general temporal association rules for items with different exhibition periods,' Proc. the IEEE International Conference on Data Mining, pp. 59-66, 2002. [34] C.-H. Lee, M.-S. Chen, and C.-R. Lin, 'Progressive partition miner: an efficient algorithm for mining general temporal association rules,' IEEE Transactions on Knowledge and Data Engineering, vol. 15, pp. 1004-1017, 2003. [35] J.-W. Huang, B.-R. Dai, and M.-S. Chen, 'Twain: two-end association miner with precise frequent exhibition periods,' ACM Transactions on Knowledge Discovery from Data, vol. 1, 2007. [36] X. Liu, J. Guan, and P. Hu, 'Mining frequent closed itemsets from a landmark window over online data streams,' Computers & Mathematics with Applications, vol. 57, pp. 927-936, 2009. [37] H.-F. Li and S.-Y. Lee, 'Mining frequent itemsets over data streams using efficient window sliding techniques,' Expert Systems with Applications, vol. 36, pp. 1466-1477, 2009. [38] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, 'Sliding window-based frequent pattern mining over data streams,' Information sciences, vol. 179, pp. 3843-3865, 2009. [39] T. F. Gharib, H. Nassar, M. Taha, and A. Abraham, 'An efficient algorithm for incremental mining of temporal association rules,' Data & Knowledge Engineering, vol. 69, pp. 800-815, 2010. [40] G. S. Manku and R. Motwani, 'Approximate frequency counts over data streams,' Proc. the 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002. [41] H.-F. Li, C.-C. Ho, and S.-Y. Lee, 'Incremental updates of closed frequent itemsets over continuous data streams,' Expert Systems with Applications, vol. 36, pp. 2451-2458, 2009. [42] B.-E. Shie, V. S. Tseng, and P. S. Yu, 'Online mining of temporal maximal utility itemsets from data streams,' Proc. the ACM Symposium on Applied Computing, Sierre, Switzerland, 2010. [43] H. Yao, H. J. Hamilton, and C. J. Butz, “A foundational approach to mining itemset utilities from databases,” Proc. the 4th SIAM International Conference on Data Mining, Florida, USA, 2004.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61321	-
dc.description.abstract	在資料探勘中，探勘前k高效益物品集是一項新興的議題，其中k代表的是使用者想要的高效益物品集的個數。在探勘前k高效益物品集這一議題之前，已有探勘高效益物品集的議題，而探勘高效益物品集則是發掘那些效益值高於使用者定義的效益門檻值的物品集。但對於使用者來說，設定一個適當的效益門檻值是一個困難的問題，如果門檻值設定得太低，太多的高效益物品集會被產生，而這可能導致探勘的演算法變得效率低下，或更甚至耗盡記憶體。另一方面，如果門檻值設定得太高，則可能找不到高效益物品集。因此，探勘前k高效益物品集更適合那些想要找出特定個數的高效益物品集的使用者。為了探勘前k高效益物品集，現有的演算法首先會先高估物品集的效益值去產生候選物品集，緊接著再計算候選物品集的真正效益值去獲得真正的前k高效益物品集。但現有的演算法會有產生太多候選物品集的問題，而大部分的候選物品集在經過計算真正的效益值之後，會發現它們並非為前k高效益物品集。在本篇論文中，我們提出了一個能快速探勘前k高效益物品集的演算法，其名稱為TKU+。通過避免產生候選物品集的方式，TKU+可以有效的探勘前k高效益物品集。在不同的資料集上，我們將TKU+和目前最先進的演算法相比，實驗結果顯示TKU+不論在執行時間及記憶體消耗上，均優於目前最先進的演算法。	zh_TW
dc.description.abstract	Mining top-k high utility itemsets is an emerging topic in data mining, where k is the desired number of high utility itemsets to be mined. Although several studies have been carried out on mining high utility itemsets, which refers to the discovery of itemsets with utilities higher than a user-specified minimum utility threshold min_util. But setting an appropriate minimum utility threshold is a difficult problem for users. If min_util is set too low, too many high utility itemsets will be generated, which may cause the mining algorithms to become inefficient or even run out of memory. On the other hand, if min_util is set too high, no high utility itemset will be found. Therefore, mining top¬-k high utility itemsets is more suitable for users who want to find a certain number of high utility itemsets. To identify top-k high utility itemsets, the existing algorithms first generate candidate itemsets by overestimating their utilities, and subsequently compute the exact utilities of these candidates. These algorithms incur the problem that a very large number of candidates are generated, but most of the candidates are found out to be not top-k high utility itemsets after their exact utilities are computed. In this thesis, we propose a fast algorithm, named TKU+, for mining top-k high utility itemsets. By avoiding the costly generation and utility computation of numerous candidate itemsets, TKU+ can efficiently mine top-k high utility itemsets. We compared TKU+ with the state-of-the-art algorithms on various datasets, and experimental results show that TKU+ outperforms these algorithms in terms of both running time and memory consumption.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T13:01:02Z (GMT). No. of bitstreams: 1 ntu-102-R00922102-1.pdf: 1576091 bytes, checksum: 27d4098042cbbefc98589f11df1e9fdc (MD5) Previous issue date: 2013	en
dc.description.tableofcontents	誌謝 i 中文摘要 ii ABSTRACT iii CONTENTS iv LIST OF FIGURES vi LIST OF TABLES viii Chapter 1 Introduction 1 Chapter 2 Background 6 2.1 Definitions and notations 6 2.2 Related work 9 2.2.1 Frequent itemset mining 9 2.2.2 Top-k frequent itemset mining 9 2.2.3 High utility itemset mining 10 2.2.4 Top-k high utility itemset mining 12 2.3 Research issues 12 Chapter 3 Utility-List 14 3.1 Utility-lists of 1-itemsets 14 3.2 Utility-lists of 2-itemsets 16 3.3 Utility-lists of k-itemsets (k≥3) 19 Chapter 4 A Fast Mining Algorithm 22 4.1 Preprocessing 22 4.2 Description of algorithm 24 4.2.1 Search space 24 4.2.2 A pruning method used in HUI-Miner 25 4.2.3 A new method for raising threshold during mining process 27 4.2.4 Implementation 28 Chapter 5 Experiments 30 5.1 Environments 30 5.2 Experimental results 31 5.2.1 Experimental results for different datasets 31 5.2.1.1 Sparse datasets 32 5.2.1.2 Dense datasets 32 5.2.2 More experimental results 33 5.3 Discussion 41 Chapter 6 Conclusion 46 References 48 Appendix A 54 Appendix B 64
dc.language.iso	en
dc.subject	資料探勘	zh_TW
dc.subject	前k物品集探勘	zh_TW
dc.subject	高效益物品集	zh_TW
dc.subject	效益探勘	zh_TW
dc.subject	data mining	en
dc.subject	utility mining	en
dc.subject	high utility itemset	en
dc.subject	top-k itemset mining	en
dc.title	一個能快速探勘前k高效益物品集的演算法	zh_TW
dc.title	A Fast Algorithm for Mining Top-k High Utility Itemsets	en
dc.type	Thesis
dc.date.schoolyear	101-2
dc.description.degree	碩士
dc.contributor.oralexamcommittee	林宣華(Shian-Hua Lin),周信宏(Hsin-Hung Chou)
dc.subject.keyword	資料探勘,效益探勘,高效益物品集,前k物品集探勘,	zh_TW
dc.subject.keyword	data mining,utility mining,high utility itemset,top-k itemset mining,	en
dc.relation.page	66
dc.rights.note	有償授權
dc.date.accepted	2013-08-08
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	資訊工程學研究所	zh_TW
顯示於系所單位：	資訊工程學系

文件中的檔案：

檔案	大小	格式
ntu-102-1.pdf 未授權公開取用	1.54 MB	Adobe PDF

顯示文件簡單紀錄

系統中的文件，除了特別指名其著作權條款之外，均受到著作權保護，並且保留所有的權利。