Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847
Full metadata record
DC Field  Value  Language
dc.contributor.advisor  陳靜枝
dc.contributor.author  Hsin-Ting Chung  en
dc.contributor.author  鍾欣廷  zh_TW
dc.date.accessioned  2021-06-16T13:15:35Z  -
dc.date.available  2014-08-06
dc.date.copyright  2013-08-06
dc.date.issued  2013
dc.date.submitted  2013-07-29
dc.identifier.citation[1] Agrawal, R., T. Imieliński, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Washington, DC), 1993, pp. 207-216.
[2] Agrawal, R., and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago de Chile, Chile), 1994, pp. 487-499.
[3] All too much. The Economist, 2010.
[4] Antonie, M.-L., and O. R. Za‥ıane. An Associative Classifier Based on Positive and Negative Rules. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (Paris, France), 2004, pp. 64-69.
[5] Baralis, E., S. Chiusano, and P. Garza. On Support Thresholds in Associative Classification. In Proceedings of the 2004 ACM Symposium on Applied Computing (Nicosia, Cyprus), 2004, pp. 553-558.
[6] Baralis, E., and P. Garza. A Lazy Approach to Pruning Classification Rules. In Proceedings of 2002 IEEE International Conference on Data Mining (Maebashi City, Japan), 2002, pp. 35-42.
[7] Boser, B. E., I. M. Guyon, and V. N. Vapnik. A Training Algorithm for Optimal Margin Classifiers. In Proceedings of 5th annual workshop on Computational learning theory (Pittsburgh, Pennsylvania, USA), 1992, pp. 144-152.
[8] Breiman, L., J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[9] Cheung, D. W.-L., J. Han, V. Ng, and C. Y. Wong. Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. In Proceedings of the 12th International Conference on Data Engineering (New Orleans, Louisiana), 1996, pp. 106-114.
[10] Craven, M. W., and J. W. Shavlik. Using Neural Networks for Data Mining. Future Generation Computer Systems, 13(2-3), 1997, pp. 211-229.
[11] Crespoa, F., and R. Weber. A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets and Systems, 150(2), 2005, pp. 267-284.
[12] Crone, S. F., S. Lessmann, and R. Stahlbock. The Impact of Preprocessing on Data Mining: An Evaluation of Classifier Sensitivity in Direct Marketing. European Journal of Operational Research, 173(3), 2006, pp. 781-800.
[13] Data, data everywhere. The Economist, 2010.
[14] Demirkana, H., and D. Delen. Leveraging the Capabilities of Service-oriented Decision Support Systems: Putting Analytics and Big Data in Cloud. Decision Support Systems, 2012.
[15] Fayyad, U. M., and K. B. Irani. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of XIII International Joint Conference on Artificial Intelligence (Chambery), 1993, pp. 1022-1029.
[16] Fidelis, M. V., H. S. Lopes, and A. A. Freitas. Discovering Comprehensible Classification Rules with a Genetic Algorithm. In Proceedings of 2000 Congress on Evolutionary Computation, 2000, pp. 805-810.
[17] Gartner. Definition of Big Data. Available from: http://www.gartner.com/it-glossary/big-data/.
[18] Gharib, T. F., H. Nassar, M. Taha, and A. Abraham. An Efficient Algorithm for Incremental Mining of Temporal Association Rules. Data & Knowledge Engineering, 69(8), 2010, pp. 800-815.
[19] Han, J., M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.
[20] Han, J., and J. Pei. Mining Frequent Patterns by Pattern-growth: Methodology and Implications. ACM SIGKDD Explorations Newsletter, 2(2), 2000, pp. 14-20.
[21] Han, J., J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Dallas, Texas, USA), 2000, pp. 1-12.
[22] Hopkins, B. Beyond the Hype of Big Data. CIO, 2011.
[23] IBM. What is big data? Bringing Big Data to the Enterprise. Available from: http://www-01.ibm.com/software/data/bigdata/.
[24] Jackson, J. The Big Promise of Big Data. CIO, 2012.
[25] Jacobs, A. The Pathologies of Big Data. Communications of the ACM, 52(8), 2009, pp. 36-44.
[26] Jong, K. A. D., W. M. Spears, and D. F. Gordon. Using Genetic Algorithms for Concept Learning. Machine Learning, 13(2-3), 1993, pp. 161-188.
[27] Kaastraa, I., and M. Boyd. Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 1996, pp. 215-236.
[28] KDD Cup 1999 Data Set from UCI Machine Learning Repository. Available from: http://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data.
[29] LaValle, S., E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz. Big Data, Analytics And The Path From Insights To Value. MIT Sloan Management Review, 52(2), 2011, pp. 21-32.
[30] Lee, C.-H., C.-R. Lin, and M.-S. Chen. Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining. Information Systems, 30(3), 2005, pp. 227-244.
[31] Li, W., J. Han, and J. Pei. CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In Proceedings of 2001 IEEE International Conference on Data Mining (San Jose, California, USA), 2001, pp. 369-376.
[32] Lippmann, R. P. An Introduction to Computing with Neural Nets. ASSP Magazine, IEEE, 4(2), 1987, pp. 4-22.
[33] Liu, B., W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (New York City, New York, USA), 1998, pp. 80-86.
[34] Liu, B., Y. Ma, and C. K. Wong. Improving an Association Rule Based Classifier. In Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (Lyon, France), 2000, pp. 504-509.
[35] Lu, H., R. Setiono, and H. Liu. Effective Data Mining Using Neural Networks. IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996, pp. 957-961.
[36] Manyika, J., et al. Big data: The next frontier for innovation, competition, and productivity. M. G. Institute, 2011.
[37] Meretakis, D., and B. Wuthrich. Extending Naive Bayes Classifiers Using Long itemsets. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Diego, CA, USA), 1999, pp. 165-174.
[38] Nash, K. S. How Big Data Can Reduce Big Risk. CIO, 2012.
[39] Novikov, B., N. Vassilieva, and A. Yarygina. Querying big data. In Proceedings of the 13th International Conference on Computer Systems and Technologies (Ruse, Bulgaria), 2012, pp. 1-10.
[40] Olavsrud, T. Big Data Causes Concern and Big Confusion. CIO, 2012.
[41] Olavsrud, T. How to Be Ready for Big Data. CIO, 2012.
[42] Quinlan, J. R. Induction of Decision Trees. Machine Learning, 1(1), 1986, pp. 81-106.
[43] Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[44] Reid, C. Can 'Big Data' Fix Book Marketing? Publishers Weekly, 2012.
[45] Schadt, E. E., M. D. Linderman, J. Sorenson, L. Lee, and G. P. Nolan. Computational Solutions to Large-scale Data Management and Analysis. Nature Reviews Genetics, 11(9), 2010, pp. 647-657.
[46] Shannon, C. E. A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 1948, pp. 379-423, 623-656.
[47] Stackpole, B. 5 Things IT Should Do to Prepare for Big Data. Computerworld US, 2012.
[48] Thabtah, and F. Abdeljaber. A Review of Associative Classification Mining. Knowledge Engineering Review, 22(1), 2007, pp. 37-65.
[49] Thabtah, F., P. Cowling, and Y. o. Peng. MCAR: Multi-class Classification based on Association Rule. In Proceedings of the ACS/IEEE 2005 International Conference on Computer Systems and Applications (Cairo, Egypt), 2005.
[50] Thabtah, F. A., P. Cowling, and Y. Peng. MMAC: A New Multi-class, Multi-label Associative Classification Approach. In Proceedings of the 4th IEEE International Conference on Data Mining (Brighton, UK), 2004, pp. 217-224.
[51] Tsai, P. S. M., C.-C. Lee, and A. L. P. Chen. An Efficient Approach for Incremental Association Rule Mining. In Proceedings of the 3rd Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining (Beijing, China), 1999, pp. 74-83.
[52] Utgoff, P. E., N. C. Berkman, and J. A. Clouse. Decision Tree Induction Based on Efficient Tree Restructuring. Machine Learning, 29(1), 1997, pp. 5-44.
[53] Wang, K., Y. He, and D. W. Cheung. Mining Confident Rules Without Support Requirement. In Proceedings of the 10th International Conference on Information and Knowledge Management (Atlanta, Georgia), 2001, pp. 89-96.
[54] Wang, K., S. Zhou, and Y. He. Growing Decision Trees on Support-less Association Rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Boston, MA, USA), 2000, pp. 265-269.
[55] Yin, X., and J. Han. CPAR: Classication based on Predictive Association Rules. In Proceedings of the SIAM International Conference on Data Mining (San Francisco, CA), 2003, pp. 369-376.
[56] Zaki, M. J., S. Parthasarathy, M. Ogihara, and W. Li. New Algorithms for Fast Discovery of Association Rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (Newport Beach, California, USA), 1997, pp. 283-286.
[57] Zhou, Z., and C. I. Ezeife. A Low-Scan Incremental Association Rule Maintenance Method Based on the Apriori Property. In Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence (Ottawa, Canada), 2001, pp. 26-35.
[58] Zhu, F., and S. Guan. Ordered incremental training for GA-based classifiers. Pattern Recognition Letters, 26(14), 2005, pp. 2135-2151.
dc.identifier.uri  http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847  -
dc.description.abstract  由於資訊科技以及網路的發展,讓大量的資料得以從眾多的來源快速地蒐集及儲存,海量資料近年來成為一個火紅的議題。企業組織可以利用海量資料獲取競爭優勢,例如:組織能透過分析海量資料以改善決策的品質。然而,管理與分析這些龐大且快速更新的資料,對組織而言是一項艱鉅的挑戰。
  與資料分析息息相關的議題為資料探勘技術,其中分類是一項普遍的資料探勘方法。分類為將資料物件依據某些條件歸類到事先制定好的類別之中的資料探勘方法。然而,海量資料的巨量、即時性及多樣性這三項特點,使得傳統的資料探勘方法不足以分析海量資料。因此,本研究提出一個增量式關聯分類的啟發式演算法,用來有效並有效率地分析海量資料。
  本研究所提出的關聯分類演算法並不同時使用所有的屬性去建置分類器,而是逐步增加屬性去改良分類器的正確性。並且此演算法可以篩選出具有鑑別力的屬性,優先使用這些具有鑑別力的屬性,以最小化建置分類器所需屬性之數量,顯著地縮減計算時間。此外,本研究所提出的關聯分類演算法能夠使用之前所產生的規則與新增的資料來更新分類器,以避免重複尋找已知的資訊。最後,本研究使用大量的網路入侵偵測資料來驗證此演算法的有效性和效率。
zh_TW
dc.description.abstract  Big data has emerged as one of the most popular issues in recent years, since advances in IT and network technologies enable massive data collection from many different sources. Organizations can derive competitive advantage from big data; for instance, they can improve the quality of decision making by analyzing it. However, big data creates huge challenges for organizations that must manage and analyze such large and rapidly updated data.
  Closely connected to the big data issue is the development of data mining techniques. One of the most popular data mining tasks is classification, which groups data objects into predefined categories based on certain criteria. However, because of the three characteristics of big data (volume, velocity, and variety), big data has exceeded the capability of conventional data mining approaches. Therefore, this study proposes a heuristic incremental associative classification algorithm to analyze big data effectively and efficiently.
  The proposed associative classification algorithm builds a classifier in iterative steps, each of which adds attributes to improve the classifier's accuracy, instead of using all the attributes at once. In addition, the algorithm can identify and prioritize discriminative attributes to minimize the number of attributes used, which reduces the computing time significantly. Moreover, the classifier can be updated from the previously generated rules and the incremental data, avoiding rediscovery of existing information. The efficiency and validity of the proposed algorithm are verified with a large volume of intrusion detection data.
en
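The abstract describes an associative classification approach: mine class-association rules (itemset → class) that meet support and confidence thresholds, rank them, and classify new objects with the first matching rule. The following Python sketch illustrates that general CBA-style technique only; it is not the thesis's ACPI/IACPI algorithm, and the function names (`mine_cars`, `classify`) and toy intrusion-style tokens are hypothetical.

```python
from itertools import combinations
from collections import Counter

def mine_cars(records, labels, min_sup=2, min_conf=0.6, max_len=2):
    """Brute-force mining of class-association rules (itemset -> class).

    records: list of sets of attribute=value tokens; labels: class per record.
    Returns rules ranked by confidence, then support (CBA-style ordering).
    """
    item_counts = Counter()   # support counts of itemsets
    rule_counts = Counter()   # support counts of (itemset, class) pairs
    for rec, lab in zip(records, labels):
        items = sorted(rec)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                item_counts[combo] += 1
                rule_counts[(combo, lab)] += 1
    rules = []
    for (combo, lab), sup in rule_counts.items():
        conf = sup / item_counts[combo]
        if sup >= min_sup and conf >= min_conf:
            rules.append((combo, lab, conf, sup))
    rules.sort(key=lambda r: (-r[2], -r[3]))  # confidence desc, then support desc
    return rules

def classify(rules, record, default):
    """Predict the class of `record` using the first rule whose itemset it contains."""
    rec = set(record)
    for combo, lab, _conf, _sup in rules:
        if set(combo) <= rec:
            return lab
    return default

# Toy intrusion-detection-style data (hypothetical attribute=value tokens).
records = [{"proto=tcp", "flag=S0"}, {"proto=tcp", "flag=SF"},
           {"proto=udp", "flag=SF"}, {"proto=tcp", "flag=S0"}]
labels = ["attack", "normal", "normal", "attack"]
rules = mine_cars(records, labels)
pred = classify(rules, {"proto=tcp", "flag=S0"}, "normal")  # -> "attack"
```

The thesis's incremental idea (reusing previously mined rules when new data arrives, and adding attributes step by step) would build on this basic loop rather than re-mining everything from scratch; the sketch shows only the batch rule-mining and ordered-rule prediction core.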
dc.description.provenance  Made available in DSpace on 2021-06-16T13:15:35Z (GMT). No. of bitstreams: 1
ntu-102-R00725009-1.pdf: 1058848 bytes, checksum: a3d0872a903ed7391b6c96a69ad41337 (MD5)
Previous issue date: 2013
en
dc.description.tableofcontents  Content v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Objectives 5
1.3 Scope 6
Chapter 2 Literature Review 8
2.1 Big Data 8
2.2 Classification Approaches 10
2.3 Associative Classification 13
2.3.1 Frequent Rules Discovery Approaches 13
2.3.2 Rules Ranking Approaches 15
2.3.3 Pruning Techniques 17
2.3.4 Prediction Methods 18
2.4 Incremental Approaches 19
2.5 Conclusion 21
Chapter 3 Problem Description 23
3.1 Definition of Big Data 24
3.2 Classification Based on Associations (CBA) 24
3.3 Data Assumptions 30
3.4 Problem Statement 31
3.5 Summary 33
Chapter 4 The Heuristic Incremental Associative Classification Algorithm (HIACA) 34
4.1 The Associative Classification based on Potential Items (ACPI) 35
4.2 The Incremental Associative Classification based on Potential Items (IACPI) 50
4.3 The Time Complexity of ACPI and IACPI 61
Chapter 5 Computational Analysis 64
5.1 Data Source and Data Description 64
5.2 Experiments for the ACPI 66
5.2.1 Experiment 1: The Effect of Increasing the Numbers of Attributes 67
5.2.2 Experiment 2: The Effect of Increasing the Numbers of Objects 72
5.2.3 Experiment 3: The Classifying Ability 74
5.2.4 Experiment 4: The Ways of Ordering the Attributes 75
5.3 Experiments for the IACPI 78
5.3.1 Scenario 1: Eliminating a Class 80
5.3.2 Scenario 2: Adding a New Class 83
5.3.3 Scenario 3: Eliminating a Class and Adding a New Class 86
5.3.4 Scenario 4: Adding a New Attribute 89
5.4 Summary 92
Chapter 6 Conclusion and Future Work 94
6.1 Conclusion 94
6.2 Future Work 96
References 97
Appendix A. The Pseudo Code of ACPI 101
Appendix B. The Pseudo Code of IACPI 103
Appendix C. The Results of ACPI with Different Numbers of Attributes 105
Appendix D. The Attribute Orders of Experiment 4 for the ACPI 107
Appendix E. The classifier generated by ACPI with data set 0 109
Appendix F. The classifier generated by IACPI with data set 1 111
Appendix G. The classifier generated by IACPI with data set 2 112
Appendix H. The classifier generated by IACPI with data set 3 113
Appendix I. The classifier generated by IACPI with data set 4 114
Appendix J. The Results of Experiment 2 for the ACPI 115
Appendix K. The Results of Experiment 4 for the ACPI 117
Appendix L. The Results of Scenario 1 for the IACPI 120
Appendix M. The Results of Scenario 2 for the IACPI 122
Appendix N. The Results of Scenario 3 for the IACPI 124
Appendix O. The Results of Scenario 4 for the IACPI 126
dc.language.iso  en
dc.subject  海量資料  zh_TW
dc.subject  資料探勘  zh_TW
dc.subject  分類  zh_TW
dc.subject  關聯規則  zh_TW
dc.subject  增量式演算法  zh_TW
dc.subject  Association Rules  en
dc.subject  Big Data  en
dc.subject  Data Mining  en
dc.subject  Incremental Algorithm  en
dc.subject  Classification  en
dc.title  以增量式關聯分類方法分析海量資料  zh_TW
dc.title  An Incremental Associative Classification Approach for Big Data Analytics  en
dc.type  Thesis
dc.date.schoolyear  101-2
dc.description.degree  碩士
dc.contributor.oralexamcommittee  魏志平,陳建錦,盧信銘
dc.subject.keyword  海量資料,資料探勘,分類,關聯規則,增量式演算法  zh_TW
dc.subject.keyword  Big Data,Data Mining,Classification,Association Rules,Incremental Algorithm  en
dc.relation.page  127
dc.rights.note  有償授權
dc.date.accepted  2013-07-29
dc.contributor.author-college  管理學院  zh_TW
dc.contributor.author-dept  資訊管理學研究所  zh_TW
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File  Size  Format
ntu-102-1.pdf  1.03 MB  Adobe PDF  (restricted; not available for public access)
Show simple item record


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
