Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847
Full metadata record
DC Field  Value  Language
dc.contributor.advisor  陳靜枝
dc.contributor.author  Hsin-Ting Chung  en
dc.contributor.author  鍾欣廷  zh_TW
dc.date.accessioned  2021-06-16T13:15:35Z  -
dc.date.available  2014-08-06
dc.date.copyright  2013-08-06
dc.date.issued  2013
dc.date.submitted  2013-07-29
dc.identifier.citation[1] Agrawal, R., T. Imieliński, and A. Swami. Mining Association Rules between Sets of Items in Large Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Washington, DC), 1993, pp. 207-216.
[2] Agrawal, R., and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases (Santiago de Chile, Chile), 1994, pp. 487-499.
[3] All too much. The Economist, 2010.
[4] Antonie, M.-L., and O. R. Za‥ıane. An Associative Classifier Based on Positive and Negative Rules. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (Paris, France), 2004, pp. 64-69.
[5] Baralis, E., S. Chiusano, and P. Garza. On Support Thresholds in Associative Classification. In Proceedings of the 2004 ACM Symposium on Applied Computing (Nicosia, Cyprus), 2004, pp. 553-558.
[6] Baralis, E., and P. Garza. A Lazy Approach to Pruning Classification Rules. In Proceedings of 2002 IEEE International Conference on Data Mining (Maebashi City, Japan), 2002, pp. 35-42.
[7] Boser, B. E., I. M. Guyon, and V. N. Vapnik. A Training Algorithm for Optimal Margin Classifiers. In Proceedings of 5th annual workshop on Computational learning theory (Pittsburgh, Pennsylvania, USA), 1992, pp. 144-152.
[8] Breiman, L., J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Wadsworth, Belmont, CA, 1984.
[9] Cheung, D. W.-L., J. Han, V. Ng, and C. Y. Wong. Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. In Proceedings of the 12th International Conference on Data Engineering (New Orleans, Louisiana), 1996, pp. 106-114.
[10] Craven, M. W., and J. W. Shavlik. Using Neural Networks for Data Mining. Future Generation Computer Systems, 13(2-3), 1997, pp. 211-229.
[11] Crespoa, F., and R. Weber. A methodology for dynamic data mining based on fuzzy clustering. Fuzzy Sets and Systems, 150(2), 2005, pp. 267-284.
[12] Crone, S. F., S. Lessmann, and R. Stahlbock. The Impact of Preprocessing on Data Mining: An Evaluation of Classifier Sensitivity in Direct Marketing. European Journal of Operational Research, 173(3), 2006, pp. 781-800.
[13] Data, data everywhere. The Economist, 2010.
[14] Demirkana, H., and D. Delen. Leveraging the Capabilities of Service-oriented Decision Support Systems: Putting Analytics and Big Data in Cloud. Decision Support Systems, 2012.
[15] Fayyad, U. M., and K. B. Irani. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of XIII International Joint Conference on Artificial Intelligence (Chambery), 1993, pp. 1022-1029.
[16] Fidelis, M. V., H. S. Lopes, and A. A. Freitas. Discovering Comprehensible Classification Rules with a Genetic Algorithm. In Proceedings of 2000 Congress on Evolutionary Computation, 2000, pp. 805-810.
[17] Gartner. Definition of Big Data. Available from: http://www.gartner.com/it-glossary/big-data/.
[18] Gharib, T. F., H. Nassar, M. Taha, and A. Abraham. An Efficient Algorithm for Incremental Mining of Temporal Association Rules. Data & Knowledge Engineering, 69(8), 2010, pp. 800-815.
[19] Han, J., M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.
[20] Han, J., and J. Pei. Mining Frequent Patterns by Pattern-growth: Methodology and Implications. ACM SIGKDD Explorations Newsletter, 2(2), 2000, pp. 14-20.
[21] Han, J., J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Dallas, Texas, USA), 2000, pp. 1-12.
[22] Hopkins, B. Beyond the Hype of Big Data. CIO, 2011.
[23] IBM. What is big data? Bringing Big Data to the Enterprise. Available from: http://www-01.ibm.com/software/data/bigdata/.
[24] Jackson, J. The Big Promise of Big Data. CIO, 2012.
[25] Jacobs, A. The Pathologies of Big Data. Communications of the ACM, 52(8), 2009, pp. 36-44.
[26] Jong, K. A. D., W. M. Spears, and D. F. Gordon. Using Genetic Algorithms for Concept Learning. Machine Learning, 13(2-3), 1993, pp. 161-188.
[27] Kaastraa, I., and M. Boyd. Designing a neural network for forecasting financial and economic time series. Neurocomputing, 10(3), 1996, pp. 215-236.
[28] KDD Cup 1999 Data Set from UCI Machine Learning Repository. Available from: http://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data.
[29] LaValle, S., E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz. Big Data, Analytics And The Path From Insights To Value. MIT Sloan Management Review, 52(2), 2011, pp. 21-32.
[30] Lee, C.-H., C.-R. Lin, and M.-S. Chen. Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining. Information Systems, 30(3), 2005, pp. 227-244.
[31] Li, W., J. Han, and J. Pei. CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In Proceedings of 2001 IEEE International Conference on Data Mining (San Jose, California, USA), 2001, pp. 369-376.
[32] Lippmann, R. P. An Introduction to Computing with Neural Nets. ASSP Magazine, IEEE, 4(2), 1987, pp. 4-22.
[33] Liu, B., W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (New York City, New York, USA), 1998, pp. 80-86.
[34] Liu, B., Y. Ma, and C. K. Wong. Improving an Association Rule Based Classifier. In Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (Lyon, France), 2000, pp. 504-509.
[35] Lu, H., R. Setiono, and H. Liu. Effective Data Mining Using Neural Networks. IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996, pp. 957-961.
[36] Manyika, J., et al. Big data: The next frontier for innovation, competition, and productivity. M. G. Institute, 2011.
[37] Meretakis, D., and B. Wuthrich. Extending Naive Bayes Classifiers Using Long itemsets. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Diego, CA, USA), 1999, pp. 165-174.
[38] Nash, K. S. How Big Data Can Reduce Big Risk. CIO, 2012.
[39] Novikov, B., N. Vassilieva, and A. Yarygina. Querying big data. In Proceedings of the 13th International Conference on Computer Systems and Technologies (Ruse, Bulgaria), 2012, pp. 1-10.
[40] Olavsrud, T. Big Data Causes Concern and Big Confusion. CIO, 2012.
[41] Olavsrud, T. How to Be Ready for Big Data. CIO, 2012.
[42] Quinlan, J. R. Induction of Decision Trees. Machine Learning, 1(1), 1986, pp. 81-106.
[43] Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
[44] Reid, C. Can 'Big Data' Fix Book Marketing? Publishers Weekly, 2012.
[45] Schadt, E. E., M. D. Linderman, J. Sorenson, L. Lee, and G. P. Nolan. Computational Solutions to Large-scale Data Management and Analysis. Nature Reviews Genetics, 11(9), 2010, pp. 647-657.
[46] Shannon, C. E. A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 1948, pp. 379-423, 623-656.
[47] Stackpole, B. 5 Things IT Should Do to Prepare for Big Data. Computerworld US, 2012.
[48] Thabtah, and F. Abdeljaber. A Review of Associative Classification Mining. Knowledge Engineering Review, 22(1), 2007, pp. 37-65.
[49] Thabtah, F., P. Cowling, and Y. o. Peng. MCAR: Multi-class Classification based on Association Rule. In Proceedings of the ACS/IEEE 2005 International Conference on Computer Systems and Applications (Cairo, Egypt), 2005.
[50] Thabtah, F. A., P. Cowling, and Y. Peng. MMAC: A New Multi-class, Multi-label Associative Classification Approach. In Proceedings of the 4th IEEE International Conference on Data Mining (Brighton, UK), 2004, pp. 217-224.
[51] Tsai, P. S. M., C.-C. Lee, and A. L. P. Chen. An Efficient Approach for Incremental Association Rule Mining. In Proceedings of the 3rd Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining (Beijing, China), 1999, pp. 74-83.
[52] Utgoff, P. E., N. C. Berkman, and J. A. Clouse. Decision Tree Induction Based on Efficient Tree Restructuring. Machine Learning, 29(1), 1997, pp. 5-44.
[53] Wang, K., Y. He, and D. W. Cheung. Mining Confident Rules Without Support Requirement. In Proceedings of the 10th International Conference on Information and Knowledge Management (Atlanta, Georgia), 2001, pp. 89-96.
[54] Wang, K., S. Zhou, and Y. He. Growing Decision Trees on Support-less Association Rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Boston, MA, USA), 2000, pp. 265-269.
[55] Yin, X., and J. Han. CPAR: Classication based on Predictive Association Rules. In Proceedings of the SIAM International Conference on Data Mining (San Francisco, CA), 2003, pp. 369-376.
[56] Zaki, M. J., S. Parthasarathy, M. Ogihara, and W. Li. New Algorithms for Fast Discovery of Association Rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (Newport Beach, California, USA), 1997, pp. 283-286.
[57] Zhou, Z., and C. I. Ezeife. A Low-Scan Incremental Association Rule Maintenance Method Based on the Apriori Property. In Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence (Ottawa, Canada), 2001, pp. 26-35.
[58] Zhu, F., and S. Guan. Ordered incremental training for GA-based classifiers. Pattern Recognition Letters, 26(14), 2005, pp. 2135-2151.
dc.identifier.uri  http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847  -
dc.description.abstract  由於資訊科技以及網路的發展,讓大量的資料得以從眾多的來源快速地蒐集及儲存,海量資料近年來成為一個火紅的議題。企業組織可以利用海量資料獲取競爭優勢,例如:組織能透過分析海量資料以改善決策的品質。然而,管理與分析這些龐大且快速更新的資料,對組織而言是一項艱鉅的挑戰。
  與資料分析息息相關的議題為資料探勘技術,其中分類是一項普遍的資料探勘方法。分類為將資料物件依據某些條件歸類到事先制定好的類別之中的資料探勘方法。然而,海量資料的巨量、即時性及多樣性這三項特點,使得傳統的資料探勘方法不足以分析海量資料。因此,本研究提出一個增量式關聯分類的啟發式演算法,用來有效並有效率地分析海量資料。
  本研究所提出的關聯分類演算法並不同時使用所有的屬性去建置分類器,而是逐步增加屬性去改良分類器的正確性。並且此演算法可以篩選出具有鑑別力的屬性,優先使用這些具有鑑別力的屬性,以最小化建置分類器所需屬性之數量,顯著地縮減計算時間。此外,本研究所提出的關聯分類演算法能夠使用之前所產生的規則與新增的資料來更新分類器,以避免重複尋找已知的資訊。最後,本研究使用大量的網路入侵偵測資料來驗證此演算法的有效性和效率。
zh_TW
dc.description.abstract  Big data has emerged as one of the most popular issues in recent years, since advances in IT and network technologies enable massive data collection from many different sources. Organizations can derive competitive advantage from big data; for instance, they can improve the quality of decision making by analyzing it. However, big data creates huge challenges for organizations that must manage and analyze such large and rapidly updated data.
  Closely connected to the big data issue is the development of data mining techniques. One of the most popular data mining tasks is classification, which groups data objects into predefined categories based on certain criteria. However, because of the three characteristics of big data (volume, velocity, and variety), big data has exceeded the capability of conventional data mining approaches. Therefore, this study proposes a heuristic incremental associative classification algorithm to analyze big data effectively and efficiently.
  The proposed associative classification algorithm builds a classifier in iterative steps, each of which adds attributes to improve the classifier's accuracy, instead of using all the attributes at once. In addition, the algorithm can identify and prioritize discriminative attributes to minimize the number of attributes used, which reduces the computing time significantly. Moreover, the classifier can be updated from the previously generated rules and the incremental data, avoiding rediscovery of existing information. The efficiency and validity of the proposed algorithm are verified with a large volume of intrusion detection data.
en
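The abstract describes an associative classification approach: mine class-association rules (itemset → class) that meet support and confidence thresholds, rank them, and classify new objects with the first matching rule. The following Python sketch illustrates that general CBA-style technique only; it is not the thesis's ACPI/IACPI algorithm, and the function names (`mine_cars`, `classify`) and toy intrusion-style tokens are hypothetical.

```python
from itertools import combinations
from collections import Counter

def mine_cars(records, labels, min_sup=2, min_conf=0.6, max_len=2):
    """Brute-force mining of class-association rules (itemset -> class).

    records: list of sets of attribute=value tokens; labels: class per record.
    Returns rules ranked by confidence, then support (CBA-style ordering).
    """
    item_counts = Counter()   # support counts of itemsets
    rule_counts = Counter()   # support counts of (itemset, class) pairs
    for rec, lab in zip(records, labels):
        items = sorted(rec)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                item_counts[combo] += 1
                rule_counts[(combo, lab)] += 1
    rules = []
    for (combo, lab), sup in rule_counts.items():
        conf = sup / item_counts[combo]
        if sup >= min_sup and conf >= min_conf:
            rules.append((combo, lab, conf, sup))
    rules.sort(key=lambda r: (-r[2], -r[3]))  # confidence desc, then support desc
    return rules

def classify(rules, record, default):
    """Predict the class of `record` using the first rule whose itemset it contains."""
    rec = set(record)
    for combo, lab, _conf, _sup in rules:
        if set(combo) <= rec:
            return lab
    return default

# Toy intrusion-detection-style data (hypothetical attribute=value tokens).
records = [{"proto=tcp", "flag=S0"}, {"proto=tcp", "flag=SF"},
           {"proto=udp", "flag=SF"}, {"proto=tcp", "flag=S0"}]
labels = ["attack", "normal", "normal", "attack"]
rules = mine_cars(records, labels)
pred = classify(rules, {"proto=tcp", "flag=S0"}, "normal")  # -> "attack"
```

The thesis's incremental idea (reusing previously mined rules when new data arrives, and adding attributes step by step) would build on this basic loop rather than re-mining everything from scratch; the sketch shows only the batch rule-mining and ordered-rule prediction core.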
dc.description.provenance  Made available in DSpace on 2021-06-16T13:15:35Z (GMT). No. of bitstreams: 1
ntu-102-R00725009-1.pdf: 1058848 bytes, checksum: a3d0872a903ed7391b6c96a69ad41337 (MD5)
Previous issue date: 2013
en
dc.description.tableofcontents  Content v
List of Figures viii
List of Tables ix
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Objectives 5
1.3 Scope 6
Chapter 2 Literature Review 8
2.1 Big Data 8
2.2 Classification Approaches 10
2.3 Associative Classification 13
2.3.1 Frequent Rules Discovery Approaches 13
2.3.2 Rules Ranking Approaches 15
2.3.3 Pruning Techniques 17
2.3.4 Prediction Methods 18
2.4 Incremental Approaches 19
2.5 Conclusion 21
Chapter 3 Problem Description 23
3.1 Definition of Big Data 24
3.2 Classification Based on Associations (CBA) 24
3.3 Data Assumptions 30
3.4 Problem Statement 31
3.5 Summary 33
Chapter 4 The Heuristic Incremental Associative Classification Algorithm (HIACA) 34
4.1 The Associative Classification based on Potential Items (ACPI) 35
4.2 The Incremental Associative Classification based on Potential Items (IACPI) 50
4.3 The Time Complexity of ACPI and IACPI 61
Chapter 5 Computational Analysis 64
5.1 Data Source and Data Description 64
5.2 Experiments for the ACPI 66
5.2.1 Experiment 1: The Effect of Increasing the Numbers of Attributes 67
5.2.2 Experiment 2: The Effect of Increasing the Numbers of Objects 72
5.2.3 Experiment 3: The Classifying Ability 74
5.2.4 Experiment 4: The Ways of Ordering the Attributes 75
5.3 Experiments for the IACPI 78
5.3.1 Scenario 1: Eliminating a Class 80
5.3.2 Scenario 2: Adding a New Class 83
5.3.3 Scenario 3: Eliminating a Class and Adding a New Class 86
5.3.4 Scenario 4: Adding a New Attribute 89
5.4 Summary 92
Chapter 6 Conclusion and Future Work 94
6.1 Conclusion 94
6.2 Future Work 96
References 97
Appendix A. The Pseudo Code of ACPI 101
Appendix B. The Pseudo Code of IACPI 103
Appendix C. The Results of ACPI with Different Numbers of Attributes 105
Appendix D. The Attribute Orders of Experiment 4 for the ACPI 107
Appendix E. The classifier generated by ACPI with data set 0 109
Appendix F. The classifier generated by IACPI with data set 1 111
Appendix G. The classifier generated by IACPI with data set 2 112
Appendix H. The classifier generated by IACPI with data set 3 113
Appendix I. The classifier generated by IACPI with data set 4 114
Appendix J. The Results of Experiment 2 for the ACPI 115
Appendix K. The Results of Experiment 4 for the ACPI 117
Appendix L. The Results of Scenario 1 for the IACPI 120
Appendix M. The Results of Scenario 2 for the IACPI 122
Appendix N. The Results of Scenario 3 for the IACPI 124
Appendix O. The Results of Scenario 4 for the IACPI 126
dc.language.iso  en
dc.subject  海量資料  zh_TW
dc.subject  資料探勘  zh_TW
dc.subject  分類  zh_TW
dc.subject  關聯規則  zh_TW
dc.subject  增量式演算法  zh_TW
dc.subject  Association Rules  en
dc.subject  Big Data  en
dc.subject  Data Mining  en
dc.subject  Incremental Algorithm  en
dc.subject  Classification  en
dc.title  以增量式關聯分類方法分析海量資料  zh_TW
dc.title  An Incremental Associative Classification Approach for Big Data Analytics  en
dc.type  Thesis
dc.date.schoolyear  101-2
dc.description.degree  碩士
dc.contributor.oralexamcommittee  魏志平,陳建錦,盧信銘
dc.subject.keyword  海量資料,資料探勘,分類,關聯規則,增量式演算法  zh_TW
dc.subject.keyword  Big Data,Data Mining,Classification,Association Rules,Incremental Algorithm  en
dc.relation.page  127
dc.rights.note  有償授權
dc.date.accepted  2013-07-29
dc.contributor.author-college  管理學院  zh_TW
dc.contributor.author-dept  資訊管理學研究所  zh_TW
Appears in Collections: 資訊管理學系 (Department of Information Management)

Files in This Item:
File  Size  Format
ntu-102-1.pdf  1.03 MB  Adobe PDF  (restricted; not available for public access)
Show simple item record


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
