An Incremental Associative Classification Approach for Big Data Analytics (以增量式關聯分類方法分析海量資料)

Hsin-Ting Chung (鍾欣廷)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847

Title:	An Incremental Associative Classification Approach for Big Data Analytics (以增量式關聯分類方法分析海量資料)
Author:	Hsin-Ting Chung (鍾欣廷)
Advisor:	陳靜枝
Keywords:	Big Data, Data Mining, Classification, Association Rules, Incremental Algorithm
Year of publication:	2013
Degree:	Master's
Abstract:	Big data has become one of the most prominent topics in recent years, as advances in information and network technologies allow massive amounts of data to be collected and stored rapidly from many different sources. Organizations can derive competitive advantage from big data; for instance, they can improve the quality of their decision making by analyzing it. However, managing and analyzing such large and rapidly updated data is a major challenge for organizations.
Closely connected to data analysis is data mining, and one of its most common tasks is classification, which assigns data objects to predefined categories based on certain criteria. However, because of the three characteristics of big data (volume, velocity, and variety), conventional data mining approaches are insufficient for analyzing it. This study therefore proposes a heuristic incremental associative classification algorithm to analyze big data effectively and efficiently.
The proposed associative classification algorithm does not use all attributes at once to build the classifier. Instead, it builds the classifier iteratively, adding attributes step by step to improve the classifier's accuracy. The algorithm also identifies discriminative attributes and uses them first, which minimizes the number of attributes needed to build the classifier and significantly reduces computing time. Moreover, the classifier can be updated from the previously generated rules and the newly added data, which avoids rediscovering information that is already known. Finally, the effectiveness and efficiency of the algorithm are verified on a large network intrusion detection data set.
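The abstract names three ideas: rules that are updated from new data without a rescan, attributes ranked by how discriminative they are, and attributes added one at a time while accuracy improves. The following is a minimal sketch of those ideas only, not the algorithm from the thesis, whose details the abstract does not give. Every name here (`IncrementalRuleCounter`, `attribute_score`, `select_attributes`), the restriction to single-attribute rules, the scoring heuristic, the thresholds, and the toy records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class IncrementalRuleCounter:
    """Stores (attribute, value) -> class counts so new batches never force a rescan."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # (attr, value) -> Counter of class labels
        self.class_counts = Counter()       # class label -> number of records seen

    def update(self, records, labels):
        """Fold a new batch into the existing counts (the incremental step)."""
        for record, label in zip(records, labels):
            self.class_counts[label] += 1
            for attr, value in record.items():
                self.counts[(attr, value)][label] += 1

    def rules(self, attr, min_support=1, min_confidence=0.5):
        """Single-attribute rules value -> (class, confidence) that meet both thresholds."""
        out = {}
        for (a, value), by_class in self.counts.items():
            if a != attr:
                continue
            label, hits = by_class.most_common(1)[0]
            confidence = hits / sum(by_class.values())
            if hits >= min_support and confidence >= min_confidence:
                out[value] = (label, confidence)
        return out

    def attribute_score(self, attr):
        """Assumed heuristic: share of records a per-value majority rule gets right."""
        correct = sum(max(c.values()) for (a, _), c in self.counts.items() if a == attr)
        total = sum(self.class_counts.values())
        return correct / total if total else 0.0


def classify(counter, record, attrs):
    """Apply the most confident matching rule; fall back to the majority class."""
    best = None
    for attr in attrs:
        rule = counter.rules(attr).get(record.get(attr))
        if rule and (best is None or rule[1] > best[1]):
            best = rule
    return best[0] if best else counter.class_counts.most_common(1)[0][0]


def select_attributes(counter, records, labels):
    """Try attributes in score order and keep each one only if accuracy improves."""
    attrs = {a for a, _ in counter.counts}
    ranked = sorted(attrs, key=lambda a: (-counter.attribute_score(a), a))
    chosen, best_acc = [], 0.0
    for attr in ranked:
        trial = chosen + [attr]
        hits = sum(classify(counter, r, trial) == y for r, y in zip(records, labels))
        acc = hits / len(labels)
        if acc > best_acc:
            chosen, best_acc = trial, acc
    return chosen, best_acc


# Toy connection records, invented for this example.
batch1 = [
    {"proto": "tcp", "flag": "S0"},
    {"proto": "tcp", "flag": "SF"},
    {"proto": "udp", "flag": "SF"},
    {"proto": "tcp", "flag": "S0"},
]
labels1 = ["attack", "normal", "normal", "attack"]

counter = IncrementalRuleCounter()
counter.update(batch1, labels1)
chosen, acc = select_attributes(counter, batch1, labels1)

# A later batch only updates the stored counts; batch1 is not read again.
counter.update([{"proto": "icmp", "flag": "SF"}], ["attack"])
prediction = classify(counter, {"proto": "icmp", "flag": "SF"}, ["flag", "proto"])
```

Keeping raw counts rather than finished rules is the design choice that makes the update cheap in this sketch: support and confidence are recomputed from the counters on demand, so a new batch costs one pass over the new records only.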
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61847
Full-text license:	Paid license
Appears in collections:	Department of Information Management

Files in this item:

File	Size	Format
ntu-102-1.pdf (not currently authorized for public access)	1.03 MB	Adobe PDF

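Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.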
