Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52120
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳銘憲 | |
dc.contributor.author | Yu-Fen Chen | en |
dc.contributor.author | 陳玉芬 | zh_TW |
dc.date.accessioned | 2021-06-15T16:08:16Z | - |
dc.date.available | 2016-08-25 | |
dc.date.copyright | 2015-08-25 | |
dc.date.issued | 2015 | |
dc.date.submitted | 2015-08-19 | |
dc.identifier.citation | [1] Bayes net generator. http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/net/BayesNetGenerator.html. Accessed: 2015-05-30. [2] Random decision generator. http://www.dbs.ifi.lmu.de/~zimek/diplomathesis/implementations/EHNDs/doc/weka/datagenerators/RDG1.html. Accessed: 2015-05-30. [3] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for on-demand classification of evolving data streams. IEEE Transactions on Knowledge and Data Engineering, 18(5):577–589, May 2006. [4] N. K. Alham, M. Li, S. Hammoud, Y. Liu, and M. Ponraj. A distributed SVM for image annotation. Proceedings of the 7th International Conference on Fuzzy Systems and Knowledge Discovery, pages 2983–2987, 2010. [5] N. K. Alham, M. Li, Y. Liu, M. Ponraj, and M. Qi. A distributed SVM ensemble for image classification and annotation. Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, pages 1581–1584, 2012. [6] H. Altincay. On naive Bayesian fusion of dependent classifiers. Pattern Recognition Letters, 26:2463–2473, 2005. [7] A. B. Ashfaq, M. Javed, S. A. Khayam, and H. Radha. An information-theoretic combining method for multi-classifier anomaly detection systems. IEEE International Conference on Communications (ICC 2010), pages 1–5, 2010. [8] A. Bar-Or, D. Keren, A. Schuster, and R. Wolff. Hierarchical decision tree induction in distributed genomic databases. IEEE Transactions on Knowledge and Data Engineering, 17(8):1138–1151, 2005. [9] J. Basak and R. Kothari. A classification paradigm for distributed vertically partitioned data. Neural Computation, 16(7):1525–1544, July 2004. [10] Y. Ben-Haim and E. Tom-Tov. A streaming parallel decision tree algorithm. Journal of Machine Learning Research, 11:849–872, 2010. [11] F. Berzal, J.-C. Cubero, F. Cuenca, and J. M. Medina. Relational decomposition through partial functional dependencies. Journal of Data and Knowledge Engineering, 43(2):207–234, 2002. [12] K. Bhaduri, R. Wolff, C. Giannella, and H. Kargupta.
Distributed decision tree induction in peer-to-peer systems. Statistical Analysis and Data Mining, 1(2):85–103, 2008. [13] A. Bifet and R. Gavalda. Learning from time-changing data with adaptive windowing. In SIAM International Conference on Data Mining, pages 443–448, 2007. [14] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive Online Analysis. Journal of Machine Learning Research, 11:1601–1604, 2010. [15] J. A. Blackard and D. J. Dean. Comparative accuracies of artificial neural networks and discriminant analysis in predicting forest cover types from cartographic variables. Journal of Computers and Electronics in Agriculture, 24(3):131–151, 1999. [16] C. L. Blake and C. J. Merz. UCI repository of machine learning databases. 1998. [17] A. Cano, A. Zafra, and S. Ventura. Speeding up multiple instance learning classification rules on GPUs. Journal of Knowledge and Information Systems, 44(1):127–145, 2015. [18] R. Cattrala, F. Oppacher, and D. Deugo. Evolutionary data mining with automatic rule generalization. Recent Advances in Computers, Computing and Communications, pages 296–300, 2002. [19] X. Chai, L. Deng, Q. Yang, and C. X. Ling. Test-cost sensitive naive Bayes classification. Proceedings of the IEEE International Conference on Data Mining, pages 51–58, 2004. [20] J. H. Chang and W. S. Lee. A sliding window method for finding recently frequent itemsets over online data streams. Journal of Information Science and Engineering, 20:753–762, 2004. [21] H.-L. Chen, M.-S. Chen, and S.-C. Lin. Catching the trend: a framework for clustering concept-drifting categorical data. IEEE Transactions on Knowledge and Data Engineering, 21(5):652–665, May 2009. [22] K. Chen and L. Liu. Best k: Critical clustering structures in categorical datasets. Knowledge and Information Systems, 20(1):1–33, 2009. [23] V. Cho and B. Wuthrich. Distributed mining of classification rules. Knowledge and Information Systems, 4(1):1–30, January 2002. [24] J. Demetrovics, G. O. H.
Katona, and D. Miklos. Partial dependencies in relational databases and their realization. Journal of Discrete Applied Mathematics, 40(2):127–138, 1992. [25] Q. Ding, Q. Ding, and W. Perrizo. Decision tree classification of spatial data streams using Peano count trees. Proceedings of the ACM/SIGAPP Symposium on Applied Computing, pages 413–417, 2002. [26] P. Domingos and G. Hulten. Mining high-speed data streams. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 71–80, August 2000. [27] R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918, 2005. [28] W. Fan. Systematic data selection to mine concept-drifting data streams. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, pages 128–137, 2004. [29] J. H. Friedman and B. E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954, 2008. [30] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: a review. ACM SIGMOD Record, 34(2):18–26, June 2005. [31] J. Gama and G. Castillo. Learning with local drift detection. Advanced Data Mining and Applications, 4093:42–55, 2006. [32] J. Gama and P. Kosina. Recurrent concepts in data streams classification. Knowledge and Information Systems, 40(3):489–507, May 2014. [33] J. Gao, W. Fan, J. Han, and P. S. Yu. A general framework for mining concept-drifting data streams with skewed distributions. Proceedings of the 2007 SIAM International Conference on Data Mining, 2007. [34] J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh. BOAT: optimistic decision tree construction. Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pages 169–180, 1999. [35] K. Goebel and W. Yan. Choosing classifiers for decision fusion.
Proceedings of the Seventh International Conference on Information Fusion, pages 563–568, 2004. [36] M. Goli and S. M. T. R. Rankoohi. A new vertical fragmentation algorithm based on ant collective behavior in distributed database systems. Journal of Knowledge and Information Systems, 30(2):435–455, 2012. [37] J. B. Gomes, M. M. Gaber, P. A. C. Sousa, and E. Menasalvas. Mining recurring concepts in a dynamic feature space. IEEE Transactions on Neural Networks and Learning Systems, 25(1):95–110, 2013. [38] J. B. Gomes, P. A. Sousa, and E. Menasalvas. Tracking recurrent concepts using context. Journal of Intelligent Data Analysis, 16(5):803–825, 2012. [39] M. J. Hosseini, Z. Ahmadi, and H. Beigy. Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification. Journal of Evolving Systems, 4(1):43–60, 2013. [40] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 97–106, 2001. [41] M. Z. Islam. Explore: A novel decision tree classification algorithm. Data Security and Security Data, 6121:55–71, 2012. [42] M. Khademi, M. T. Manzuri-Shalmani, M. H. Kiapour, and A. A. KiaeiJrip. Recognizing combinations of facial action units with different intensity using a mixture of hidden Markov models and neural network. Computer Vision and Pattern Recognition, 5997:304–313, 2010. [43] R. Klinkenberg and T. Joachims. Detecting concept drift with support vector machines. Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pages 487–494, 2000. [44] E. Kokiopoulou and P. Frossard. Distributed classification of multiple observation sets by consensus. IEEE Transactions on Signal Processing, 59(1):104–114, 2011. [45] L. I. Kuncheva and J. J. Rodríguez. A weighted voting framework for classifiers ensembles. Journal of Knowledge and Information Systems, 38(2):259–275, 2014. [46] B.
Lantow. Applying distributed classification algorithms to wireless sensor networks - a brief view into the application of the SPRINT algorithm family. Seventh International Conference on Networking, pages 52–59, 2008. [47] P. Li, X. Wu, and X. Hu. Mining recurring concept drifts with limited labeled streaming data. ACM Transactions on Intelligent Systems and Technology (TIST), 3(2), February 2012. [48] P. Li, X. Wu, Q. Liang, and Y. Gao. Concept drifting detection on noisy streaming data in random ensemble decision trees. Proc. of the 6th International Conference on Machine Learning and Data Mining (MLDM 2009), pages 236–250, 2009. [49] P. Li, X. Wu, Q. Liang, X. Hu, and Y. Zhang. Random ensemble decision trees for learning concept-drifting data streams. Proc. of the 15th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD-11), pages 313–325, 2011. [50] Y. Liao and V. Vemuri. Use of k-nearest neighbor classifier for intrusion detection. Computers and Security, 21(5):439–448, 2002. [51] C.-R. Lin and M.-S. Chen. On the optimal clustering of sequential data. Proceedings of the 2nd SIAM International Conference on Data Mining, April 2002. [52] M. Lippi, M. Jaeger, P. Frasconi, and A. Passerini. Relational information gain. Journal of Machine Learning, 83(2):219–239, 2011. [53] Y. Liu, L. Guo, F. Li, and S. Chen. An empirical evaluation of battery power consumption for streaming data transmission to mobile devices. Proceedings of the 19th ACM International Conference on Multimedia, pages 473–482, 2011. [54] J. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967. [55] U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury. A survey of decision fusion and feature fusion strategies for pattern classification. IETE Technical Review, 27(4):293–307, 2010. [56] M. M. Masud, T. M. Al-Khateeb, L.
Khan, C. Aggarwal, J. Gao, J. Han, and B. Thuraisingham. Detecting recurring and novel classes in concept-drifting data streams. Proceedings of the 11th IEEE International Conference on Data Mining (ICDM 2011), pages 1176–1181, 2011. [57] M. M. Masud, J. Gao, L. Khan, and J. Han. A practical approach to classify evolving data streams: training with limited amount of labeled data. Proc. of the 8th IEEE International Conference on Data Mining (ICDM-08), pages 929–934, 2008. [58] M. M. Masud, J. Gao, L. Khan, J. Han, and B. Thuraisingham. Integrating novel class detection with classification for concept-drifting data streams. Proceedings of the Machine Learning and Knowledge Discovery in Databases Conference, pages 79–94, 2009. [59] V. Metsis, I. Androutsopoulos, and G. Paliouras. Spam filtering with naive Bayes - which naive Bayes? The 3rd Conference on Email and Anti-Spam (CEAS 2006), 2006. [60] D. J. Miller, Y. Zhang, and G. Kesidis. Decision aggregation in distributed classification by a transductive extension of maximum entropy/improved iterative scaling. EURASIP Journal on Advances in Signal Processing, pages 1–21, 2008. [61] R. T. Ng and J. Han. Efficient and effective clustering methods for spatial data mining. Proceedings of the 20th VLDB Conference, 1994. [62] J. Ortiz, A. G. Olaya, and D. Borrajo. A dynamic sliding window approach for activity recognition. Proceedings of the 19th International Conference on User Modeling, Adaptation, and Personalization, pages 219–230, 2011. [63] S. Ramamurthy and R. Bhatnagar. Tracking recurrent concept drift in streaming data using ensemble classifiers. Proceedings of the 6th International Conference on Machine Learning and Applications (ICMLA 2007), pages 404–409, 2007. [64] J. J. Rodriguez and L. I. Kuncheva. Rotation forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619–1630, 2006. [65] L. Rokach. Ensemble-based classifiers.
Artificial Intelligence Review, 33(1-2):1–39, 2010. [66] J. K. Song, H. Gao, L. L. Gao, and Y. Fu. Highly accurate distributed classification of web documents. Proceedings of the International Symposium on Web Information Systems and Applications (WISA), pages 68–71, 2009. [67] E. Spyromitros-Xioufis, M. Spiliopoulou, G. Tsoumakas, and I. Vlahavas. Dealing with concept drift and class imbalance in multi-label stream classification. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), pages 1583–1588, 2011. [68] M. Stolpe, K. Bhaduri, K. Das, and K. Morik. Anomaly detection in vertically partitioned data by distributed core vector machines. Machine Learning and Knowledge Discovery in Databases, 8190:321–336, 2013. [69] W. N. Street and Y. Kim. A streaming ensemble algorithm (SEA) for large-scale classification. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 377–382, 2001. [70] K. Thomson and R. J. McQueen. Machine learning applied to fourteen agricultural datasets. Computer Science Working Papers, 1996. [71] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423, 2001. [72] K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3/4):385–404, 1996. [73] K. Ueno, X. Xi, E. Keogh, and D.-J. Lee. Anytime classification using the nearest neighbor algorithm with applications to stream mining. Proceedings of the 2006 IEEE International Conference on Data Mining (ICDM 2006), pages 623–632, 2006. [74] A. V. Uzilov, J. M. Keegan, and D. H. Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics, 7(173), 2006. [75] R. M. Vallim, J. A. A. Filho, R. F. de Mello, A. C. P. L. F.
de Carvalho, and J. Gama. Unsupervised density-based behavior change detection in data streams. Intelligent Data Analysis, 18(2):181–201, 2014. [76] H. Wang, W. Fan, P. S. Yu, and J. Han. Mining concept-drifting data streams using ensemble classifiers. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 226–235, 2003. [77] Y. Wang, Z. Li, Y. Zhang, L. Zhang, and Y. Jiang. Improving the performance of data stream classifiers by mining recurring contexts. Proc. of the 2nd International Conference on Advanced Data Mining and Applications, pages 1094–1106, 2006. [78] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005. [79] C. Wojek, G. Dorko, A. Schulz, and B. Schiele. Sliding-windows for rapid object class localization: a parallel technique. Proceedings of the 30th DAGM Symposium on Pattern Recognition, pages 71–81, 2008. [80] Y. Wu, J. M. Patel, and H. V. Jagadish. Structural join order selection for XML query optimization. Proceedings of the 19th IEEE International Conference on Data Engineering, pages 443–454, 2003. [81] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Conference, 1996. [82] Y. Zhang, D. Sow, D. Turaga, and M. van der Schaar. A fast online learning algorithm for distributed mining of big data. ACM SIGMETRICS Performance Evaluation Review, 41(4):90–93, 2014. [83] H. Zhu, Y. Wang, and Z. Yu. Clustering of evolving data stream with multiple adaptive sliding window. Proceedings of the 2010 International Conference on Data Storage and Data Engineering, pages 95–100, 2010. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/52120 | - |
dc.description.abstract | 大數據資料強調大量、快速、多元、價值、及精確五種特性。涵蓋多類型的資料,包括科學工程分析,社群網路,傳感器,物聯網,及多媒體應用等。對於在大數據中,如何有效率地處理資料,進而轉為結構性的資訊,其資訊探勘技術需求日趨迫切且更具挑戰。分散式分類系統在整合分散式模型與資料扮演關鍵性的角色,分散式系統主要是利用統計分析及協同整合子資料庫之模型,讓多個區域性裝置可以同時蒐集資料。隨著大數據應用、無線與行動技術的普及,由分散式裝置產生的各式不同特性之資料量逐漸增加。分散式分類模型面臨下列數個巨量資料下衍生的難題:1) 非同步局部資料分散式分類,地區性裝置受限於有限資源如電力、資料儲存空間,及區域性或其他規劃等因素,收集不完整屬性且僅有局部之資料。傳統收集完整資料,或是利用抽樣統計等相關技術整合的方法,將不再適用非同步之不完整資料的分散環境中。2) 就資料本身而言,快速變化的資料分佈,其變化模式與改變趨勢亦隨著外在環境有所變動,其多元的變化亦增加傳統分析及判斷資料是否改變之複雜度。若只使用單一固定長度之時間快門觀察資料,將會大幅降低預測模式反應資訊變化之效能。3) 為了更進一步擴充分散式分類模型的使用規模,利用模型轉換技術,將傳統普及之高效率非規則式模型,轉化為可傳輸之規則式模型,因此,如何將非規則式模型轉化為適當規則式模型,將成為決定分類模型效能的關鍵。
本論文試著解決以上的問題,我們首先將目光著眼於分散式分類模型系統,並設計一套整合非同步局部資料的分散模型之方法,使得整體分散式分類系統的區域模型效能可以被妥善運用;由於分散式分類系統允許區域裝置收集非固定量之區域資料,當收集資料以產生資料模組時,因資料多樣性及變動多樣化等特性,單一時間窗所造成的錯誤率也隨之上升而大幅降低系統效能,我們進而提出連續性叢集方法,讓系統根據時間與資料分佈將資料適當切割以產生符合資料分佈之模組。最後,本論文提出兩種模型轉換方法,使傳統無法傳至伺服器整合之非規則式模型轉至規則式模型,以擴展分散式分類模型可使用之規模並提升整體效能。不論是在理論分析或者實驗測試上,本論文所提出之分散式分類模型皆較傳統分散式分類有更卓越的效能提升與更廣泛之應用。 | zh_TW |
dc.description.abstract | Big Data emphasizes the 5Vs (Volume, Velocity, Variety, Value, and Veracity) of many types of data (scientific and engineering, social network, sensor/IoT/IoE, and multimedia such as audio, video, and images) that together constitute the Big Data challenges. This phenomenon creates an urgent need to efficiently turn raw data into structured information. One predominant approach is the distributed classification ensemble, which improves prediction efficiency by combining an ensemble of distributed models, or by statistically integrating distributed information, so that multiple devices can collect data concurrently. With the growing popularity of Big Data applications and of wireless and mobile technology, the amount of data with diverse characteristics generated by distributed devices has been increasing tremendously. As a result, distributed classification in Big Data faces new challenges. There are three main challenges in distributed big data systems: 1) The local models built by distributed devices are asynchronous and incomplete. Traditional distributed classification algorithms, which rely on horizontal or vertical sub-databases, cannot be applied in this scenario. 2) Because Big Data is highly heterogeneous, simply splitting the data into equal-sized chunks for model construction forfeits much of the performance benefit of classification models. In particular, non-regular recurring data are especially vulnerable to models derived from equally separated windows, because noisy data interfere with most of the models built over fixed-size buckets. 3) In our distributed environment, transforming popular lazy models into rules increases the diversity of local models and reduces the transmission bandwidth they consume. This dissertation addresses the above problems. First, it focuses on the distributed streaming scenario and proposes a rule-based distributed classification method for asynchronous partial data (DIP).
Our proposed method DIP selects models based on the amount of local data and the quality of local models, so that the performance gain can be fully utilized. DIP saves communication bandwidth by transferring organized information instead of individual instances. In addition, DIP allows local devices to collect varying amounts of local data. Because of data diversity and changing distributions, the performance of classification models built from fixed-size windows or chunks declines. We investigate the characteristics of non-regular data and introduce sequential clustering, which adaptively forms sequential clusters based on data distributions and time, to reduce the inter-cluster interference of noisy data and to enhance the predictions of the derived models. Finally, this dissertation proposes two model transformation methods, which transform data distributions into rules, to make popular lazy classifiers usable within our distributed classifier. In both theoretical analysis and experiments, the proposed distributed classification framework achieves a significant performance gain and a broader scope of application than traditional distributed classification ensembles and existing dynamically changing methods. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T16:08:16Z (GMT). No. of bitstreams: 1 ntu-104-Q90921009-1.pdf: 1887806 bytes, checksum: be3b1fc076c091e176a738e089f3730e (MD5) Previous issue date: 2015 | en |
dc.description.tableofcontents | 口試委員審定書 i
致謝 iii
中文摘要 v
Abstract vii
Contents ix
List of Figures xiii
1 Introduction 1
1.1 Motivation and Overview of the Dissertation 1
1.1.1 Distributed Classification for Asynchronous Partial Data Streams 3
1.1.2 Prediction of Non-regular Recurring Concept Data Streams 3
1.1.3 Model Transformation from Data Distribution to Rules 5
1.2 Organization of the Dissertation 5
2 Distributed Classification for Asynchronous Partial Data Streams 7
2.1 Introduction 7
2.2 Preliminaries 11
2.2.1 Problem Definition 11
2.2.2 Solution Overview 14
2.2.3 Related Work 15
2.3 The Offline Model Maintenance 17
2.3.1 Factors of Prediction Quality 17
2.3.2 Timing of Local Model Construction and Transmission 19
2.3.3 Model Selection at the Server 24
2.4 Online Query Processing 26
2.4.1 Time Period of the Model 27
2.4.2 Weighted Information Gain of the Rule 29
2.4.3 Prediction 30
2.5 Experiments 34
2.5.1 Data Description and Experiment Setup 35
2.5.2 Results 36
2.6 Conclusion 47
3 Prediction of Non-regular Recurring Concept Data Streams 49
3.1 Introduction 49
3.2 Preliminary 57
3.2.1 Related Work 57
3.2.2 Problem Definition 59
3.3 Algorithms for Sequential Clustering 62
3.3.1 Data Separation 63
3.3.2 Online Data Separation 63
3.4 Data Classification 65
3.5 Detection of Concept Recurrence 69
3.6 Performance Assessment 72
3.6.1 Experiment I: On the clustering quality of SC 72
3.6.2 Experiment II: On the classification quality of PRECORD 74
3.6.3 Experiment III: On the evaluation of heuristics 77
3.7 Conclusion 78
4 Model Transformation from Data Distribution to Rules 81
4.1 Introduction 81
4.2 The Transformation from k-NN to Rule-based Classifiers 82
4.3 The Transformation from Naïve Bayesian to Rule-based Classifiers 84
5 Conclusion 89
Bibliography 91 | |
dc.language.iso | en | |
dc.title | 非同步局部模組分散式分類與不規則變動之巨量資料探勘 | zh_TW |
dc.title | Distributed Classification of Asynchronous Partial Model for Non-regular Drifting Data | en |
dc.type | Thesis | |
dc.date.schoolyear | 103-2 | |
dc.description.degree | 博士 (Ph.D.) | |
dc.contributor.oralexamcommittee | 歐建志,葉彌妍,陳孟彰,王凡 | |
dc.subject.keyword | 分散式分類,非同步局部資料,不規則重覆資料,連續性叢集, | zh_TW |
dc.subject.keyword | distributed classification,asynchronous partial data,non-regular recurring concept data,sequential clustering, | en |
dc.relation.page | 100 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2015-08-19 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電機工程學研究所 | zh_TW |
Appears in Collections: | 電機工程學系
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-104-1.pdf (currently not authorized for public access) | 1.84 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.