Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34742
Title: | Constrained Data Clustering (條件性叢集技術) |
Author: | Bi-Ru Dai (戴碧如) |
Advisor: | Ming-Syan Chen (陳銘憲) |
Keywords: | Data Mining, Data Clustering, Data Stream |
Publication Year: | 2006 |
Degree: | Ph.D. |
Abstract: | In recent years, large volumes of data have accumulated rapidly in many applications, and data mining has received growing research attention. Among the various data mining capabilities, data clustering is a useful technique for investigating group behavior. It gives users a way to observe groups of similar data and is helpful in many applications. Because data mining is an application-dependent technology, domain knowledge is usually imposed on mining systems in the form of constraints. In this dissertation, we first address constrained clustering with numerical constraints, in which the constraint-attribute values of any two data items in the same cluster must differ by no more than the corresponding constraint range. We propose several algorithms for this clustering problem. Owing to the intrinsic nature of numerically constrained clustering, and to a basic property of hierarchical clustering, the order in which data items are grouped affects the final clusters. In many cases this order dependency degrades the clustering results. To remedy this drawback, we devise a progressive constraint relaxation technique that reduces the effect of grouping order and improves the overall quality of the clustering results.
In addition to clustering static data sets, this dissertation addresses the problem of clustering multiple data streams. Because stream data usually accumulate rapidly, an effective solution must work within limited time and space. We devise a Clustering on Demand framework, abbreviated as the COD framework, to dynamically cluster multiple data streams. The COD framework consists of two phases: the online maintaining phase, which processes and stores the data streams, and the offline clustering phase, which responds dynamically to users' clustering queries.
In the online maintaining phase, an efficient mechanism maintains summary hierarchies of the data streams at multiple resolutions. In the offline clustering phase, an adaptive clustering algorithm retrieves approximations of the desired sub-streams from these summary hierarchies according to the clustering queries. Finally, we consider constraints and data streams together, extending constrained clustering to the data stream environment. We devise a framework of Constrained Clustering for the Evolving Data Stream, abbreviated as the CCDS framework, to cluster a data stream under the pairwise range constraint. The CCDS framework has two phases. The first maintains the data points in a stream storage structure designed for this constraint definition, and the second generates clusters that satisfy the users' requirements and constraints. |
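Here is a minimal sketch of the pairwise range constraint described in the abstract. It assumes that points are tuples with separate feature attributes and constraint attributes. Under the constraint, any two members of a cluster may differ by at most the given range on each constraint attribute. This is equivalent to requiring that the maximum minus the minimum of that attribute within the cluster be at most the range.

The merging loop is a generic greedy agglomerative procedure that merges the closest centroids first and accepts only merges that keep the constraint satisfied. It is not the dissertation's algorithm. The `relax_schedule` parameter is a hypothetical reading of progressive constraint relaxation: clustering first runs under scaled-down ranges, and those ranges are then loosened step by step to the target values. All function and parameter names are illustrative.

```python
from itertools import combinations
from math import dist


def satisfies_range(cluster, points, cons_attrs, ranges):
    """Pairwise range constraint: on every constraint attribute, any two
    members differ by at most the limit, i.e. max - min <= limit."""
    for attr, limit in zip(cons_attrs, ranges):
        values = [points[i][attr] for i in cluster]
        if max(values) - min(values) > limit:
            return False
    return True


def centroid(cluster, points, feat_attrs):
    """Mean of the cluster's members over the (non-constraint) feature attributes."""
    return [sum(points[i][a] for i in cluster) / len(cluster) for a in feat_attrs]


def constrained_agglomerative(points, k, feat_attrs, cons_attrs, ranges,
                              relax_schedule=(1.0,)):
    """Greedy agglomerative clustering that only accepts constraint-preserving merges.

    Each stage multiplies the target ranges by a factor from relax_schedule
    (hypothetical progressive relaxation; the last factor should be 1.0).
    Within a stage, the admissible pair with the closest centroids is merged
    until k clusters remain or no admissible merge exists.
    """
    clusters = [[i] for i in range(len(points))]
    for factor in relax_schedule:
        scaled = [r * factor for r in ranges]
        while len(clusters) > k:
            best = None  # (distance, index_a, index_b) of the best admissible merge
            for a, b in combinations(range(len(clusters)), 2):
                if not satisfies_range(clusters[a] + clusters[b], points,
                                       cons_attrs, scaled):
                    continue
                d = dist(centroid(clusters[a], points, feat_attrs),
                         centroid(clusters[b], points, feat_attrs))
                if best is None or d < best[0]:
                    best = (d, a, b)
            if best is None:
                break  # no admissible merge at this relaxation level
            _, a, b = best
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]  # safe: combinations yields a < b
    return clusters
```

Consider `points = [(0.0, 0.0, 0.0), (0.1, 0.0, 5.0), (5.0, 5.0, 0.2), (5.5, 5.0, 5.1)]` with feature attributes `(0, 1)`, constraint attribute `(2,)`, and `k=2`. A range of 1.0 yields `[[0, 2], [1, 3]]`. A range of 100.0, which is effectively unconstrained, yields the spatially closest grouping `[[0, 1], [2, 3]]`.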
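The COD framework's online phase maintains multi-resolution summary hierarchies. This record does not describe their actual structure, so the sketch below illustrates the idea with a Haar-style average pyramid, which is an assumption. Level j stores the mean of each consecutive block of 2**j stream values, and the levels are built incrementally as values arrive. The offline side can then retrieve a coarse approximation of a sub-stream from whichever level suits the query. The class and method names are hypothetical.

```python
class SummaryHierarchy:
    """Hypothetical multi-resolution summary of one stream (Haar-style averages).

    levels[0] holds raw values; levels[j] holds the mean of each consecutive
    block of 2**j raw values, computed from pairs of level j-1 entries.
    """

    def __init__(self, levels):
        if levels < 1:
            raise ValueError("need at least one level")
        self.levels = [[] for _ in range(levels)]

    def append(self, value):
        self.levels[0].append(value)
        # Each time a level completes a pair, push the pair's mean one level up.
        for j in range(1, len(self.levels)):
            below = self.levels[j - 1]
            if len(below) % 2 != 0:
                break  # pair not complete yet, so higher levels are unchanged
            self.levels[j].append((below[-2] + below[-1]) / 2)

    def approximate(self, start, end, level):
        """Approximate raw positions start..end-1 from one level: every position
        takes the mean of the 2**level block that contains it."""
        size = 2 ** level
        if start < 0 or end > len(self.levels[level]) * size:
            raise ValueError("requested range is not yet summarized at this level")
        return [self.levels[level][t // size] for t in range(start, end)]
```

Consider the stream 1, 3, 5, 7, 2, 2, 4, 8 with three levels. Level 1 holds `[2.0, 6.0, 2.0, 6.0]`, level 2 holds `[4.0, 4.0]`, and `approximate(2, 6, 1)` returns `[6.0, 6.0, 2.0, 2.0]`. This sketch keeps every level in full. A bounded-space design would discard fine-resolution blocks for older data.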
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34742 |
Full-Text License: | Paid license |
Appears in Department: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-95-1.pdf (not currently authorized for public access) | 1.15 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.