Study effectiveness of k-means clustering of functional data: functional principal component scores feature and mean curve feature

Chia-Tung Chiang; 江家彤

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24253

Title:	Study effectiveness of k-means clustering of functional data: functional principal component scores feature and mean curve feature (original title: 利用函數型主成分計分及平均曲線對函數型資料進行k均值法分群之探討)
Author:	Chia-Tung Chiang (江家彤)
Advisor:	Hung Chen (陳宏)
Keywords:	functional principal component analysis, k-means clustering
Year of publication:	2011
Degree:	Master's
Abstract:	Cluster analysis partitions data into groups that differ markedly from one another while the members within each group are highly similar. It is an important data-mining tool for high-dimensional data and large databases, and the resulting groups make it easier to explore the relationship between group membership and variables of interest. Before clustering high-dimensional data, one usually reduces the dimension first, and different approaches to dimension reduction may lead to different conclusions. This thesis studies the problem of clustering the subjects of functional data. Abraham et al. (2003, Scandinavian Journal of Statistics) extract features from the mean function and then apply the k-means procedure to the reduced data. Peng and Muller (2008, Annals of Applied Statistics) assume that all curves share a common mean function and use the distribution of finite-dimensional functional principal component scores, that is, features extracted from the covariance function, to find clusters. However, whether dimension reduction starts from the mean function or from the covariance function, the clusters found by k-means reflect the characteristics of the mean function. This observation motivates a theoretical comparison of the utility of the two approaches for densely observed functional data. We treat only the case of two clusters, and we show that in certain situations feature extraction based on the covariance function loses efficiency and lowers clustering quality. Chiou and Li (2007, Journal of the Royal Statistical Society, Series B) proposed an iterative functional clustering algorithm based on repeated reclassification whose initial clustering stage uses functional principal component scores, as in Peng and Muller, to find preliminary clusters in the mean structure. Based on our analysis, we recommend using the mean function in the initial clustering stage.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/24253
Full-text license:	Not authorized
Appears in collections:	Department of Mathematics
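
The two feature choices compared in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that do not come from the thesis: the simulated mean curves, the noise levels, the grid size, and the helper names `simulate`, `fpc_scores`, `kmeans`, and `agreement` are all hypothetical. The raw discretized curves stand in for mean-curve features, and the scores come from an eigendecomposition of the pooled sample covariance on an equally spaced grid. It is not the procedure of the thesis or of the cited papers.

```python
import numpy as np

def simulate(n_per=100, m=50, seed=0):
    """Two clusters of densely observed curves (illustrative settings only)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, m)
    means = [np.sin(2 * np.pi * t), -np.sin(2 * np.pi * t)]  # assumed cluster means
    phi = np.sqrt(2) * np.cos(2 * np.pi * t)                 # assumed shared eigenfunction
    curves, truth = [], []
    for g, mu in enumerate(means):
        scores = rng.normal(0, 0.5, size=(n_per, 1))         # random component scores
        noise = rng.normal(0, 0.1, size=(n_per, m))          # measurement error
        curves.append(mu + scores * phi + noise)
        truth += [g] * n_per
    return np.vstack(curves), np.array(truth)

def fpc_scores(X, q=2):
    """Project curves, centered at the pooled mean, on the top-q covariance eigenvectors."""
    Xc = X - X.mean(axis=0)                  # common-mean assumption
    cov = Xc.T @ Xc / (len(X) - 1)           # pooled sample covariance on the grid
    _, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = evecs[:, ::-1][:, :q]              # keep the q leading eigenvectors
    return Xc @ top

def kmeans(X, k=2, n_iter=100, n_init=10, seed=0):
    """Plain Lloyd iterations with random restarts; returns the lowest-inertia labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute assignments and inertia against the final centers.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

def agreement(labels, truth):
    """Share of correctly grouped curves for two clusters, up to label switching."""
    acc = float(np.mean(labels == truth))
    return max(acc, 1.0 - acc)

X, truth = simulate()
labels_mean = kmeans(X)                      # k-means on the discretized curves
labels_fpc = kmeans(fpc_scores(X, q=2))      # k-means on the leading scores
acc_mean = agreement(labels_mean, truth)
acc_fpc = agreement(labels_fpc, truth)
print(f"curve features: {acc_mean:.2f}, FPC-score features: {acc_fpc:.2f}")
```

Changing the assumed means, the eigenfunction, or the number of retained components `q` in this sketch is one way to explore when the two feature choices agree and when they diverge; the theoretical comparison itself is in the thesis.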

Files in this item:

File	Size	Format
ntu-100-1.pdf (not authorized for public access)	1.57 MB	Adobe PDF
