Construction of Multivariate Classification Trees and Its Applications

Wei-Ting Yang; 楊惟婷

Please cite this document using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/23021

Title:	Construction of Multivariate Classification Trees and Its Applications (多變量分類樹之建構與應用)
Author:	Wei-Ting Yang (楊惟婷)
Advisor:	陳正剛
Keywords:	Classification tree, Fisher's discriminant, Mahalanobis distance, Multivariate, Attribute selection
Year of publication:	2009
Degree:	Master's
Abstract:	A classification tree is a widely used data-mining technique. It is built by repeatedly selecting an appropriate attribute and splitting the sample into subsets. However, the sample size drops sharply after a few levels of splitting, so estimates in the lower levels of the tree become unreliable. In addition, when the response is linearly related to several attributes, a conventional classification tree cannot classify the data efficiently or accurately. Fisher's Linear Discriminant, another classification method common in multivariate analysis, finds the linear combination of attributes that best separates the classes, but it is not suited to data with nonlinear relationships.

To address the shortcomings of both methods, this study proposes a multivariate classification tree that chooses the type of split according to the data structure. When the relationship is linear, the tree splits on a linear combination of a set of attributes; this describes the data more precisely and avoids the sharp sample-size reduction caused by repeated splitting. When the relationship is not linear, the tree uses the conventional split on a single attribute. The objective is to choose the conditional clause that best captures the character of the data. To build the tree, the study proposes a systematic methodology for selecting the relevant attributes and for evaluating, comparing, and choosing between univariate and multivariate splits. It also incorporates Fisher's linear discriminant and the Mahalanobis distance, so that the conditional clause takes into account the distributions of both the response and the attributes. To validate the method, it is compared with other classification methods on simulated data and real cases. The results show that the new method effectively captures different data structures with acceptable accuracy.
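The abstract describes the node-splitting idea only at a high level. The Python sketch below shows one way a two-class node could choose between a single-attribute split and a linear-combination split; it is an illustration under stated assumptions, not the thesis's algorithm. Every choice here is hypothetical: the Fisher direction is w = S_w^{-1}(m1 - m0) with S_w the pooled within-class covariance; the cut point on the projected values is placed where the one-dimensional Mahalanobis distances to the two projected class means are equal; weighted Gini impurity is the comparison criterion; and the helper names (`choose_split`, `min_gain`, and so on) are invented for this example.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fisher_direction(X0, X1):
    """Two-class Fisher direction w = S_w^{-1} (m1 - m0), S_w = pooled covariance."""
    m0, m1 = ([sum(c) / len(X) for c in zip(*X)] for X in (X0, X1))
    p = len(m0)
    dof = len(X0) + len(X1) - 2
    Sw = [[0.0] * p for _ in range(p)]
    for rows, m in ((X0, m0), (X1, m1)):
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, m)]
            for i in range(p):
                for j in range(p):
                    Sw[i][j] += d[i] * d[j] / dof
    return solve(Sw, [b - a for a, b in zip(m0, m1)])


def mahalanobis_cut(z0, z1):
    """Cut t on a 1-D projection where the 1-D Mahalanobis distances
    (t - mu0) / s0 and (mu1 - t) / s1 to the class means are equal."""
    mu0, mu1 = sum(z0) / len(z0), sum(z1) / len(z1)
    s0 = math.sqrt(sum((z - mu0) ** 2 for z in z0) / (len(z0) - 1))
    s1 = math.sqrt(sum((z - mu1) ** 2 for z in z1) / (len(z1) - 1))
    return (s1 * mu0 + s0 * mu1) / (s0 + s1)  # solves the equality for t


def gini(labels):
    """Binary Gini impurity 2 p (1 - p) for 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_impurity(values, labels, t):
    """Weighted Gini impurity of the split value <= t versus value > t."""
    left = [y for v, y in zip(values, labels) if v <= t]
    right = [y for v, y in zip(values, labels) if v > t]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_univariate(X, y):
    """Conventional split: best (attribute, threshold, impurity) over midpoints."""
    best = (None, None, float("inf"))
    for j in range(len(X[0])):
        col = [x[j] for x in X]
        vals = sorted(set(col))
        for lo, hi in zip(vals, vals[1:]):
            imp = split_impurity(col, y, (lo + hi) / 2)
            if imp < best[2]:
                best = (j, (lo + hi) / 2, imp)
    return best


def multivariate_split(X, y):
    """Split on the Fisher projection w.x at the Mahalanobis-balanced cut."""
    X0 = [x for x, c in zip(X, y) if c == 0]
    X1 = [x for x, c in zip(X, y) if c == 1]
    w = fisher_direction(X0, X1)
    z = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
    t = mahalanobis_cut([v for v, c in zip(z, y) if c == 0],
                        [v for v, c in zip(z, y) if c == 1])
    return w, t, split_impurity(z, y, t)


def choose_split(X, y, min_gain=0.05):
    """Keep the single-attribute split unless the linear combination lowers
    weighted Gini impurity by at least min_gain (hypothetical rule)."""
    j, t_u, imp_u = best_univariate(X, y)
    w, t_m, imp_m = multivariate_split(X, y)
    if imp_u - imp_m >= min_gain:
        return ("multivariate", w, t_m, imp_m)
    return ("univariate", j, t_u, imp_u)
```

On an integer grid where the class is 1 exactly when a + b > 0, no single attribute separates the classes, so the sketch keeps the Fisher split; when the class depends only on a, it falls back to the conventional single-attribute split, mirroring the abstract's "linear combination only if needed" idea.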
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/23021
Full-text license:	Not authorized
Appears in Collections:	Graduate Institute of Industrial Engineering

Files in this item:	None.

Unless otherwise specified, all documents in this system are protected by copyright, with all rights reserved.
