Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77716
Title: | CART分類樹及多層判別分析理論研究與改進 Theories and Enhancement of CART and Multi-layer Classifier |
Author: | Yu-Wei Liu 劉有為 |
Advisor: | 陳正剛 |
Keywords: | Classification Tree, Multilayer Classifier, Gini Index, Youden’s Index, Wilks’ Lambda Test |
Publication year: | 2017 |
Degree: | Master's |
Abstract: | Classification and Regression Tree (CART) is the most commonly used classification tree algorithm. Each decision node in CART is split into exactly two child nodes, so the tree is built through sequential binary partitioning. The multi-layer classifier (MLC) is an alternative classification tree in which each node is split into two or three child nodes. One of these child nodes holds the data left unclassified, and only that node is split further, on the feature with the best classification performance, to form the next layer. Splitting continues until the stopping condition is reached. Both CART and MLC can use the Gini Index or Youden's Index as the criterion for selecting the decision feature and determining the cutoff point. However, given how the two algorithms work, each criterion falls short on certain types of data, and it has not yet been established which criterion gives CART and MLC the better performance.
In this research, we first propose a two-class, two-feature classification problem with data arranged in a two-dimensional benchmark plane, and then extend this plane to a benchmark cube for a two-class, three-feature problem. On these two benchmark data structures, we examine how CART and MLC perform in feature selection when using the Gini Index and Youden's Index. We prove that, under both the two-dimensional and three-dimensional benchmark data structures, CART performs better with Youden's Index, which gives a greater probability of yielding the best classification model than the Gini Index does. We further show that MLC performs better with the Gini Index, which likewise gives a greater probability of yielding the best classification model than Youden's Index does.
Based on these theoretical results, we propose a novel hybrid classification algorithm that combines the two indices with the two classifiers, each classifier using its best-performing index. At each splitting stage, the algorithm first selects the best feature with CART using Youden's Index and with MLC using the Gini Index. It then applies the Wilks' Lambda test to decide which of the two schemes to use at that stage, evaluating eight candidate splitting schemes before growing a new layer. The classifier is grown stage by stage until the stopping condition is reached. Finally, we use a real tumor classification case to test the hybrid algorithm and compare its classification performance with that of the original CART and MLC. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77716 |
DOI: | 10.6342/NTU201703106 |
Full-text authorization: | Not authorized |
Appears in collections: | Institute of Industrial Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-106-R04546036-1.pdf (not currently authorized for public access) | 13 MB | Adobe PDF |
Unless a copyright license is specifically indicated, all items in this system are protected by copyright, with all rights reserved.
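The abstract names two criteria for choosing a cutoff point on a feature: the Gini Index and Youden's Index. The record does not give the thesis's exact formulas, so the following is only a minimal sketch under textbook assumptions: Gini impurity as 1 minus the sum of squared class proportions, weighted over the two sides of a binary split, and Youden's Index as sensitivity + specificity - 1. All function names are hypothetical, and the sketch covers only a single two-way split on one feature, not the MLC's three-way split or the Wilks' Lambda test.

```python
from typing import List, Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 for binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def weighted_gini(x: Sequence[float], y: Sequence[int], cut: float) -> float:
    """Size-weighted Gini impurity of the two sides of the split x <= cut."""
    left = [yi for xi, yi in zip(x, y) if xi <= cut]
    right = [yi for xi, yi in zip(x, y) if xi > cut]
    n = len(y)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def youden_index(x: Sequence[float], y: Sequence[int], cut: float) -> float:
    """|sensitivity + specificity - 1| when 'x > cut' predicts class 1.

    The absolute value lets the split point in either direction
    (an assumption of this sketch).
    """
    pos = sum(y)
    neg = len(y) - pos
    if pos == 0 or neg == 0:
        return 0.0
    tp = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
    tn = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
    return abs(tp / pos + tn / neg - 1.0)


def candidate_cuts(x: Sequence[float]) -> List[float]:
    """Midpoints between consecutive distinct feature values."""
    values = sorted(set(x))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def best_cut(x: Sequence[float], y: Sequence[int], criterion: str = "gini") -> float:
    """Pick the cutoff that minimizes weighted Gini or maximizes Youden's Index."""
    cuts = candidate_cuts(x)
    if not cuts:
        raise ValueError("feature has a single distinct value; nothing to split")
    if criterion == "gini":
        return min(cuts, key=lambda c: weighted_gini(x, y, c))
    if criterion == "youden":
        return max(cuts, key=lambda c: youden_index(x, y, c))
    raise ValueError(f"unknown criterion: {criterion}")


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [0, 0, 0, 1, 1, 1]
    print(best_cut(x, y, "gini"), best_cut(x, y, "youden"))
```

On perfectly separable data like the toy example, both criteria pick the same cutoff. The abstract's claim concerns benchmark data structures where the two criteria can disagree, which this sketch does not attempt to reproduce.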