Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77476
Title: | Researches and Applications of Integrating FLD and Classification Trees (結合費雪線性判別分析之分類樹理論及應用研究) |
Author: | Li-Cheng Hsieh (謝立成) |
Advisor: | Argon Chen (陳正剛) |
Keywords: | Classification tree, Multi-layer Classifier, Gini Index, Wilks' Lambda Test, Gini Ratio |
Publication year: | 2018 |
Degree: | Master's |
Abstract: | Classification trees are widely used in data mining and machine learning. CART (Classification and Regression Tree) is one of the most popular classification tree algorithms: it splits the data in two on conditions on the data features, recursively builds the tree until a termination condition is met, and classifies data through the leaf nodes. The Multi-layer Classifier (MLC) is another kind of classification tree, built by splitting data into two or three nodes; in each layer, one of the split nodes is designated as unclassified, and splitting continues from that node until a stopping condition is met. Fisher Linear Discriminant (FLD) is a common method that classifies with a linear combination of features: the combination maximizes the between-group variation while minimizing the within-group variation, projecting the data from the multi-dimensional feature space onto a one-dimensional space for classification.
To compare the classification performance of CART and MLC, Lai (2010) constructed a two-feature, two-category benchmark data set and used it to examine the properties of the two trees and to compare their best tree structures. The distribution of the benchmark data changes as its parameter value increases, so the two trees can be studied under different data distributions. The analysis shows that the tree models built by CART and by MLC differ in efficiency depending on the data distribution, so the two methods complement each other; on this basis, Lai (2010) also proposed a tree algorithm that combines the two.
However, the benchmark data proposed by Lai (2010) has a ladder-shaped distribution, which suits only the study of trees built by splitting data one feature at a time and does not suit evaluating linear combinations of features such as FLD. This study therefore proposes another two-feature, two-category benchmark data set that can be used to examine, in terms of the Gini index, the classification performance and related properties of both data-splitting trees and linear discriminant analysis. The Gini index serves as both the splitting criterion and the performance measure.
Based on the results for the proposed benchmark data, this study proposes a criterion, the Gini Ratio, for combining FLD with data-splitting classification tree algorithms, in an attempt to integrate CART, MLC, and linear discriminant analysis for feature selection and node splitting. At each splitting stage, the algorithm uses the Gini Ratio to compare how much splitting on different numbers of features and FLD's linear combination of features each reduce the Gini index, and so decides whether to split the data or to combine features linearly; a Wilks' Lambda test determines whether a new layer needs to be expanded. Finally, the study validates the integrated algorithm on a real data set for classifying breast tumors as benign or malignant, comparing it with conventional tree models and FLD models; under cross-validation, the proposed classification model outperforms the other classification methods. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/77476 |
DOI: | 10.6342/NTU201803893 |
Full-text authorization: | Not authorized |
Appears in collections: | Institute of Industrial Engineering (工業工程學研究所) |
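The abstract in this record compares single-feature splits with FLD projections by how far each lowers the Gini index. The Python sketch below is only a rough illustration of that comparison under stated assumptions; it is not the thesis's algorithm and does not compute its Gini Ratio, whose definition is not given in this record. The function names (`gini`, `best_split_gini`, `fld_direction`, `sample`) and the synthetic correlated Gaussian data are hypothetical choices made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gini(scores, labels):
    """Lowest size-weighted Gini index over all binary thresholds on a 1-D score."""
    pairs = sorted(zip(scores, labels))
    ys = [y for _, y in pairs]
    n = len(ys)
    best = gini(ys)  # baseline: no split at all
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate equal scores
        weighted = (i * gini(ys[:i]) + (n - i) * gini(ys[i:])) / n
        best = min(best, weighted)
    return best


def fld_direction(points, labels):
    """Fisher direction w = Sw^{-1} (m1 - m0) for two features and classes 0/1."""
    groups = [[p for p, y in zip(points, labels) if y == k] for k in (0, 1)]
    means = [[sum(p[j] for p in g) / len(g) for j in range(2)] for g in groups]
    s = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter matrix Sw
    for g, m in zip(groups, means):
        for p in g:
            d = (p[0] - m[0], p[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dm = (means[1][0] - means[0][0], means[1][1] - means[0][1])
    # Closed-form inverse of the 2x2 matrix Sw applied to the mean difference.
    return ((s[1][1] * dm[0] - s[0][1] * dm[1]) / det,
            (-s[1][0] * dm[0] + s[0][0] * dm[1]) / det)


def sample(mean, rho, n, rng):
    """Draw n points from a 2-D Gaussian with unit variances and correlation rho."""
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((mean[0] + z1,
                    mean[1] + rho * z1 + (1 - rho ** 2) ** 0.5 * z2))
    return out


# Assumed toy data: two strongly correlated classes whose means differ
# along the direction in which the features vary least.
rng = random.Random(0)
points = sample((0, 0), 0.9, 200, rng) + sample((1, -1), 0.9, 200, rng)
labels = [0] * 200 + [1] * 200

single = min(best_split_gini([p[j] for p in points], labels) for j in range(2))
w = fld_direction(points, labels)
combined = best_split_gini([w[0] * p[0] + w[1] * p[1] for p in points], labels)

print("best single-feature split, weighted Gini:", round(single, 3))
print("best split on FLD projection, weighted Gini:", round(combined, 3))
```

On data like this, where the classes separate along an oblique direction, a split on the FLD projection should reach a lower weighted Gini index than any single-feature split; on ladder-shaped data the opposite can hold, which is the kind of trade-off the abstract describes.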
Files in this item:
File | Size | Format | |
---|---|---|---|
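ntu-107-R05546022-1.pdf (not currently authorized for public access) | 6.04 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.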