Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17364
Title: | Construction and Applications of Classification Trees with Receiver Operating Characteristic Curve |
Author: | Kang-Heng Ma (馬康恆) |
Advisor: | 陳正剛 |
Keywords: | classification tree, CART, receiver operating characteristic (ROC) curve, partial area under the curve (PAUC) |
Year of publication: | 2013 |
Degree: | Master's |
Abstract: | The Classification and Regression Tree (CART) is the most commonly used classification tree. It consists of a hierarchy of decision nodes, and each node can be split into only two child nodes. An alternative tree, the multi-layer classifier, allows each layer to contain two or three nodes, one of which is an undetermined node holding the instances that have not yet been classified. The tree is grown by splitting this undetermined node on another attribute into a new layer of two or three nodes until a stopping criterion is reached. Both CART and the multi-layer classifier usually use the Gini index as the criterion for selecting attributes and cutoff points. For certain types of data, however, the Gini index classifies inefficiently and can lead to falsely identified attributes. In addition, when searching for an attribute, traditional classification trees consider its ability to separate both classes at the same time rather than its ability to identify a single class, so they often miss attributes that would help the classification.
In this research, we first prove theoretically that, for certain data, the Gini index selects unreasonable cutoff points, and then propose a method that uses the Gini index and Youden's index to find two cutoff points on a single attribute. When a node enters the algorithm, the proposed model selects a suitable attribute by comparing the results of statistical tests on the partial area under the receiver operating characteristic (ROC) curve (PAUC); the same test results determine whether the node needs to be split at all, and thus whether tree construction terminates. If the node is to be split, the PAUC is used again to decide whether it should have two or three child nodes, and the cutoff point(s) are then found with the Gini index or Youden's index.
Several simulated cases and a real tumor-classification case are used to compare the proposed model with the original CART and the multi-layer classifier. The results show that the PAUC-based model classifies the data more effectively and has superior discriminating capability over both. |
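The abstract contrasts the Gini index and Youden's index as cutoff criteria and uses the partial area under the ROC curve (PAUC) to judge attributes. The Python sketch below is not the thesis's implementation; it only computes these three quantities for one numeric attribute so the criteria can be compared side by side. Assumptions: binary labels (1 = positive), larger attribute values indicate the positive class, candidate cutoffs are midpoints between distinct values, and the PAUC is taken over FPR in [0, fpr_max]. The thesis's actual PAUC region, its statistical test, and its two-cutoff, three-node split are not reproduced, and all function names and the toy data set are hypothetical.

```python
# Toy sketch of single-attribute split criteria (hypothetical names, not the thesis's code).
# Assumes labels 1 = positive, 0 = negative, and the rule "predict positive if x > cutoff".
from typing import List, Sequence, Tuple


def candidate_cutoffs(x: Sequence[float]) -> List[float]:
    """Midpoints between consecutive distinct attribute values."""
    v = sorted(set(x))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def gini(pos: int, neg: int) -> float:
    """Gini impurity 1 - p^2 - (1 - p)^2 = 2p(1 - p) of a node."""
    n = pos + neg
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def gini_cutoff(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Cutoff minimizing the size-weighted Gini impurity of the two children."""
    best_imp, best_c = float("inf"), float("nan")
    for c in candidate_cutoffs(x):
        left = [yi for xi, yi in zip(x, y) if xi <= c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        imp = (len(left) * gini(sum(left), len(left) - sum(left))
               + len(right) * gini(sum(right), len(right) - sum(right))) / len(y)
        if imp < best_imp:
            best_imp, best_c = imp, c
    return best_c, best_imp


def youden_cutoff(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_j, best_c = -float("inf"), float("nan")
    for c in candidate_cutoffs(x):
        tp = sum(1 for xi, yi in zip(x, y) if xi > c and yi == 1)
        tn = sum(1 for xi, yi in zip(x, y) if xi <= c and yi == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_c, best_j


def roc_points(x: Sequence[float], y: Sequence[int]) -> List[Tuple[float, float]]:
    """Empirical ROC curve as (FPR, TPR) points, sweeping the threshold downward."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    pts = [(0.0, 0.0)]
    for c in sorted(set(x), reverse=True):
        tp = sum(1 for xi, yi in zip(x, y) if xi >= c and yi == 1)
        fp = sum(1 for xi, yi in zip(x, y) if xi >= c and yi == 0)
        pts.append((fp / n_neg, tp / n_pos))
    return pts


def partial_auc(x: Sequence[float], y: Sequence[int], fpr_max: float) -> float:
    """Trapezoidal area under the empirical ROC curve for FPR in [0, fpr_max]."""
    pts = roc_points(x, y)
    area = 0.0
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if f0 >= fpr_max:
            break
        if f1 > fpr_max:  # clip the last segment, interpolating TPR linearly
            t1 = t0 + (t1 - t0) * (fpr_max - f0) / (f1 - f0)
            f1 = fpr_max
        area += (f1 - f0) * (t0 + t1) / 2
    return area


if __name__ == "__main__":
    # Made-up imbalanced data: 10 negatives, 2 positives (at x = 8 and x = 12).
    x = list(range(1, 13))
    y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
    print("Gini cutoff:   ", gini_cutoff(x, y))    # isolates the single positive at x = 12
    print("Youden cutoff: ", youden_cutoff(x, y))  # keeps both positives above the cutoff
    print("AUC:           ", partial_auc(x, y, 1.0))
    print("PAUC(FPR<=0.2):", partial_auc(x, y, 0.2))
```

On this toy data the weighted Gini impurity is lowest at 11.5, which splits off only the positive at x = 12 and leaves the other positive among ten negatives, whereas Youden's index is highest at 7.5, where both positives fall above the cutoff at the cost of three false positives. Because the Gini criterion weights each child by its size, a small pure child can win; Youden's index weights sensitivity and specificity equally regardless of class sizes. This is only an illustration of how the two criteria can disagree, not the proof given in the thesis.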
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/17364 |
Full-text license: | Not authorized |
Appears in: | Graduate Institute of Industrial Engineering |
Files in this item:
File | Size | Format |
---|---|---|
ntu-102-1.pdf (not currently authorized for public access) | 6.06 MB | Adobe PDF |
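Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.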