Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4418
Title: Study of Multi-layer Hybrid Classification Tree with Applications to Cancer Diagnosis
Original title: 多層混合分類樹研究及其腫瘤診斷之應用
Author: Huanze Zeng (曾煥澤)
Advisor: Argon Chen (陳正剛)
Keywords: Classification and regression trees (CART), Multi-layer Classifier, Multi-layer Hybrid Classification Tree, Receiver operating characteristic (ROC) curve, Nonparametric AUC, Fisher discriminant analysis
Publication year: 2015
Degree: Master's
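Abstract (translated from Chinese): Classification trees are widely used in data mining to classify data of interest, and in machine learning for fields such as medicine and engineering. They fall into two main categories: classification and regression trees (CART) and multivariate classification trees. CART is commonly used to build binary classification trees, generally with the Gini index as the splitting criterion. The Multi-layer Classifier differs from CART: at each layer, the node to be split is divided into two or three nodes, one of which may hold unclassified data. The unclassified node can be split further on other attributes to open a new layer, while nodes whose class is already determined are not split again. Because classification tree models combined with Fisher linear discriminant analysis (FLD) do not necessarily improve classification performance in medical data mining (such as tumor diagnosis), this thesis attempts to construct a more effective algorithm and validates it on real cases.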
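As a rough illustration of the Gini-based splitting that the abstract attributes to CART, the following sketch scores every candidate threshold on a single numeric attribute and keeps the one with the lowest weighted Gini impurity. It is not taken from the thesis: the function names `gini_impurity` and `best_gini_split` and the toy data are assumptions made for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split `x <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_impurity(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


# Toy data (hypothetical): one attribute, labels "b" (benign) and "m" (malignant).
threshold, impurity = best_gini_split([1, 2, 3, 4], ["b", "b", "m", "m"])
```

On this toy input the two classes are perfectly separable, so the chosen threshold lies midway between the values 2 and 3 and the weighted impurity of the split is zero.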
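In constructing the model, this study first introduces a parameter to adjust the proportion of Fisher linear combination attribute schemes. In addition, the theoretical discussion by Lai (2010) found that the Multi-layer Classifier and the CART tree can compensate for each other's weaknesses, so this study introduces a second parameter to adjust the relative weight of the Multi-layer Classifier and CART. When a node enters the algorithm, the first parameter and the multi-layer combined-attribute scheme determine whether to apply the Fisher linear combination attribute scheme and with how many features. The second parameter and the nonparametric receiver operating characteristic (NP-ROC) then determine the node and the split scheme, that is, whether to split into two CART nodes or into two or three Multi-layer Classifier nodes.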
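The nonparametric ROC criterion mentioned above can be related to the nonparametric AUC listed in the keywords. A common nonparametric AUC estimate is the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive case scores higher, with ties counted as one half. The sketch below shows that estimate only. How the thesis actually uses NP-ROC to choose nodes and splits is not stated here, and the function name `nonparametric_auc` is an assumption.

```python
def nonparametric_auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical attribute values for malignant (positive) and benign (negative) cases.
auc = nonparametric_auc([2.0, 3.0], [1.0, 2.5])
```

An attribute with an estimate near 1 (or near 0) separates the two classes well, while a value near 0.5 indicates little discriminating power.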
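To validate the model, this study tests it on 366 breast tumor cases provided by National Taiwan University Hospital: 266 serve as training samples for selecting and training the parameters, and 100 are held fixed as an independent test sample. The Multi-layer Hybrid Classification Tree is compared with single classification trees built by CART, the Multi-layer Classifier, and the Enhanced Multi-layer Classifier, and with the BI-RADS grading results of the Adaptive Multi-phase Ensemble (Chuang, 2012), to verify the performance of the model.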
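The case validation results show that the new algorithm does classify better than the other methods, and that it markedly increases the number of benign cases in BI-RADS 3 for the Adaptive Multi-phase Ensemble while keeping the malignant proportion within an acceptable range.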
English abstract: The classification decision tree is a widely used classification tool in data mining and machine learning for medical and engineering applications. There are two main types of classification trees: CART and the multivariate classification tree. CART is commonly used and is built as a hierarchical tree of decision nodes. The Multi-layer Classifier, proposed by Wu (2009), differs from CART in that each layer consists of two or three nodes, of which only the node with unclassified data is classified further in the next layer, while the remaining nodes contain completely classified data. The tree construction continues until a stopping criterion is reached. However, combining the Multi-layer Classifier or CART with Fisher Linear Discriminant analysis (FLD) may not improve classification performance when applied to medical data mining (such as tumor diagnosis). Hence, this thesis aims to construct a more effective Multi-layer Hybrid Classification Tree and uses empirical data to validate its performance.
In modeling the tree structure, this study first introduces a parameter that adjusts the proportion of nodes constructed by FLD. In addition, according to the theoretical discussion by Lai (2010), the Multi-layer Classifier and CART can compensate for each other's weaknesses. Therefore, this study introduces a second parameter that adjusts the likelihood that each tree layer is split according to the Multi-layer decision or the CART decision. When a node is to be split, the algorithm first decides whether to apply FLD based on the value of the first parameter. It then decides, based on the value of the second parameter, whether to split into two nodes with the CART decision or into three (or two) nodes with the Multi-layer decision.
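For readers unfamiliar with FLD, the sketch below computes the standard Fisher direction, proportional to S_w^{-1}(m_1 - m_0), for two classes described by two attributes, and projects each case onto it to obtain a single combined attribute. This is the textbook formulation under stated assumptions, not the thesis's procedure: the helper names `fisher_direction_2d` and `project` and the toy points are hypothetical, and the restriction to two attributes is only there to keep the matrix inverse explicit.

```python
def fisher_direction_2d(class0, class1):
    """Fisher direction w = Sw^{-1} (m1 - m0) for samples with two attributes."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the symmetric 2x2 scatter matrix [[sxx, sxy], [sxy, syy]].
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # within-class scatter Sw
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Inverse of [[a, b], [b, c]] is (1 / det) * [[c, -b], [-b, a]].
    return ((c * dx - b * dy) / det, (a * dy - b * dx) / det)


def project(w, point):
    """Combined attribute: the projection of a case onto the direction w."""
    return w[0] * point[0] + w[1] * point[1]


# Hypothetical toy data: two well-separated clusters.
group0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
group1 = [(3, 0), (4, 0), (3, 1), (4, 1)]
w = fisher_direction_2d(group0, group1)
```

The projected values can then be treated like any other single attribute when searching for split thresholds.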
To verify the performance of the proposed model, this study tests the proposed tree on 366 breast cancer cases provided by National Taiwan University Hospital (NTUH): 266 are used as training samples for selecting and training the parameters, and the other 100 are set aside as an independent test sample. We compare the proposed Multi-layer Hybrid Classifier with CART, the Multi-layer Classification Tree (ML-ROC), and the Enhanced Multi-layer Classification Tree (Enhanced-ML-ROC) proposed by Lai (2010), based on single-tree performance and on the BI-RADS results generated by the Adaptive Multi-phase Ensemble (Chuang, 2012).
The verification results show that the classification performance of the proposed algorithm is indeed superior to that of the other methods. The BI-RADS results show that it not only increases the number of benign cases in BI-RADS 3 by an observable amount, but also keeps the number of malignant cases in BI-RADS 3 within an acceptable range.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/4418
Full-text authorization: Authorized (open access worldwide)
Appears in collection: Institute of Industrial Engineering

Files in this document:
File             Size      Format
ntu-104-1.pdf    2.95 MB   Adobe PDF