Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63986

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳正剛(Argon Chen) | |
| dc.contributor.author | Po-Hsun Wang | en |
| dc.contributor.author | 王柏勛 | zh_TW |
| dc.date.accessioned | 2021-06-16T17:25:22Z | - |
| dc.date.available | 2015-08-18 | |
| dc.date.copyright | 2012-08-18 | |
| dc.date.issued | 2012 | |
| dc.date.submitted | 2012-08-15 | |
| dc.identifier.citation | Breiman, L. (1984). Classification and Regression Trees. Chapman & Hall/CRC.
Breiman, L. (1996). "Bagging predictors." Machine Learning 24(2): 123-140.
Cochran, W. G. (2007). Sampling Techniques. Wiley-India.
Efron, B. (1979). "Bootstrap methods: another look at the jackknife." The Annals of Statistics 7(1): 1-26.
Efron, B. and G. Gong (1983). "A leisurely look at the bootstrap, the jackknife, and cross-validation." The American Statistician: 36-48.
Hosmer, D. W. and S. Lemeshow (2000). Applied Logistic Regression. Wiley-Interscience.
Kleinbaum, D. G. and M. Klein (2010). "Maximum likelihood techniques: An overview." Logistic Regression: 103-127. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/63986 | - |
| dc.description.abstract | 一般認為 CART 傳統分類樹可以有效率地分類某些特定資料類型;實際上,因其切割準則演算法的計算方式不同,所分出的子結點也有所不同,當使用較不佳的切割準則時並非總能有效率地分類,導致後續子結點的分割效果累積越來越差。
本研究使用常見的 variation reduction 方法,並在搜尋切點之前,先以本研究提出之方法將原始樣本分層,再對每層內以 bootstrap method 得到多組新樣本,接著對多組新樣本分別搜尋切點,以這些切點的平均做為分類的切點準則。藉由 bootstrap method 可以彌補切點附近資訊量不足的狀況;且由多組樣本求得的新切點,可使切點的選擇更加穩定,減少錯誤分類。但根據本研究之模擬案例探討發現,在不同的資料分布狀況下,兩種切點的表現不同且互補,主要差異來自切點附近資料分布的疏密程度及其與其他區域之間的比例關係。因此提出判別方式,以決定在何種狀況下應使用原來切點或以 bootstrap method 再抽樣後的新切點做為最適切點,並以此規則做為兩切點的權重,得到兩者加權後的新切點。此切點可應用於各種資料分布狀況,且相較之下變異最低、最為穩定。經統計檢定證實其顯著優於原來的兩種切點;在準確且穩定的切點選擇下,分類樹的效率將有效提升,並使每一次子結點的切割更加有效。 | zh_TW |
| dc.description.abstract | Generally, it is believed that a traditional classification tree, such as Classification and Regression Trees (CART), can effectively classify certain types of data distributions. In fact, because of the split-selection criterion and procedure used by the traditional classification tree, we can show that it is not always as efficient as expected. An unsuitable split causes problems such as sample-size depletion and overfitting; without a sufficient sample size, the splits and attribute selections in the lower levels of the tree become extremely unreliable.
To improve the performance of CART, we use the variation-reduction criterion to select the split that divides a node into two child nodes in the next layer. In this research, we propose a new method to improve the split selection. We use stratified sampling to stratify the data into multiple sub-samples and use the bootstrap method to resample instances within each sub-sample. A split is then selected for each bootstrap sample by the variation-reduction criterion, and the mean of these splits is taken as the "stratified bootstrap split". The stratified bootstrap split reduces the variability of the split for certain types of sample distributions and yields a more stable split, avoiding incorrect splits and attribute selections. According to the simulation results in this research, the density of the sample distribution is the most important factor affecting the performance of the original split and the stratified bootstrap split. We therefore propose a "weighted split" that integrates the original CART split and the proposed stratified bootstrap split. It is shown that the weighted split is robust and thus avoids incorrect splits and attribute selections. Throughout this thesis, examples are used to illustrate the proposed method. Finally, a hypothetical tree is used to demonstrate how the performance of CART can be improved by the proposed weighted split. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T17:25:22Z (GMT). No. of bitstreams: 1 ntu-101-R99546028-1.pdf: 2298984 bytes, checksum: 75e333d02e72940970eda6757dfca4a1 (MD5) Previous issue date: 2012 | en |
| dc.description.tableofcontents | 口試委員會審定書 #
誌謝 i
中文摘要 ii
Abstract iii
目錄 iv
圖目錄 vi
表目錄 vii
第1章 緒論 1
1.1 研究背景 1
1.2 研究動機與研究目標 2
1.3 論文架構 2
第2章 文獻探討 3
2.1 CART分類樹 3
2.2 Bootstrap抽樣法 5
2.3 分層抽樣(stratified sampling) 6
第3章 以分層拔靴抽樣於分類樹尋找最適切點辦法 7
3.1 以分層拔靴抽樣法應用於分類樹 8
3.2 起始切點及分層拔靴抽樣切點比較 15
3.3 平均數差異及疏密程度計算 20
3.3.1 平均數差異C 20
3.3.2 疏密程度指標 21
3.4 切點差異程度預測 23
3.5 最適切點選擇之完整流程 32
第4章 切點選擇優劣之理論探討 36
4.1 模擬資料特性及基本假設 36
4.2 三種切點分析比較 40
4.2.1 當C<0.3(共96案例): 40
4.2.2 0.3≤C≤0.5(共144案例): 41
4.2.3 C>0.5(共432案例): 52
4.3 Original Split與Stratified Bootstrap Split於CART分類樹下表現 64
第5章 結論與未來研究建議 67
參考文獻 68 | |
| dc.language.iso | zh-TW | |
| dc.subject | 分類樹 | zh_TW |
| dc.subject | 資料分布疏密程度 | zh_TW |
| dc.subject | 分層抽樣 | zh_TW |
| dc.subject | bootstrap method | zh_TW |
| dc.subject | Bootstrap method | en |
| dc.subject | Stratified sampling | en |
| dc.subject | Density of sample distribution | en |
| dc.subject | Classification and Regression Trees | en |
| dc.title | 以分層拔靴抽樣法改善分類樹之切割能力 | zh_TW |
| dc.title | Improving Selected split by Stratified Bootstrap Methods | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 100-2 | |
| dc.description.degree | 碩士 | |
| dc.contributor.oralexamcommittee | 汪上曉(David Shan-Hill Wong),桑慧敏(Wheyming Tina Song),鄭順林(Shuen-Lin Jeng),蔡雅蓉(Ya-Jung Tsai) | |
| dc.subject.keyword | 分類樹,資料分布疏密程度,分層抽樣,bootstrap method | zh_TW |
| dc.subject.keyword | Classification and Regression Trees, Density of sample distribution, Stratified sampling, Bootstrap method | en |
| dc.relation.page | 68 | |
| dc.rights.note | 有償授權 | |
| dc.date.accepted | 2012-08-16 | |
| dc.contributor.author-college | 工學院 | zh_TW |
| dc.contributor.author-dept | 工業工程學研究所 | zh_TW |
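
The abstracts above describe the stratified bootstrap split only in prose. The following is a minimal, illustrative sketch of that idea for a single numeric attribute, not the thesis's actual code: it assumes equal-frequency strata on the split attribute, a variation-reduction (variance-reduction) split criterion, and arbitrary placeholder values for the numbers of strata and bootstrap replicates; all function and variable names are hypothetical, and the fixed 0.5/0.5 weighting at the end stands in for the data-driven weighting rule derived in the thesis.

```python
# Hypothetical sketch of a stratified bootstrap split (not the thesis code).
import numpy as np


def variation_reduction_split(x, y):
    """Return the cut point on x that maximizes the variation (variance) reduction of y."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    parent_var = y_sorted.var()
    best_cut, best_gain = None, -np.inf
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # identical attribute values cannot be separated
        left, right = y_sorted[:i], y_sorted[i:]
        child_var = (len(left) * left.var() + len(right) * right.var()) / n
        gain = parent_var - child_var
        if gain > best_gain:
            best_gain = gain
            best_cut = 0.5 * (x_sorted[i - 1] + x_sorted[i])  # midpoint cut
    return best_cut


def stratified_bootstrap_split(x, y, n_strata=4, n_boot=50, seed=None):
    """Average the cut points found on stratified bootstrap samples.

    The sample is first stratified on x (equal-frequency strata); each bootstrap
    replicate then resamples with replacement inside every stratum, so sparse
    regions near the cut point keep being represented in every replicate.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(x)
    strata = np.array_split(order, n_strata)  # equal-frequency strata on x
    cuts = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(s, size=len(s), replace=True) for s in strata])
        cut = variation_reduction_split(x[idx], y[idx])
        if cut is not None:
            cuts.append(cut)
    return float(np.mean(cuts))


if __name__ == "__main__":
    # Toy usage: compare the original cut with the stratified bootstrap cut.
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 50)])
    y = (x > 1.5).astype(float) + rng.normal(0, 0.2, x.size)
    original = variation_reduction_split(x, y)
    bootstrap = stratified_bootstrap_split(x, y, seed=1)
    # The thesis combines the two cuts with weights chosen from the data density
    # around the cut; a fixed 0.5/0.5 weight is used here only as a placeholder.
    weighted = 0.5 * original + 0.5 * bootstrap
    print(original, bootstrap, weighted)
```

Stratifying before resampling is what keeps each bootstrap replicate's coverage of sparse regions near the candidate cut point, which is the property the abstract credits for the more stable split.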
Appears in Collections: 工業工程學研究所
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-101-1.pdf (Restricted Access) | 2.25 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
