Please use this Handle URI to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98432

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 徐永豐 | zh_TW |
| dc.contributor.advisor | Yung-Fong Hsu | en |
| dc.contributor.author | 何庭妤 | zh_TW |
| dc.contributor.author | Ting-Yu Ho | en |
| dc.date.accessioned | 2025-08-14T16:05:40Z | - |
| dc.date.available | 2025-08-15 | - |
| dc.date.copyright | 2025-08-14 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-07-30 | - |
| dc.identifier.citation | Batchelder, W. H., & Anders, R. (2012). Cultural consensus theory: Comparing different concepts of cultural truth. Journal of Mathematical Psychology, 56(5), 316–332. https://doi.org/10.1016/j.jmp.2012.06.002
Batchelder, W. H., & Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103–112). JAI Press.
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53(1), 71–92. https://doi.org/10.1007/BF02294195
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Chapman and Hall/CRC. https://doi.org/10.1201/9781315139470
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data (No. 666). University of California, Berkeley. https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf
Collell, G., Prelec, D., & Patil, K. R. (2018). A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data. Neurocomputing, 275, 330–340. https://doi.org/10.1016/j.neucom.2017.08.035
Dal Pozzolo, A., Caelen, O., & Bontempi, G. (2015). When is undersampling effective in unbalanced classification tasks? In A. Appice, P. P. Rodrigues, V. Santos Costa, C. Soares, J. Gama, & A. Jorge (Eds.), Machine learning and knowledge discovery in databases (pp. 200–215). Springer. https://doi.org/10.1007/978-3-319-23528-8_13
D'Andrade, R. G. (1981). The cultural part of cognition. Cognitive Science, 5(3), 179–195. https://doi.org/10.1016/S0364-0213(81)80012-2
Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. Workshop on Learning from Imbalanced Datasets II, 11(1–8). http://www.eiti.uottawa.ca/~nat/Workshop2003/drummondc.pdf
Elkan, C. (2001). The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, 973–978.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. International Conference on Machine Learning, 96, 148–156.
Grofman, B., & Owen, G. (1986). Review essay: Condorcet models, avenues for further research. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 93–102). JAI Press.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
Johnson, J. M., & Khoshgoftaar, T. M. (2021). Output thresholding for ensemble learners and imbalanced big data. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), 1449–1454. https://doi.org/10.1109/ICTAI52525.2021.00230
Karabatsos, G., & Batchelder, W. H. (2003). Markov chain estimation for test theory without an answer key. Psychometrika, 68(3), 373–389. https://doi.org/10.1007/BF02294733
Kelly, M., Longjohn, R., & Nottingham, K. (2023). The UCI machine learning repository [Data set]. University of California, Irvine. https://archive.ics.uci.edu
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. https://www.r-project.org/doc/Rnews/Rnews_2002-3.pdf
Lipton, Z. C., Elkan, C., & Naryanaswamy, B. (2014). Optimal thresholding of classifiers to maximize F1 measure. In T. Calders, F. Esposito, E. Hüllermeier, & R. Meo (Eds.), Machine learning and knowledge discovery in databases (pp. 225–239). Springer. https://doi.org/10.1007/978-3-662-44851-9_15
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning, 625–632. https://doi.org/10.1145/1102351.1102430
Provost, F. (2000). Machine learning from imbalanced data sets 101 [Extended abstract]. Proceedings of the AAAI 2000 Workshop on Imbalanced Data Sets. https://cdn.aaai.org/Workshops/2000/WS-00-05/WS00-05-001.pdf
Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42(3), 203–231. https://doi.org/10.1023/A:1007601015854
Romney, A. K., & Weller, S. C. (1984). Predicting informant accuracy from patterns of recall among individuals. Social Networks, 6(1), 59–77. https://doi.org/10.1016/0378-8733(84)90004-2
Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88(2), 313–338. https://doi.org/10.1525/aa.1986.88.2.02a00020
Scrucca, L., Fraley, C., Murphy, T. B., & Raftery, A. E. (2023). Model-based clustering, classification, and density estimation using mclust in R. Chapman and Hall/CRC. https://doi.org/10.1201/9781003277965
Sheng, V. S., & Ling, C. X. (2006). Thresholding for making classifiers cost-sensitive. Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, 476–481. https://cdn.aaai.org/AAAI/2006/AAAI06-076.pdf
Spelmen, V. S., & Porkodi, R. (2018). A review on handling imbalanced data. 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), 1–11. https://doi.org/10.1109/ICCTCT.2018.8551020
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1
Wu, D. (2024). RSBID: Resampling strategies for binary imbalanced datasets [R package version 0.0.2.0000]. https://github.com/dongyuanwu/RSBID | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98432 | - |
| dc.description.abstract | 在分類任務中,不平衡資料是一個常見的挑戰。當類別分布不均時,模型通常在多數類別上有較好的預測表現,卻難以正確識別少數類別,而這些少數類別往往是實務應用中更關注的類別。隨機森林(Random Forest, RF)透過多數決整合多棵決策樹的預測結果,以提升整體分類表現。然而,由於決策樹本身在不平衡資料上容易傾向多數類別,RF 的多數決機制也會延續此傾向。廣義 Condorcet 模型(General Condorcet Model, GCM)是 1980 年代中期由 Batchelder 等人提出的一種資料整合模型。相較於 RF 採用的多數決策略,GCM 進一步考慮了決策樹的回答偏誤(如 RF 預測傾向多數類別)以及能力。因此,本研究將 RF 中的多數決整合步驟替換為 GCM,期望能改善 RF 在不平衡資料上的表現。此外,從分類錯誤成本不對稱的觀點來看,閾值移動(調整模型預測機率的分類門檻)是一種直接對應此問題的方法,而我們觀察到在合理限制下的 GCM 可視為閾值移動。本研究比較了 GCM、幾種閾值移動方法(如移動至先驗機率、基於模型表現動態調整),以及主流的重新平衡方法(如合成少數類別過採樣技術,Synthetic Minority Over-sampling Technique, SMOTE,以及 Balanced Random Forest, BRF)。結果顯示,各方法在不同評估指標上展現不同優勢:移動至先驗機率雖在 G-mean 上表現最佳,但其 F1 分數表現最差;SMOTE 則呈現相反趨勢;而 GCM 和 BRF 則在 G-mean 與 F1 分數間取得較佳的平衡。其中 BRF 整體較平均,GCM 則有較高 G-mean,適合對敏感度要求較高、但不希望過度犧牲精確率的情境。 | zh_TW |
| dc.description.abstract | Class imbalance is a common challenge in classification tasks. When class distributions are skewed, models usually perform better on the majority class while struggling to identify the minority class, which is often of greater interest in real-world applications. Random Forest (RF), which aggregates the predictions of multiple decision trees through majority rule, aims to enhance overall classification performance. However, since individual decision trees tend to be biased towards the majority class in imbalanced datasets, the majority rule of RF maintains this bias. The General Condorcet Model (GCM), developed by Batchelder et al. in the mid-1980s, is an information pooling model. Compared with majority rule, the GCM takes into account response bias (e.g., tendency toward the majority class) and competency. This study replaces the majority-rule step in RF with the GCM, aiming to improve RF's performance on imbalanced datasets. Moreover, from a cost-sensitive perspective, threshold-moving is a direct and intuitive approach that involves adjusting the decision criterion. We observed that under reasonable restrictions, the GCM can be interpreted as a form of threshold-moving. This study compares the GCM with several threshold-moving techniques (e.g., prior-based and performance-based adjustments) and popular rebalancing methods (e.g., Synthetic Minority Over-sampling Technique, SMOTE, and Balanced Random Forest, BRF). Results indicate that these methods exhibit varying strengths across different evaluation metrics. While the prior-based approach achieves the highest G-mean, it yields the worst F1 score; SMOTE shows the opposite pattern. Both GCM and BRF offer a better trade-off between G-mean and F1 score. Among them, BRF performs more evenly across metrics, while GCM has a higher G-mean. Thus, GCM may be suitable for applications that require high sensitivity without overly compromising precision. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:05:40Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-14T16:05:40Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Oral Examination Committee Certification i
Acknowledgements ii
摘要 (Abstract in Chinese) iii
Abstract v
Contents vii
List of Figures ix
List of Tables x
Chapter 1. Introduction 1
Chapter 2. Literature Review 7
Section 1. RF and Its Prediction Bias on Imbalanced Data 7
Section 2. Structure and Estimation of the GCM 9
Section 3. Previous Approaches to Handling Imbalanced Data 15
Chapter 3. Simulation Experiments 21
Section 1. Datasets 21
Section 2. Evaluation Metrics 22
Section 3. Experimental Setup 23
Chapter 4. Results 26
Section 1. Reliability Plots 26
Section 2. Comparison of Threshold-Moving Methods 28
Section 3. Comparison of Threshold-Moving and Rebalancing Methods 31
Chapter 5. Discussion 35
References 38
Appendix 42 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 廣義 Condorcet 模型 | zh_TW |
| dc.subject | 類別不平衡 | zh_TW |
| dc.subject | 整合策略 | zh_TW |
| dc.subject | 閾值移動 | zh_TW |
| dc.subject | 隨機森林 | zh_TW |
| dc.subject | Random Forest | en |
| dc.subject | threshold-moving | en |
| dc.subject | aggregation strategy | en |
| dc.subject | General Condorcet Model | en |
| dc.subject | class imbalance | en |
| dc.title | 引入廣義 Condorcet 模型以提升隨機森林在不平衡資料上的表現 | zh_TW |
| dc.title | Incorporating the General Condorcet Model to Improve Random Forest Performance on Imbalanced Data | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | Master | - |
| dc.contributor.oralexamcommittee | 蔡政安;黃從仁 | zh_TW |
| dc.contributor.oralexamcommittee | Chen-An Tsai;Tsung-Ren Huang | en |
| dc.subject.keyword | 類別不平衡,廣義 Condorcet 模型,隨機森林,閾值移動,整合策略 | zh_TW |
| dc.subject.keyword | class imbalance,General Condorcet Model,Random Forest,threshold-moving,aggregation strategy | en |
| dc.relation.page | 42 | - |
| dc.identifier.doi | 10.6342/NTU202502381 | - |
| dc.rights.note | Authorized (access restricted to campus) | - |
| dc.date.accepted | 2025-07-31 | - |
| dc.contributor.author-college | College of Science | - |
| dc.contributor.author-dept | Department of Psychology | - |
| dc.date.embargo-lift | 2030-07-27 | - |
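The abstract above contrasts RF's fixed majority-vote criterion (a 0.5 threshold on the tree-vote fraction) with prior-based threshold-moving, and evaluates methods by G-mean and F1. The following is a minimal Python/scikit-learn sketch of prior-based threshold-moving; the simulated dataset, model settings, and the `gmean` helper are assumptions for illustration, not the thesis's implementation (which, per the cited packages, was R-based) or its data.

```python
# Illustrative sketch only: prior-based threshold-moving on random-forest
# probabilities. The dataset and settings here are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split

# Simulated imbalanced binary data: roughly 10% minority class (label 1).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
# With fully grown trees, predict_proba equals the fraction of trees voting
# for each class, so thresholding it shifts the vote criterion.
p1 = rf.predict_proba(X_te)[:, 1]

prior = y_tr.mean()                      # minority-class prior rate
pred_majority = (p1 >= 0.5).astype(int)  # majority rule: fixed 0.5 threshold
pred_prior = (p1 >= prior).astype(int)   # prior-based threshold-moving

def gmean(y_true, y_pred):
    """G-mean = sqrt(sensitivity * specificity)."""
    sens = recall_score(y_true, y_pred, pos_label=1)
    spec = recall_score(y_true, y_pred, pos_label=0)
    return np.sqrt(sens * spec)

for name, pred in [("majority rule", pred_majority),
                   ("prior threshold", pred_prior)]:
    print(f"{name}: G-mean={gmean(y_te, pred):.3f}, "
          f"F1={f1_score(y_te, pred):.3f}")
```

Lowering the threshold from 0.5 to the minority prior trades precision for sensitivity, which is consistent with the pattern the abstract reports (the prior-based approach achieving the highest G-mean while yielding the lowest F1 score).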
| Appears in Collections: | Department of Psychology |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (Restricted access) | 2.08 MB | Adobe PDF | View/Open |
Items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
