NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98432

Full metadata record

DC Field | Value | Language
dc.contributor.advisor | 徐永豐 | zh_TW
dc.contributor.advisor | Yung-Fong Hsu | en
dc.contributor.author | 何庭妤 | zh_TW
dc.contributor.author | Ting-Yu Ho | en
dc.date.accessioned | 2025-08-14T16:05:40Z
dc.date.available | 2025-08-15
dc.date.copyright | 2025-08-14
dc.date.issued | 2025
dc.date.submitted | 2025-07-30
dc.identifier.citation |
Batchelder, W. H., & Anders, R. (2012). Cultural consensus theory: Comparing different concepts of cultural truth. Journal of Mathematical Psychology, 56(5), 316–332. https://doi.org/10.1016/j.jmp.2012.06.002
Batchelder, W. H., & Romney, A. K. (1986). The statistical analysis of a general Condorcet model for dichotomous choice situations. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 103–112). JAI Press.
Batchelder, W. H., & Romney, A. K. (1988). Test theory without an answer key. Psychometrika, 53(1), 71–92. https://doi.org/10.1007/BF02294195
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. https://doi.org/10.1007/BF00058655
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Chapman and Hall/CRC. https://doi.org/10.1201/9781315139470
Chen, C., Liaw, A., & Breiman, L. (2004). Using random forest to learn imbalanced data (No. 666). University of California, Berkeley. https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Collell, G., Prelec, D., & Patil, K. R. (2018). A simple plug-in bagging ensemble based on threshold-moving for classifying binary and multiclass imbalanced data. Neurocomputing, 275, 330–340. https://doi.org/10.1016/j.neucom.2017.08.035
Dal Pozzolo, A., Caelen, O., & Bontempi, G. (2015). When is undersampling effective in unbalanced classification tasks? In A. Appice, P. P. Rodrigues, V. Santos Costa, C. Soares, J. Gama, & A. Jorge (Eds.), Machine learning and knowledge discovery in databases (pp. 200–215). Springer. https://doi.org/10.1007/978-3-319-23528-8_13
D’Andrade, R. G. (1981). The cultural part of cognition. Cognitive Science, 5(3), 179–195. https://doi.org/10.1016/S0364-0213(81)80012-2
Denwood, M. J. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. Workshop on Learning from Imbalanced Datasets II, 11(1–8). http://www.eiti.uottawa.ca/~nat/Workshop2003/drummondc.pdf
Elkan, C. (2001). The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, 973–978.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. International Conference on Machine Learning, 96, 148–156.
Grofman, B., & Owen, G. (1986). Review essay: Condorcet models, avenues for further research. In B. Grofman & G. Owen (Eds.), Information pooling and group decision making (pp. 93–102). JAI Press.
He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284. https://doi.org/10.1109/TKDE.2008.239
Johnson, J. M., & Khoshgoftaar, T. M. (2021). Output thresholding for ensemble learners and imbalanced big data. 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), 1449–1454. https://doi.org/10.1109/ICTAI52525.2021.00230
Karabatsos, G., & Batchelder, W. H. (2003). Markov chain estimation for test theory without an answer key. Psychometrika, 68(3), 373–389. https://doi.org/10.1007/BF02294733
Kelly, M., Longjohn, R., & Nottingham, K. (2023). The UCI machine learning repository [Data set]. University of California, Irvine. https://archive.ics.uci.edu
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. https://www.r-project.org/doc/Rnews/Rnews_2002-3.pdf
Lipton, Z. C., Elkan, C., & Naryanaswamy, B. (2014). Optimal thresholding of classifiers to maximize F1 measure. In T. Calders, F. Esposito, E. Hüllermeier, & R. Meo (Eds.), Machine learning and knowledge discovery in databases (pp. 225–239). Springer. https://doi.org/10.1007/978-3-662-44851-9_15
Niculescu-Mizil, A., & Caruana, R. (2005). Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning, 625–632. https://doi.org/10.1145/1102351.1102430
Provost, F. (2000). Machine learning from imbalanced data sets 101 [Extended abstract]. Proceedings of the AAAI 2000 Workshop on Imbalanced Data Sets. https://cdn.aaai.org/Workshops/2000/WS-00-05/WS00-05-001.pdf
Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. Machine Learning, 42(3), 203–231. https://doi.org/10.1023/A:1007601015854
Romney, A. K., & Weller, S. C. (1984). Predicting informant accuracy from patterns of recall among individuals. Social Networks, 6(1), 59–77. https://doi.org/10.1016/0378-8733(84)90004-2
Romney, A. K., Weller, S. C., & Batchelder, W. H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88(2), 313–338. https://doi.org/10.1525/aa.1986.88.2.02a00020
Scrucca, L., Fraley, C., Murphy, T. B., & Raftery, A. E. (2023). Model-based clustering, classification, and density estimation using mclust in R. Chapman and Hall/CRC. https://doi.org/10.1201/9781003277965
Sheng, V. S., & Ling, C. X. (2006). Thresholding for making classifiers cost-sensitive. Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, 476–481. https://cdn.aaai.org/AAAI/2006/AAAI06-076.pdf
Spelmen, V. S., & Porkodi, R. (2018). A review on handling imbalanced data. 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), 1–11. https://doi.org/10.1109/ICCTCT.2018.8551020
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1
Wu, D. (2024). RSBID: Resampling strategies for binary imbalanced datasets [R package version 0.0.2.0000]. https://github.com/dongyuanwu/RSBID
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98432
dc.description.abstract | 在分類任務中,不平衡資料是一個常見的挑戰。當類別分布不均時,模型通常在多數類別上有較好的預測表現,卻難以正確識別少數類別,而這些少數類別往往是實務應用中更關注的類別。隨機森林(Random Forest, RF)透過多數決整合多棵決策樹的預測結果,以提升整體分類表現。然而,由於決策樹本身在不平衡資料上容易傾向多數類別,RF 的多數決機制也會延續此傾向。廣義 Condorcet 模型(General Condorcet Model, GCM)是 1980 年代中期由 Batchelder 等人提出的一種資料整合模型。相較於 RF 採用的多數決策略,GCM 進一步考慮了決策樹的回答偏誤(如 RF 預測傾向多數類別)以及能力。因此,本研究將 RF 中的多數決整合步驟替換為 GCM,期望能改善 RF 在不平衡資料上的表現。此外,從分類錯誤成本不對稱的觀點來看,閾值移動(調整模型預測機率的分類門檻)是一種直接對應該問題的方法,而我們觀察到在合理限制下的 GCM 可視為閾值移動。本研究比較了 GCM、幾種閾值移動方法(如移動至先驗機率、基於模型表現動態調整),以及主流的重新平衡方法(如合成少數類別過採樣技術,Synthetic Minority Over-sampling Technique, SMOTE,以及 Balanced Random Forest, BRF)。結果顯示,各方法在不同評估指標上展現不同優勢:移動至先驗機率雖在 G-mean 上表現最佳,但其 F1 分數表現最差;SMOTE 則呈現相反趨勢;而 GCM 和 BRF 則在 G-mean 與 F1 分數間取得較佳的平衡。其中 BRF 整體較平均,GCM 則有較高 G-mean,適合對敏感度要求較高、但不希望過度犧牲精確率的情境。 | zh_TW
dc.description.abstract | Class imbalance is a common challenge in classification tasks. When class distributions are skewed, models usually perform better on the majority class while struggling to identify the minority class, which is often of greater interest in real-world applications. Random Forest (RF), which aggregates the predictions of multiple decision trees through majority rule, aims to enhance overall classification performance. However, since individual decision trees tend to be biased toward the majority class in imbalanced datasets, RF's majority rule inherits this bias. The General Condorcet Model (GCM), developed by Batchelder and colleagues in the mid-1980s, is an information pooling model. Unlike majority rule, the GCM takes into account response bias (e.g., the tendency toward the majority class) and competence. This study therefore replaces the majority-rule step in RF with the GCM, aiming to improve RF's performance on imbalanced datasets. Moreover, from a cost-sensitive perspective, threshold-moving, which adjusts the probability cutoff used to assign classes, is a direct and intuitive approach, and we observed that under reasonable restrictions the GCM can be interpreted as a form of threshold-moving. This study compares the GCM with several threshold-moving techniques (e.g., prior-based and performance-based adjustments) and popular rebalancing methods (e.g., Synthetic Minority Over-sampling Technique, SMOTE, and Balanced Random Forest, BRF). Results indicate that these methods exhibit varying strengths across evaluation metrics. While the prior-based approach achieves the highest G-mean, it yields the worst F1 score; SMOTE shows the opposite pattern. Both GCM and BRF offer a better trade-off between G-mean and F1 score. Of the two, BRF performs more evenly across metrics, while GCM attains a higher G-mean, making it suitable for applications that require high sensitivity without overly compromising precision. | en
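The abstract's observation that a suitably restricted GCM acts as threshold-moving can be made concrete. The following is a minimal sketch, assuming the dichotomous GCM of Batchelder and Romney (1988) with the trees treated as conditionally independent informants sharing a common competence $D$ and guessing bias $g$; the thesis's exact restrictions may differ. Writing $Z_k \in \{0,1\}$ for the true class of case $k$ and $X_{ik}$ for tree $i$'s vote,

$$P(X_{ik}=1 \mid Z_k=1) = D + (1-D)g \equiv h, \qquad P(X_{ik}=1 \mid Z_k=0) = (1-D)g \equiv f.$$

If $s_k$ of $n$ trees vote for class 1 and $\pi = P(Z_k=1)$ is the prior, the posterior odds are

$$\frac{P(Z_k=1 \mid s_k)}{P(Z_k=0 \mid s_k)} = \frac{\pi}{1-\pi}\left(\frac{h}{f}\right)^{s_k}\left(\frac{1-h}{1-f}\right)^{n-s_k},$$

which exceeds 1 exactly when the vote fraction $s_k/n$ exceeds a fixed cutoff determined by $D$, $g$, and $\pi$. Majority rule is the special case with the cutoff pinned at $0.5$; the GCM in effect moves that threshold.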
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-14T16:05:40Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-14T16:05:40Z (GMT). No. of bitstreams: 0 | en
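As a companion to the comparison described in the abstract, here is a minimal R sketch (not the thesis code) of prior-based threshold-moving with the randomForest package (Liaw & Wiener, 2002). The data frames train and test, and the binary factor y with levels "neg" and "pos" ("pos" being the minority class), are illustrative assumptions.

    ## Prior-based threshold-moving with randomForest: a sketch, not the
    ## thesis implementation. Assumes data frames `train` and `test` with a
    ## binary factor outcome `y`, levels c("neg", "pos").
    library(randomForest)

    set.seed(1)
    fit <- randomForest(y ~ ., data = train)

    ## Fraction of trees voting "pos" for each test case
    p_pos <- predict(fit, newdata = test, type = "prob")[, "pos"]

    ## Default aggregation: majority rule, i.e., a 0.5 cutoff
    pred_majority <- factor(ifelse(p_pos > 0.5, "pos", "neg"),
                            levels = levels(train$y))

    ## Prior-based threshold-moving: cut at the training prior of "pos"
    prior_pos  <- mean(train$y == "pos")
    pred_moved <- factor(ifelse(p_pos > prior_pos, "pos", "neg"),
                         levels = levels(train$y))

    ## G-mean and F1, treating "pos" as the class of interest
    gmean_f1 <- function(truth, pred) {
      tp <- sum(pred == "pos" & truth == "pos")
      fp <- sum(pred == "pos" & truth == "neg")
      fn <- sum(pred == "neg" & truth == "pos")
      tn <- sum(pred == "neg" & truth == "neg")
      sens <- tp / (tp + fn)   # sensitivity (recall)
      spec <- tn / (tn + fp)   # specificity
      prec <- tp / (tp + fp)   # precision
      c(G_mean = sqrt(sens * spec),
        F1     = 2 * prec * sens / (prec + sens))
    }

    gmean_f1(test$y, pred_moved)

Lowering the cutoff from 0.5 to the minority-class prior trades precision for sensitivity, which matches the abstract's finding that the prior-based approach maximizes G-mean at the expense of F1.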
dc.description.tableofcontents |
口試委員會審定書 i
誌謝 ii
摘要 iii
Abstract v
目次 vii
圖次 ix
表次 x
第一章 緒論 1
第二章 文獻回顧 7
第一節 RF 以及它在不平衡資料上的預測偏向 7
第二節 GCM 的結構和估計 9
第三節 過去對於不平衡資料的處理方法 15
第三章 模擬實驗 21
第一節 資料集 21
第二節 評估指標 22
第三節 實驗設定 23
第四章 結果 26
第一節 可靠度圖(reliability plot) 26
第二節 閾值移動方法比較 28
第三節 閾值移動以及重新平衡方法比較 31
第五章 討論 35
參考文獻 38
附錄 42
dc.language.iso | zh_TW
dc.subject | 廣義 Condorcet 模型 | zh_TW
dc.subject | 類別不平衡 | zh_TW
dc.subject | 整合策略 | zh_TW
dc.subject | 閾值移動 | zh_TW
dc.subject | 隨機森林 | zh_TW
dc.subject | Random Forest | en
dc.subject | threshold-moving | en
dc.subject | aggregation strategy | en
dc.subject | General Condorcet Model | en
dc.subject | class imbalance | en
dc.title | 引入廣義 Condorcet 模型以提升隨機森林在不平衡資料上的表現 | zh_TW
dc.title | Incorporating the General Condorcet Model to Improve Random Forest Performance on Imbalanced Data | en
dc.type | Thesis
dc.date.schoolyear | 113-2
dc.description.degree | 碩士
dc.contributor.oralexamcommittee | 蔡政安;黃從仁 | zh_TW
dc.contributor.oralexamcommittee | Chen-An Tsai;Tsung-Ren Huang | en
dc.subject.keyword | 類別不平衡,廣義 Condorcet 模型,隨機森林,閾值移動,整合策略 | zh_TW
dc.subject.keyword | class imbalance, General Condorcet Model, Random Forest, threshold-moving, aggregation strategy | en
dc.relation.page | 42
dc.identifier.doi | 10.6342/NTU202502381
dc.rights.note | 同意授權(限校園內公開)
dc.date.accepted | 2025-07-31
dc.contributor.author-college | 理學院
dc.contributor.author-dept | 心理學系
dc.date.embargo-lift | 2030-07-27
Appears in Collections: 心理學系 (Department of Psychology)

Files in this item:
File | Size | Format
ntu-113-2.pdf (restricted access; not available to the public) | 2.08 MB | Adobe PDF


All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.
