Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84203
Full metadata record
dc.contributor.advisor: 藍俊宏 (Jakey Blue)
dc.contributor.author: Bo-Ru Yang [en]
dc.contributor.author: 楊博儒 [zh_TW]
dc.date.accessioned: 2023-03-19T22:06:15Z
dc.date.copyright: 2022-07-22
dc.date.issued: 2022
dc.date.submitted: 2022-06-29
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84203
dc.description.abstract: The acquisition and analysis of data streams are increasingly common, yet data streams often suffer from class imbalance (a highly skewed ratio of defective to non-defective units) and concept drift (data distributions that change over time); when the two occur together and interact, most machine learning models struggle to complete the target task. Moreover, when all features are categorical, defining a distance measure that adapts to both class imbalance and concept drift makes analysis and prediction even more challenging. This study proposes an online learning classification architecture, ARF-WRE (Adaptive Random Forest with Weighted REsampling), which builds on the ARF model, accounts for the variable importance of categorical data, and resamples the data before performing online classification. ARF-WRE first reshapes the data distribution through a dynamically determined weighted Hamming distance to cope with continuously drifting data, then addresses class imbalance with different resampling techniques, and finally incorporates an early-warning retraining mechanism to improve the predictive performance of the online classifier. This study uses process event and alarm data collected by a Taiwanese TFT-LCD manufacturer to predict pre-shipment quality inspection results. Because the yield of a mature process is extremely high, the imbalance ratio can reach 1:2000; moreover, since events and alarms do not occur regularly, the data distribution changes dynamically. Experiments show that the proposed ARF-WRE delivers effective predictions on datasets with extreme class imbalance and concept drift. Comparisons across model architectures further show that the resampling foundation of ARF-WRE maintains good predictive performance while substantially improving training efficiency, and that, combined with the weighted Hamming distance and early-warning retraining mechanisms, it achieves the best model performance under class imbalance and concept drift. [zh_TW]
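The abstract describes reshaping the data distribution through a weighted Hamming distance driven by variable importance. The thesis's exact formulation is not reproduced in this record; the following is a minimal generic sketch of the idea, where the function name and the example importance weights are invented for illustration:

```python
import numpy as np

def weighted_hamming(x, y, weights):
    """Weighted Hamming distance between two binary vectors.

    Each mismatching position contributes its feature weight instead
    of 1, so features judged important dominate the distance. The
    weights would come from a dynamically updated importance estimate
    (e.g. from the forest), which is mocked here as a fixed list.
    """
    x, y, weights = np.asarray(x), np.asarray(y), np.asarray(weights)
    return float(np.sum(weights * (x != y)))

# Hypothetical importances for three binary features.
w = [0.5, 0.1, 0.4]
print(weighted_hamming([1, 0, 1], [1, 1, 0], w))  # mismatches at positions 1 and 2
```

With uniform weights this reduces to the ordinary Hamming distance; uneven weights let resampling neighbourhoods follow the currently important features.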
dc.description.abstract: Nowadays, data stream acquisition and analysis are increasingly common, but data quality and consistency are critical to analytical performance. Common issues, such as class imbalance (a high ratio of non-defective units to defects) and concept drift (data distributions changing over time), jeopardize the resulting quality of machine learning models. Moreover, when the data types are mostly categorical, finding a proper distance metric to tackle class imbalance and concept drift is both essential and challenging. This thesis proposes an online learning classification architecture, ARF-WRE (Adaptive Random Forest with Weighted REsampling), which takes ARF as its base model. ARF-WRE handles binary variable importance and resampling techniques simultaneously. It first reshapes the data distribution through a weighted Hamming distance based on dynamic variable importance to cope with constantly drifting data. Different resampling techniques are then used to tackle class imbalance. Finally, an early-warning retraining mechanism is proposed to improve online classification performance. This research employs process event logs and alarm data, provided by a Taiwanese TFT-LCD manufacturer, to predict pre-shipment inspection results. Owing to the high yield of mature products, the data imbalance can reach 1:2000. Moreover, the irregular occurrence of events and alarms makes the data distribution change dynamically. The experimental results show that the proposed ARF-WRE improves prediction significantly, and comparisons across model designs show that it further enhances training efficiency through data resampling. [en]
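The abstract also mentions an early-warning retraining mechanism. One standard way to realize such a trigger in the drift-detection literature is a DDM-style monitor of the online error rate; the sketch below is a generic illustration of that pattern, not the thesis's implementation (the class name, warm-up length, and thresholds are assumptions):

```python
import math

class DriftWarningMonitor:
    """DDM-style drift monitor: track the running error rate p and its
    standard deviation s; when p + s rises well above the best level
    seen so far, emit a 'warning' (start buffering recent data) or a
    'drift' (retrain the model)."""

    def __init__(self, warn_factor=2.0, drift_factor=3.0, warmup=30):
        self.n = 0
        self.p = 0.0             # running error rate
        self.p_min = math.inf    # best (lowest) error rate seen
        self.s_min = math.inf    # its standard deviation
        self.warn_factor = warn_factor
        self.drift_factor = drift_factor
        self.warmup = warmup

    def update(self, error):
        """error: 1 if the model misclassified this instance, else 0.
        Returns 'ok', 'warning', or 'drift'."""
        self.n += 1
        self.p += (error - self.p) / self.n       # incremental mean
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n <= self.warmup:
            return 'ok'                            # not enough evidence yet
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s     # new best operating point
        level = self.p + s
        if level > self.p_min + self.drift_factor * self.s_min:
            return 'drift'
        if level > self.p_min + self.warn_factor * self.s_min:
            return 'warning'
        return 'ok'
```

In an online pipeline, each incoming labelled instance is scored, the 0/1 error is fed to `update`, and a `'drift'` signal triggers retraining on the recently buffered window.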
dc.description.provenance: Made available in DSpace on 2023-03-19T22:06:15Z (GMT). No. of bitstreams: 1. U0001-2406202211354900.pdf: 2015404 bytes, checksum: 40677a86c2a70e18bff62f68866b82b2 (MD5). Previous issue date: 2022 [en]
dc.description.tableofcontents: [zh_TW]
Chinese Abstract; Abstract; Table of Contents; List of Figures; List of Tables
Chapter 1: Introduction (1.1 Research Motivation and Objectives; 1.2 Research Structure)
Chapter 2: Literature Review (2.1 Class Imbalance: 2.1.1 Data Resampling, 2.1.2 Ensemble Learning; 2.2 Concept Drift: 2.2.1 Sources and Types of Concept Drift, 2.2.2 Concept Drift Detection, 2.2.3 Models and Algorithms; 2.3 Combining Class Imbalance and Concept Drift: 2.3.1 Models and Algorithms, 2.3.2 Model Evaluation Metrics; 2.4 Summary of Surveyed Methods)
Chapter 3: Adaptive Random Forest with Integrated Weighted Resampling (3.1 Weighted Hamming Distance; 3.2 Online Data Resampling; 3.3 ARF-WRE)
Chapter 4: Case Study (4.1 Dataset Description and Characteristics; 4.2 Data Analysis Results)
Chapter 5: Conclusion (5.1 Research Results and Contributions; 5.2 Conclusions and Future Research Directions)
References
dc.language.iso: zh-TW
dc.subject: Concept Drift [zh_TW]
dc.subject: Adaptive Random Forest [zh_TW]
dc.subject: Weighted Hamming Distance [zh_TW]
dc.subject: Binary Variable [zh_TW]
dc.subject: Class Imbalance [zh_TW]
dc.subject: Weighted Hamming Distance [en]
dc.subject: Class Imbalance [en]
dc.subject: Concept Drift [en]
dc.subject: Binary Variable [en]
dc.subject: Adaptive Random Forest [en]
dc.title: Development of an Adaptive Weighted Drift Model under the Constraints of Binary Variables and Imbalanced Data [zh_TW]
dc.title: On the Development of Concept Drifting Model with Adaptive Weights under the Constraints of Binary Variables and Imbalanced Data [en]
dc.type: Thesis
dc.date.schoolyear: 110-2
dc.description.degree: Master
dc.contributor.oralexamcommittee: 洪一薰 (I-Hsuan Hong), 陳家正 (Chia-Cheng Chen)
dc.subject.keyword: Class Imbalance, Concept Drift, Binary Variable, Adaptive Random Forest, Weighted Hamming Distance [zh_TW]
dc.subject.keyword: Class Imbalance, Concept Drift, Binary Variable, Adaptive Random Forest, Weighted Hamming Distance [en]
dc.relation.page: 47
dc.identifier.doi: 10.6342/NTU202201090
dc.rights.note: Authorization granted (campus access only)
dc.date.accepted: 2022-07-01
dc.contributor.author-college: 共同教育中心 (Center for General Education) [zh_TW]
dc.contributor.author-dept: 統計碩士學位學程 (Master's Program in Statistics) [zh_TW]
dc.date.embargo-lift: 2027-06-27
Appears in Collections: 統計碩士學位學程 (Master's Program in Statistics)

Files in This Item:
U0001-2406202211354900.pdf: 1.97 MB, Adobe PDF (restricted; not publicly available)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
