Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84203
Title: On the Development of Concept Drifting Model with Adaptive Weights under the Constraints of Binary Variables and Imbalanced Data (Chinese title: 發展二元變數與不平衡資料限制下之自適應加權飄移模型)
Author: Bo-Ru Yang (楊博儒)
Advisor: Jakey Blue (藍俊宏)
Keywords: Class Imbalance, Concept Drift, Binary Variable, Adaptive Random Forest, Weighted Hamming Distance
Publication Year: 2022
Degree: Master's
Abstract: Acquiring and analyzing data streams is increasingly common, but data quality and consistency are critical to analytical performance. Data streams often suffer from class imbalance (defective units are vastly outnumbered by non-defective ones) and concept drift (the data distribution changes over time); when the two occur together and interact, most machine learning models struggle to perform the target task. Moreover, when all variables are categorical, defining a distance metric that adapts to both class imbalance and concept drift is essential yet difficult, which makes analysis and prediction even more challenging.

This thesis proposes an online learning classification architecture, ARF-WRE (Adaptive Random Forest with Weighted REsampling), which takes the ARF model as its base framework. ARF-WRE accounts for the importance of the binary (categorical) variables and resamples the data before online classification. It first reshapes the data distribution with a weighted Hamming distance whose weights are based on dynamically determined variable importance, in order to cope with continuously drifting data. Different resampling techniques are then applied to tackle the class imbalance. Finally, an early-warning retraining mechanism is added to improve online classification performance.

The study uses process event logs and alarm data provided by a Taiwanese TFT-LCD manufacturer to predict the results of its pre-shipping quality inspection. Because the yield of the mature process is very high, the class imbalance can reach 1:2000; moreover, because events and alarms occur irregularly, the data distribution changes dynamically. The experimental results show that ARF-WRE produces effective, significantly improved predictions on this extremely imbalanced and drifting dataset. A comparison of different model designs further shows that the resampling foundation lets ARF-WRE maintain good predictive performance while greatly improving training efficiency, and that adding the weighted Hamming distance and the early-warning retraining mechanism yields the best model performance under combined class imbalance and concept drift.
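The weighted Hamming distance named in the abstract can be sketched as follows, assuming binary feature vectors (for example, event/alarm present or absent) and one non-negative weight per variable. The abstract says the weights come from dynamically determined variable importance but does not say how that importance is computed, so `importance_to_weights` is a hypothetical helper that only normalizes given scores; it is not the thesis's procedure.

```python
from typing import List, Sequence


def weighted_hamming(x: Sequence[int], y: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted Hamming distance between two equal-length binary vectors.

    Every position where x and y differ adds that position's weight;
    matching positions add nothing.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)


def importance_to_weights(importance: Sequence[float]) -> List[float]:
    """Hypothetical helper: turn non-negative importance scores into weights summing to 1."""
    total = sum(importance)
    if total <= 0:
        # No usable importance yet: fall back to uniform weights.
        return [1.0 / len(importance)] * len(importance)
    return [v / total for v in importance]


# Usage with made-up importance scores for four binary event/alarm indicators.
w = importance_to_weights([3, 1, 0, 4])  # -> [0.375, 0.125, 0.0, 0.5]
x = [1, 0, 1, 1]
y = [1, 1, 0, 0]
print(weighted_hamming(x, y, w))  # 0.625
```

With every weight equal to 1 this reduces to the ordinary Hamming distance (the number of mismatched positions). Re-estimating the importance scores on each new window of the stream is one way such weights could follow a drifting distribution.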
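As an illustration of one resampling technique for an imbalance as severe as the 1:2000 ratio reported in the abstract, the sketch below applies random undersampling to a batch of labelled samples: every minority (defect, label 1) sample is kept and the majority class is randomly reduced to `ratio` times the minority count. The function name, the `ratio` parameter, and the 0/1 label encoding are assumptions for illustration; the abstract does not say which resampling techniques ARF-WRE compares.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    X: Sequence[T],
    y: Sequence[int],
    ratio: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[List[T], List[int]]:
    """Random undersampling for a binary 0/1-labelled batch (1 = minority/defect)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    # Keep at most ratio * (#minority) majority samples, drawn without replacement.
    n_keep = min(len(majority), round(ratio * len(minority)))
    kept = minority + rng.sample(majority, n_keep)
    rng.shuffle(kept)  # avoid presenting all defects first
    return [X[i] for i in kept], [y[i] for i in kept]


# Usage: one defect among 2,000 good units, i.e. a 1:2000 ratio.
X = list(range(2001))
y = [1] + [0] * 2000
X_res, y_res = random_undersample(X, y, ratio=5, seed=0)
print(len(X_res), sum(y_res))  # 6 1
```

Besides rebalancing the classes, undersampling shrinks each training batch, which is one reason a resampling step can reduce training cost.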
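The early-warning retraining mechanism is not specified on this page. As a stand-in, the sketch below implements the well-known Drift Detection Method (DDM, Gama et al., 2004), which likewise separates a warning level (begin preparing a replacement model) from a drift level (replace the model). Using DDM here is an assumption for illustration, not a description of ARF-WRE; the defaults (30 warm-up samples, 2 and 3 standard deviations) are DDM's usual settings.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors.

    Tracks the running error rate p and its standard deviation
    s = sqrt(p * (1 - p) / n), remembers where p + s was lowest,
    and compares the current p + s against p_min + k * s_min.
    """

    def __init__(self, min_samples: int = 30, warning_level: float = 2.0,
                 drift_level: float = 3.0) -> None:
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.p = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: int) -> str:
        """Feed 1 for a misclassification, 0 for a correct prediction.

        Returns "stable", "warning" (start preparing a replacement model)
        or "drift" (swap in the replacement; statistics are reset).
        """
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean of the errors
        s = math.sqrt(max(0.0, self.p * (1 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Usage: a 10% error rate for 100 predictions, then a run of consecutive errors.
errors = [1 if (i + 1) % 10 == 0 else 0 for i in range(100)] + [1] * 20
detector = DDM()
states = [detector.update(e) for e in errors]
print(states.index("warning"), states.index("drift"))  # 103 108
```

Because this detector watches the overall error rate, a stream as imbalanced as 1:2000 can keep p and s_min at or near zero, in which case a single error may already signal a drift; monitoring an error signal restricted to the minority class might be a more suitable input in that setting.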
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84203
DOI: 10.6342/NTU202201090
Full-text Rights: Authorized (access restricted to campus)
Embargo Lift Date: 2027-06-27
Appears in Collections: Master's Program in Statistics (統計碩士學位學程)

Files in This Item:
File: U0001-2406202211354900.pdf (Restricted Access)
Size: 1.97 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
