Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85445
Title: Development of the Horizontal and Vertical Stacking Structure for the Imbalanced Data with Binary Variables (發展二元變數與不平衡資料限制下之水平與垂直堆疊式集成學習架構)
Author: Da-Rui Yen (閻大瑞)
Advisor: Jakey Blue (藍俊宏)
Keywords: Data Imbalance, Binary Variable, Data Resampling, Ensemble Learning, Stacking, Balance Cascade, Yield Prediction
Year of publication: 2022
Degree: Master's
Abstract: In manufacturing, products must undergo quality inspection before being shipped to customers to ensure that they meet acceptance standards. With the rise of data science, using production history data to build classification models that predict product quality is expected to replace traditional quality examination. However, the yield of mature, mass-produced products is very high, so defective units are vastly outnumbered by non-defective ones and the data are extremely imbalanced, which sharply degrades the performance of ordinary classifiers. Moreover, production history is usually recorded as categorical values or strings, which must be encoded as binary variables before analysis. Because most classifiers assume continuous inputs, conventional classifiers perform even worse under the combined constraints of extreme imbalance and binary data.

Ensemble learning is often used to handle imbalanced data. In this thesis, we modify existing ensemble learning algorithms and propose three stacking schemes (SCV-1, SCV-2, and SCV-3) and two balance cascade frameworks (BC-1 and BC-2). We also establish a complete procedure for imbalanced data analysis. The procedure combines data oversampling and undersampling in the preprocessing stage to substantially reduce the imbalance, uses SMBO (sequential model-based optimization) to optimize the many hyperparameters of the ensemble learning classifiers, and integrates a multi-channel architecture to reduce the bias of data resampling and increase the diversity of the classification models.

Production history data collected by a panel manufacturer in Taiwan are analyzed as the case study. Because the process is mature and its yield is extremely high, the imbalance ratio can reach 1:1000. The results show that SCV-1 and BC-2 achieve higher precision than a single ensemble classifier at comparable recall, while SCV-2 and SCV-3 improve recall and precision simultaneously. Higher recall means that the classification model identifies more defective panels, and higher precision reduces the number of panels that must be re-inspected by aging tests. Based on these case-study results, the modified ensemble learning architectures and the complete procedure for imbalanced data analysis should be able to reduce the cost of panel aging tests.
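The preprocessing and evaluation ideas named in the abstract (encoding categorical records as binary variables, combining oversampling with undersampling, and judging a model by precision and recall) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the column names `machine` and `recipe`, the toy data, the helper names, and the use of plain random duplication and random sampling. The abstract does not say which resampling methods the thesis uses, so this is not the author's procedure.

```python
import random
from collections import Counter


def one_hot(rows, columns):
    """Encode categorical string columns as 0/1 indicator variables."""
    # One indicator per (column, level) pair, with levels in sorted order.
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    names = [f"{c}={v}" for c in columns for v in levels[c]]
    encoded = [[int(r[c] == v) for c in columns for v in levels[c]] for r in rows]
    return names, encoded


def hybrid_resample(X, y, minority=1, over_factor=3, ratio=2, seed=0):
    """Oversample the minority class, then undersample the majority class.

    over_factor: number of copies of each minority row after oversampling.
    ratio: majority rows kept per (oversampled) minority row.
    Both steps are plain random resampling, chosen only for illustration.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    over = minority_idx * over_factor              # duplicate minority rows
    keep = min(len(majority_idx), ratio * len(over))
    under = rng.sample(majority_idx, keep)         # random subset of majority rows
    idx = over + under
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 2 defective units among 2002 (a 1:1000 imbalance).
rows = [{"machine": "M1" if i % 2 else "M2", "recipe": f"R{i % 3}"} for i in range(2002)]
labels = [1 if i < 2 else 0 for i in range(2002)]

names, X = one_hot(rows, ["machine", "recipe"])
X_bal, y_bal = hybrid_resample(X, labels, over_factor=5, ratio=2)
print(names)
print(Counter(y_bal))
```

In this sketch, recall is the share of truly defective units that the model flags, and precision is the share of flagged units that are truly defective. This matches the abstract's reading: higher recall catches more defective panels, and higher precision sends fewer good panels to aging-test re-inspection.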
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85445
DOI: 10.6342/NTU202200960
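Full-text license: Authorized (open access worldwide)
Full-text release date: 2022-07-19
Appears in collections: Master's Program in Statistics (統計碩士學位學程)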

Files in this item:
File                          Size      Format
U0001-1506202216005300.pdf    1.43 MB   Adobe PDF


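Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.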
