Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90104

| Title: | 發展基於資料定義域分佈之線上學習框架以分析具概念漂移之數據 On the Development of Domain Distribution-based Online Learning Framework for Concept Drifting Data Analytics |
| Author: | 陳奕憲 Yi-Hsien Chen |
| Advisor: | 藍俊宏 Jakey Blue |
| Keywords: | Concept Drift Detection, Data Domain Distribution, Online Learning, Adaptive Random Forest, Ensemble Learning, Virtual Metrology |
| Publication Year: | 2023 |
| Degree: | Master's |
| Abstract: | Advances in the Internet of Things (IoT) and telecommunication technologies make it possible to collect data streams continuously from a wide range of sensors, such as health data gathered by smart wearable devices and machine fault detection and classification (FDC) data from production sites. The distribution of a data stream can change as the environment changes, a phenomenon known as concept drift. As a result, traditional machine learning models trained on fixed-size datasets with a stationary distribution can no longer predict accurately.
Inspired by the Adaptive Random Forest (ARF) algorithm, this study proposes an online learning framework based on the domain data distribution. The framework combines the Voted Drift Detector (VDD) designed in this study with a mechanism for building and replacing background models. VDD builds indicators by monitoring various statistics of the input data domain and integrates them by voting to form a concept drift warning mechanism. When a drift warning is raised on the domain data distribution, a new base learner is built according to the proposed background model establishment mechanism, and the foreground and background models are trained simultaneously. When a drift is detected, the framework replaces the foreground base learners that can no longer predict under the current concept with the corresponding background models.
To validate the proposed framework, this study conducts case studies on simulated synthetic datasets and on a publicly available Chemical Mechanical Polishing (CMP) dataset from semiconductor manufacturing. The synthetic datasets, generated with known drift times and patterns, show that VDD detects concept drift caused by changes in the data domain well. The analysis of the real CMP dataset, which presents more complex data distributions and drift patterns, confirms that the proposed framework delivers better and more robust prediction performance than a single online learning model. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/90104 |
| DOI: | 10.6342/NTU202303908 |
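| Full-text License: | Authorized (on-campus access only) |
| Electronic Full-text Release Date: | 2028-08-09 |
| Appears in Collections: | Institute of Industrial Engineering |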
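
The voting idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's VDD: the class name `VotedDriftSketch`, the choice of statistics (mean, median, standard deviation), the window size, and the thresholds are all assumptions made for illustration. Each statistic of a recent window is compared with the same statistic of a reference window, and each one that has moved too far casts a vote.

```python
from collections import deque
from statistics import mean, median, pstdev


class VotedDriftSketch:
    """Toy voting detector over input-domain statistics.

    Hypothetical illustration only; not the VDD defined in the thesis.
    """

    def __init__(self, window=50, tol=0.5, warn_votes=1, drift_votes=2):
        self.window = window
        self.tol = tol                  # allowed shift, in reference standard deviations
        self.warn_votes = warn_votes    # votes needed to raise a warning
        self.drift_votes = drift_votes  # votes needed to declare drift
        self.reference = []             # first `window` samples define the reference
        self.recent = deque(maxlen=window)

    def update(self, x):
        """Feed one scalar input; return 'stable', 'warning', or 'drift'."""
        if len(self.reference) < self.window:
            self.reference.append(x)
            return "stable"
        self.recent.append(x)
        if len(self.recent) < self.window:
            return "stable"

        # Scale thresholds by the reference spread (fall back to 1.0 if it is zero).
        scale = pstdev(self.reference) or 1.0
        votes = 0
        # Location indicators: mean and median of the recent window.
        for stat in (mean, median):
            if abs(stat(self.recent) - stat(self.reference)) > self.tol * scale:
                votes += 1
        # Spread indicator: standard deviation of the recent window.
        if abs(pstdev(self.recent) - pstdev(self.reference)) > self.tol * scale:
            votes += 1

        if votes >= self.drift_votes:
            return "drift"
        if votes >= self.warn_votes:
            return "warning"
        return "stable"
```

In an ARF-style framework such as the one the abstract describes, a `"warning"` would be the point at which a background learner starts training, and a `"drift"` the point at which it replaces the foreground learner; that wiring is omitted here.
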
Files in this item:
| File | Size | Format | |
|---|---|---|---|
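| ntu-111-2.pdf (not authorized for public access) | 6.16 MB | Adobe PDF | View/Open |

Unless their copyright terms are specifically stated, all items in this system are protected by copyright, with all rights reserved.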
