Using Partial Combination Prediction Models for Mixed Datasets

Yu-Hsin Chang; 張鈺欣

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76884

Title:	Using Partial Combination Prediction Models for Mixed Datasets (混合資料集之分層組合預測模型)
Authors:	Yu-Hsin Chang 張鈺欣
Advisor:	吳政鴻(Cheng-Hung Wu)
Keyword:	Hierarchical method, Hierarchical clustering, Prediction methods, Machine learning, Regression analysis, Manufacturing, Expert systems
Publication Year:	2020
Degree:	Master's
Abstract:	Mixed datasets with complex interactions between categorical and numerical attributes are common in engineering and business applications. In manufacturing systems, for example, production rates are jointly influenced by categorical attributes, such as machine and product types, and by their numerical attributes. This study develops an efficient, interpretable, and automated stratified prediction method. When several categorical and numerical variables interact in a high-dimensional, nonlinear way, the method stratifies on a single categorical variable and performs stratified model building, model selection, and prediction fully automatically. It uses machine learning (ML) methods to improve prediction performance on mixed datasets with complex interactions, and it uses model selection to examine model transparency and interpretability. The proposed method requires less training data and computational effort than existing hierarchical regression or clustering methods. The dataset is partitioned into subsets with different combinations of categorical attributes, and multiple prediction models are generated. One-stage and two-stage model selection methods then use training and validation metrics to select the most robust prediction models. The selection results are also used to examine model interpretability and the effect of dataset size on selection and prediction performance. Numerical results on a mixed dataset from semiconductor packaging and testing show a reduction of more than 30% in root mean square error relative to regression models, and cross-validation tests show a 7.5% improvement in prediction accuracy over XGBoost models with tuned hyperparameters. The proposed model selection approach is also compatible with other regression or ML prediction methods and can be used to improve the transparency and interpretability of existing methods on mixed datasets.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76884
DOI:	10.6342/NTU202003028
Fulltext Rights:	Not authorized
Appears in Collections:	Graduate Institute of Industrial Engineering (工業工程學研究所)
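
The partition-and-select idea summarized in the abstract can be illustrated with a minimal sketch. The full text is restricted, so everything below is an assumption made for illustration rather than the thesis's actual procedure: the function names, the toy columns `machine`, `product`, `x`, and `y`, the per-group least-squares line, the exhaustive search over subsets of categorical columns, and the tolerance that favors the coarser partition on ties. It covers only a one-stage selection by validation RMSE; the two-stage method is not reproduced.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def fit_line(points):
    """Least-squares fit of y = a + b*x; falls back to the mean if x has no spread."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_stratified(rows, keys):
    """Fit one line per distinct value combination of the categorical columns in `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append((row["x"], row["y"]))
    per_group = {g: fit_line(pts) for g, pts in groups.items()}
    pooled = fit_line([(r["x"], r["y"]) for r in rows])  # fallback for unseen groups
    return per_group, pooled


def predict(model, keys, row):
    per_group, pooled = model
    a, b = per_group.get(tuple(row[k] for k in keys), pooled)
    return a + b * row["x"]


def rmse(model, keys, rows):
    errors = [(predict(model, keys, r) - r["y"]) ** 2 for r in rows]
    return sqrt(sum(errors) / len(errors))


def select_partition(train, valid, cat_cols, tol=1e-9):
    """Try every subset of categorical columns, coarsest first, and keep the one
    with the lowest validation RMSE. A finer partition must beat the current
    best by more than `tol` to replace it (hypothetical tie-breaking rule)."""
    best = None
    for size in range(len(cat_cols) + 1):
        for keys in combinations(cat_cols, size):
            model = fit_stratified(train, keys)
            score = rmse(model, keys, valid)
            if best is None or score < best[0] - tol:
                best = (score, keys, model)
    return best


def make_rows(xs):
    """Toy data: the response depends on the machine type but not on the product."""
    rows = []
    for machine in "AB":
        for product in "PQ":
            for x in xs:
                y = 2 * x if machine == "A" else 10 - x
                rows.append({"machine": machine, "product": product, "x": x, "y": y})
    return rows


train = make_rows([1, 2, 3, 4])
valid = make_rows([5, 6])
best_score, best_keys, _ = select_partition(train, valid, ["machine", "product"])
print(best_keys, best_score)
```

On this toy data the pooled model and the product-only partition both miss the machine-specific slopes, so the search settles on stratifying by `machine` alone. The finer machine-and-product partition fits equally well but is not preferred, which mirrors the transparency argument: the selected partition itself shows which categorical attribute drives the response.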

Files in This Item:

File	Size	Format
U0001-1108202023464000.pdf Restricted Access	2.63 MB	Adobe PDF
