Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91250
Title: | 應用於混合資料集的階層式層內聚合預測方法 Hierarchical Level-Aggregating Prediction Method for Mixed Datasets |
Author: | 張鈺鑫 Yu-Hsin Chang |
Advisor: | 吳政鴻 Cheng-Hung Wu |
Keywords: | clustering, mixed data, prediction model, ensemble learning, machine learning, aggregate attributes of categorical variables |
Publication Year: | 2023 |
Degree: | Master's |
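Abstract: | Data collected in the real world are growing increasingly complex. Rather than containing features of a single type, they are mixed data in which categorical and numerical attributes coexist, and these two kinds of attributes often interact in complex ways. In industry, manufacturing data are not only mixed in their features but also show high-mix, low-volume characteristics driven by customer demand, which makes the computational load even greater.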
In the textile industry, for example, the actual processing time for dyeing fabric is jointly affected, to varying degrees, by categorical and numerical variables such as color family, fabric, dye vat type, and number of fabric rolls. These high-dimensional, nonlinear combinations make the actual dyeing time harder to predict, which in turn complicates machine-specific scheduling and makes product delivery dates difficult to set. To overcome these difficulties, this study proceeds in two stages. The first stage hierarchically partitions the categorical attributes of a mixed, high-dimensional dataset, identifying the key influencing variables step by step. The second stage, carried out while the hierarchy is being expanded, aggregates the different category values under the same variable, so that even if the dataset gains dimensions or features over time, the number of trained models does not grow substantially. The result is an accurate model that remains highly interpretable.
The majority of real-world datasets combine categorical and numerical attributes. For prediction and decision-making tasks, mixed datasets are considerably more challenging than purely numerical ones because of the complex interactions between these two types of attributes. For example, in a semiconductor dataset, the throughput rate of a chip is affected by factors such as the type of machine, the material, and the number of wires. However, the impact of the number of wires on the throughput rate can vary depending on the specific combination of machines and products used. Traditional machine learning methods often convert categorical data into numerical data to simplify the prediction task, but this practice can cause problems, such as imposing an artificial order on nominal attributes or producing high-dimensional data that lengthens computation time. This study introduces a hierarchical expansion method that segments data hierarchically based on categorical attributes, which helps to address these limitations. The method also examines the levels within each categorical attribute to see whether they affect the response variable in a similar way. Applied to mixed datasets with complex interactions between variables, the method can reduce computational effort while improving both the accuracy and the interpretability of prediction models. The results indicate that the method has potential for improving prediction on mixed datasets from a semiconductor manufacturer.
The method also achieves higher accuracy than the partial combination prediction model proposed by Chang (2020) and the hierarchical expansion method proposed by Chang (2021). |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91250 |
DOI: | 10.6342/NTU202304279 |
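Full-Text Authorization: | Authorized (open access worldwide) |
Electronic Full-Text Release Date: | 2028-10-01 |
Appears in Collections: | Institute of Industrial Engineering |
Files in This Item:
File | Size | Format | |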
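
The abstract describes checking whether the levels of a categorical attribute affect the response variable in a similar way, and aggregating those that do. The abstract does not state the similarity criterion or the model family, so the following is only a minimal sketch under assumed choices: a simple linear fit per level and a coefficient tolerance `tol`. The function names `fit_line` and `aggregate_levels`, the tolerance, and the toy data are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def aggregate_levels(rows, tol=0.1):
    """Greedily merge categorical levels whose fitted coefficients are close.

    rows: iterable of (level, x, y). A level joins the first existing group
    whose reference (intercept, slope) is within tol on both coefficients;
    otherwise it starts a new group. This criterion is an assumption.
    """
    by_level = defaultdict(list)
    for level, x, y in rows:
        by_level[level].append((x, y))

    groups = []  # each entry: (reference coefficients, member levels)
    for level in sorted(by_level):
        coeffs = fit_line(by_level[level])
        for ref, members in groups:
            if all(abs(c - r) <= tol for c, r in zip(coeffs, ref)):
                members.append(level)
                break
        else:
            groups.append((coeffs, [level]))
    return [members for _, members in groups]


# Toy data: (machine type, number of wires, throughput rate).
# Machines A and B follow the same line; machine C behaves differently.
rows = [
    ("A", 1, 3), ("A", 2, 5), ("A", 3, 7),
    ("B", 1, 3), ("B", 4, 9),
    ("C", 1, 4), ("C", 2, 3), ("C", 3, 2),
]

groups = aggregate_levels(rows)
print(groups)  # → [['A', 'B'], ['C']]
```

In this toy case three machine levels collapse into two groups, so only two models would need to be trained instead of three, and no artificial ordering is imposed on the machine types.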
---|---|---|---|
ntu-112-1.pdf (publicly available online after 2028-10-01) | 2.29 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.