Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82210
Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳政鴻(Cheng-Hung Wu) | |
| dc.contributor.author | Kuei-Wen Chang | en |
| dc.contributor.author | 張貴雯 | zh_TW |
| dc.date.accessioned | 2022-11-25T06:33:43Z | - |
| dc.date.copyright | 2021-07-09 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2021-06-28 | |
| dc.identifier.citation | 張鈺欣 (2020). 混合資料集之分層組合預測模型 [Layered combination prediction model for mixed datasets]. Thesis, Graduate Institute of Industrial Engineering, National Taiwan University.<br>Afzalirad, M., & Rezaeian, J. (2016). Resource-constrained unrelated parallel machine scheduling problem with sequence dependent setup times, precedence constraints and machine eligibility restrictions. Computers & Industrial Engineering, 98, 40-52.<br>Andreopoulos, B., An, A., & Wang, X. (2006). Bi-level clustering of mixed categorical and numerical biomedical data. International Journal of Data Mining and Bioinformatics, 1(1), 19-56. https://doi.org/10.1504/ijdmb.2006.009920<br>Asi, H., & Duchi, J. C. (2019). The importance of better models in stochastic optimization. Proceedings of the National Academy of Sciences, 116(46), 22924-22930.<br>Barcelo-Rico, F., & Diez, J.-L. (2012). Geometrical codification for clustering mixed categorical and numerical databases. Journal of Intelligent Information Systems, 39(1), 167-185. https://doi.org/10.1007/s10844-011-0187-y<br>Basu, S., Kumbier, K., Brown, J. B., & Yu, B. (2018). Iterative random forests to discover predictive and stable high-order interactions. Proceedings of the National Academy of Sciences, 115(8), 1943-1948.<br>Boriah, S., Chandola, V., & Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining (pp. 243-254).<br>Boriah, S., Chandola, V., & Kumar, V. (2008). Similarity measures for categorical data: A comparative evaluation (Vol. 30).<br>Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.<br>Bülbül, K., & Şen, H. (2017). An exact extended formulation for the unrelated parallel machine total weighted completion time problem. Journal of Scheduling, 20(4), 373-389.<br>Cao, D., Chen, M., & Wan, G. (2005). Parallel machine selection and job scheduling to minimize machine cost and job tardiness. Computers & Operations Research, 32(8), 1995-2012.<br>Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. Paper presented at the Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.<br>Chen, K., Chen, H., Liu, L., & Chen, S. (2019). Prediction of weld bead geometry of MAG welding based on XGBoost algorithm. The International Journal of Advanced Manufacturing Technology, 101(9-12), 2283-2295.<br>Chen, T., & Guestrin, C. (2016a). XGBoost: A scalable tree boosting system. Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA. https://doi.org/10.1145/2939672.2939785<br>Chen, T., & Guestrin, C. (2016b). XGBoost: A scalable tree boosting system. Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.<br>Choi, D.-K. (2019). Data-driven materials modeling with XGBoost algorithm and statistical inference analysis for prediction of fatigue strength of steels. International Journal of Precision Engineering and Manufacturing, 20(1), 129-138.<br>Cruz-Chávez, M. A., Juárez-Pérez, F., Ávila-Melgar, E. Y., & Martínez-Oropeza, A. (2009). Simulated annealing algorithm for the weighted unrelated parallel machines problem. Paper presented at the 2009 Electronics, Robotics and Automotive Mechanics Conference (CERMA).<br>David, G., & Averbuch, A. (2012). SpectralCAT: Categorical spectral clustering of numerical and nominal data. Pattern Recognition, 45(1), 416-433. https://doi.org/10.1016/j.patcog.2011.07.006<br>Diana, R. O. M., & de Souza, S. R. (2020). Analysis of variable neighborhood descent as a local search operator for total weighted tardiness problem on unrelated parallel machines. Computers & Operations Research, 117, 104886.<br>Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139-157.<br>Džeroski, S., & Ženko, B. (2004). Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54(3), 255-273.<br>Fauzan, M. A., & Murfi, H. (2018). The accuracy of XGBoost for insurance claim prediction. Int. J. Adv. Soft Comput. Appl., 10(2).<br>Fleszar, K., & Hindi, K. S. (2018). Algorithms for the unrelated parallel machine scheduling problem with a resource constraint. European Journal of Operational Research, 271(3), 839-848.<br>Geng, N., Jiang, Z., & Chen, F. (2009). Stochastic programming based capacity planning for semiconductor wafer fab with uncertain demand and capacity. European Journal of Operational Research, 198(3), 899-908.<br>Gislason, P. O., Benediktsson, J. A., & Sveinsson, J. R. (2004). Random forest classification of multisource remote sensing and geographic data. Paper presented at IGARSS 2004, 2004 IEEE International Geoscience and Remote Sensing Symposium.<br>Goodall, D. W. (1966). A new similarity index based on probability. Biometrics, 22(4), 882-907. https://doi.org/10.2307/2528080<br>Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857-871. https://doi.org/10.2307/2528823<br>Hariri, A., & Potts, C. N. (1991). Heuristics for scheduling unrelated parallel machines. Computers & Operations Research, 18(3), 323-331.<br>Ho, T. K. (1995, August 14-16). Random decision forests. Paper presented at the Proceedings of 3rd International Conference on Document Analysis and Recognition.<br>Horng, S.-M., Fowler, J. W., & Cochran, J. K. (2000). A genetic algorithm approach to manage ion implantation processes in wafer fabrication. International Journal of Manufacturing Technology and Management, 1(2-3), 156-172.<br>Hsu, C.-C., Chen, C.-L., & Su, Y.-W. (2007). Hierarchical clustering of mixed data based on distance hierarchy. Information Sciences, 177(20), 4474-4492. https://doi.org/10.1016/j.ins.2007.05.003<br>Hsu, C.-C., & Chen, Y.-C. (2007). Mining of mixed data with application to catalog marketing. Expert Systems with Applications, 32(1), 12-23. https://doi.org/10.1016/j.eswa.2005.11.017<br>Hu, X., Zhang, H., Mei, H., Xiao, D., Li, Y., & Li, M. (2020). Landslide susceptibility mapping using the stacking ensemble machine learning method in Lushui, Southwest China. Applied Sciences, 10(11), 4016.<br>Huang, Z. (1997). Clustering large data sets with mixed numeric and categorical values.<br>Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3), 283-304. https://doi.org/10.1023/A:1009769707641<br>Joo, C. M., & Kim, B. S. (2015). Hybrid genetic algorithms with dispatching rules for unrelated parallel machine scheduling with setup time and production availability. Computers & Industrial Engineering, 85, 102-109.<br>Khan, S. S., & Ahmad, A. (2004). Cluster center initialization algorithm for K-means clustering. Pattern Recognition Letters, 25(11), 1293-1302. https://doi.org/10.1016/j.patrec.2004.04.007<br>Kochenderfer, M. J., & Wheeler, T. A. (2019). Algorithms for optimization. MIT Press.<br>Koh, P. W., & Liang, P. (2017). Understanding black-box predictions via influence functions. arXiv, abs/1703.04730.<br>Kumbier, K., Basu, S., Brown, J. B., Celniker, S., & Yu, B. (2018). Refining interaction search through signed iterative random forests. arXiv preprint arXiv:1810.07287.<br>Li, F., Wu, J., Dong, F., Lin, J., Sun, G., Chen, H., & Shen, J. (2018). Ensemble machine learning systems for the estimation of steel quality control. Paper presented at the 2018 IEEE International Conference on Big Data (Big Data).<br>Li, Y., Yan, C., Liu, W., & Li, M. (2016). Research and application of random forest model in mining automobile insurance fraud. Paper presented at the 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD).<br>MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Paper presented at the Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, Berkeley, Calif.<br>Mir, M. S. S., & Rezaeian, J. (2016). A robust hybrid approach based on particle swarm optimization and genetic algorithm to minimize the total machine load on unrelated parallel machines. Applied Soft Computing, 41, 488-504.<br>Olah, C., Mordvintsev, A., & Schubert, L. (2017). Feature visualization. Distill, 2(11), e7.<br>Olden, J. D., Joy, M. K., & Death, R. G. (2004). An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecological Modelling, 178(3-4), 389-397.<br>Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311-3325.<br>Panwalkar, S., Smith, M., & Koulamas, C. (1993). A heuristic for the single machine tardiness problem. European Journal of Operational Research, 70(3), 304-310.<br>Pfund, M., Fowler, J. W., & Gupta, J. N. (2004). A survey of algorithms for single and multi-objective unrelated parallel-machine deterministic scheduling problems. Journal of the Chinese Institute of Industrial Engineers, 21(3), 230-241.<br>Pinedo, M. L. (2008). Scheduling: Theory, algorithms, and systems. Springer, New York, NY.<br>Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21-45. https://doi.org/10.1109/MCAS.2006.1688199<br>Probst, P., Wright, M. N., & Boulesteix, A.-L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3), e1301.<br>Putatunda, S., & Rama, K. (2018). A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost. Paper presented at the Proceedings of the 2018 International Conference on Signal Processing and Machine Learning.<br>Shi, T., Yu, B., Clothiaux, E. E., & Braverman, A. J. (2008). Daytime arctic cloud detection based on multi-angle satellite data with case studies. Journal of the American Statistical Association, 103(482), 584-593.<br>Shin, Y. (2015). Application of boosting regression trees to preliminary cost estimation in building construction projects. Computational Intelligence and Neuroscience, 2015.<br>Singh, C., Murdoch, W. J., & Yu, B. (2019). Hierarchical interpretations for neural network predictions. arXiv, abs/1806.05337.<br>Spall, J. C. (2005). Introduction to stochastic search and optimization: Estimation, simulation, and control (Vol. 65). John Wiley & Sons.<br>Spall, J. C. (2012). Stochastic optimization. In J. E. Gentle, W. K. Härdle, & Y. Mori (Eds.), Handbook of Computational Statistics: Concepts and Methods (pp. 173-201). Berlin, Heidelberg: Springer.<br>Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307.<br>Sun, W., & Trevor, B. (2018). A stacking ensemble learning framework for annual river ice breakup dates. Journal of Hydrology, 561, 636-650.<br>Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. Paper presented at ICML.<br>Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.<br>Torlay, L., Perrone-Bertolotti, M., Thomas, E., & Baciu, M. (2017). Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain Informatics, 4(3), 159-169.<br>Wang, C., Chi, C.-H., Zhou, W., & Wong, R. K. (2015). Coupled interdependent attribute analysis on mixed data. Paper presented at AAAI.<br>Wang, X., Li, Z., Chen, Q., & Mao, N. (2020). Meta-heuristics for unrelated parallel machines scheduling with random rework to minimize expected total weighted tardiness. Computers & Industrial Engineering, 145, 106505.<br>Wei, D., Zhou, B., Torralba, A., & Freeman, W. (2015). Understanding intra-class knowledge inside CNN. arXiv preprint arXiv:1507.02379.<br>Wei, M., Chow, T. W. S., & Chan, R. H. M. (2015). Clustering heterogeneous data with k-means by mutual information-based unsupervised feature transformation. Entropy, 17(3), 1535-1548. https://doi.org/10.3390/e17031535<br>Wu, J., Pan, S., Zhu, X., & Cai, Z. (2014). Boosting for multi-graph classification. IEEE Transactions on Cybernetics, 45(3), 416-429.<br>Wu, S., Joseph, A., Hammonds, A. S., Celniker, S. E., Yu, B., & Frise, E. (2016). Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks. Proceedings of the National Academy of Sciences, 113(16), 4290-4295.<br>XingFen, W., Xiangbin, Y., & Yangchun, M. (2018). Research on user consumption behavior prediction based on improved XGBoost algorithm. Paper presented at the 2018 IEEE International Conference on Big Data (Big Data).<br>Yuan-Fu, Y. (2019). A deep learning model for identification of defect patterns in semiconductor wafer map. Paper presented at the 2019 30th Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC).<br>Zareapoor, M., & Shamsolmoali, P. (2015). Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Computer Science, 48, 679-685.<br>Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. Paper presented at the European Conference on Computer Vision.<br>Zhang, D., Qian, L., Mao, B., Huang, C., Huang, B., & Si, Y. (2018). A data-driven design for fault detection of wind turbines using random forests and XGBoost. IEEE Access, 6, 21020-21031.<br>Zhong, J., Sun, Y., Peng, W., Xie, M., Yang, J., & Tang, X. (2018). XGBFEMF: An XGBoost-based framework for essential protein prediction. IEEE Transactions on NanoBioscience, 17(3), 243-250. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/82210 | - |
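| dc.description.abstract | Mixed datasets in industrial and commercial applications often involve complex interactions between categorical and numerical attributes. In business, price prediction is an important problem, and the prices of products and services depend on categorical and numerical attributes at the same time; for example, electricity demand and prices are affected by factors such as the season, business activity, and fossil-fuel prices. In manufacturing systems, different combinations of machine types and product attributes affect production rates to different degrees. Past approaches to prediction on mixed datasets with complex product-mix patterns typically built a single prediction model over one dataset. Such a model cannot adjust dynamically as the product mix on the shop floor changes, which makes it hard to schedule processing times and to plan delivery dates accurately. This study partitions a mixed dataset hierarchically by its categorical attributes. The resulting hierarchical expansion approach addresses shortcomings of traditional machine learning methods: it builds more accurate prediction models while reducing computational complexity, and as the number of features or attributes in the system grows, the number of models that must be trained does not grow sharply, so computation stays efficient. In addition, hierarchical expansion combined with model selection makes it possible to trace the inferences of the machine learning models, which improves their interpretability. The hierarchical expansion model is also applied to scheduling on unrelated parallel machines. Numerical results show that on a semiconductor mixed dataset, the hierarchical expansion approach reduces the root mean square error (RMSE) by 17.7% compared with an XGBoost model, and it has better computational and predictive performance than the layered combination model proposed by 張鈺欣 (2020). Because the hierarchical expansion prediction model is accurate and precise, using it to predict the parameters of the optimization model for unrelated parallel machine scheduling effectively improves scheduling performance and captures the uncertain factors in the production system. | zh_TW |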
| dc.description.provenance | Made available in DSpace on 2022-11-25T06:33:43Z (GMT). No. of bitstreams: 1 U0001-2406202117124900.pdf: 2850132 bytes, checksum: 0a3d01704b40b8f00e113bfd5a21bd02 (MD5) Previous issue date: 2021 | en |
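| dc.description.tableofcontents | Acknowledgements; Chinese Abstract; Abstract; Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1.1 Mixed datasets make prediction difficult; 1.1.2 High data dimensionality leads to insufficient predictive performance; 1.1.3 Transparency and interpretability of machine learning models; 1.1.4 The challenge of environmental uncertainty; 1.1.5 Summary of research background and motivation; 1.2 Research objectives; 1.3 Research methods and procedure; Chapter 2 Literature Review; 2.1 Clustering methods and their applications; 2.1.1 Partitional clustering; 2.1.2 Hierarchical clustering; 2.2 Ensemble learning methods and their applications; 2.2.1 Boosting; 2.2.2 Bagging; 2.2.3 Stacking; 2.2.4 The XGBoost algorithm; 2.3 Transparency and interpretability of machine learning models; 2.4 Stochastic optimization methods; 2.5 Unrelated parallel machine scheduling; 2.6 Summary of the literature review; Chapter 3 Hierarchical Expansion Algorithm; 3.1 Description of the raw dataset and data preprocessing; 3.2 Definition of each expansion level and the corresponding datasets; 3.2.1 Level-0 expansion and its corresponding dataset; 3.2.2 Level-1 expansion and its corresponding dataset; 3.2.3 Level-N expansion and its corresponding dataset; 3.3 Definition of the hierarchical expansion prediction model; 3.4 Training the prediction models; 3.4.1 Attributes of the prediction models; 3.4.2 Model hyperparameter settings; 3.5 The hierarchical expansion method; 3.5.1 Attribute selection index based on predictive performance; 3.5.2 Attribute selection index based on the distance between predicted values; 3.6 Validation method for the prediction models; Chapter 4 Unrelated Parallel Machine Scheduling; 4.1 Production system datasets; 4.1.1 Historical data; 4.1.2 Generated datasets; 4.2 Optimization models; 4.2.1 Deterministic optimization model; 4.2.2 Stochastic optimization model; 4.2.3 Optimization model parameters; 4.3 Validation of scheduling results; Chapter 5 Dataset Validation and Numerical Analysis; 5.1 Introduction to the datasets; 5.2 Validation of the hierarchical expansion method and model selection; 5.2.1 Hierarchical expansion and predictive performance on the semiconductor dataset; 5.2.2 Hierarchical expansion and predictive performance on the Allstate insurance claims dataset; 5.2.3 Hierarchical expansion and predictive performance on the car dataset; 5.2.4 Hierarchical expansion and predictive performance on the New York City Airbnb dataset; 5.2.5 Model transparency and interpretability; 5.2.6 Summary of the hierarchical expansion prediction model validation; 5.3 Validation on unrelated parallel machine scheduling; Chapter 6 Conclusions and Future Research; 6.1 Conclusions; 6.2 Future research directions; References | |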
| dc.language.iso | zh-TW | |
| dc.subject | 排程 (Scheduling) | zh_TW |
| dc.subject | 分層方法 (Hierarchical method) | zh_TW |
| dc.subject | 分群 (Clustering) | zh_TW |
| dc.subject | 預測模型 (Prediction model) | zh_TW |
| dc.subject | 機器學習 (Machine learning) | zh_TW |
| dc.subject | Hierarchical method | en |
| dc.subject | Manufacturing | en |
| dc.subject | Machine learning | en |
| dc.subject | Scheduling | en |
| dc.subject | Regression analysis | en |
| dc.subject | Prediction method | en |
| dc.subject | Clustering | en |
| dc.title | 混合資料集之階層式展開集成學習預測方法 (Hierarchical expansion ensemble learning prediction method for mixed datasets) | zh_TW |
| dc.title | Hierarchical Expansion Approach for Ensemble Learning and Prediction in Mixed Datasets | en |
| dc.date.schoolyear | 109-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 洪一薰(Hsin-Tsai Liu),黃奎隆(Chih-Yang Tseng),藍正宏 | |
| dc.subject.keyword | 分層方法 (Hierarchical method), 分群 (Clustering), 預測模型 (Prediction model), 機器學習 (Machine learning), 排程 (Scheduling) | zh_TW |
| dc.subject.keyword | Hierarchical method, Clustering, Prediction method, Regression analysis, Machine learning, Manufacturing, Scheduling | en |
| dc.relation.page | 96 | |
| dc.identifier.doi | 10.6342/NTU202101129 | |
| dc.rights.note | Not authorized for public access | |
| dc.date.accepted | 2021-06-28 | |
| dc.contributor.author-college | 工學院 (College of Engineering) | zh_TW |
| dc.contributor.author-dept | 工業工程學研究所 (Graduate Institute of Industrial Engineering) | zh_TW |
| dc.date.embargo-lift | 2024-07-31 | - |
| Appears in collections: | Graduate Institute of Industrial Engineering | |
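The abstract describes splitting a mixed dataset hierarchically by its categorical attributes, and the table of contents mentions an attribute selection index based on predictive performance (section 3.5.1). What follows is only a minimal Python sketch of that general idea under stated assumptions. It is not the thesis's algorithm. In place of XGBoost, each subset gets a one-variable least-squares line. The field names `x` and `y`, the attributes `machine` and `shift`, and the data are all hypothetical. RMSE is computed in-sample for brevity; a real comparison would use held-out data.

```python
from collections import defaultdict
from math import sqrt


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b). Falls back to the mean if x is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def expansion_rmse(rows, cat_key=None):
    """In-sample RMSE of a one-level expansion (simplifying assumption, not the thesis's validation).

    cat_key=None: level 0, a single model for the whole dataset.
    cat_key="<name>": one model per value of that categorical attribute.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[cat_key] if cat_key is not None else None].append(row)
    squared_errors = []
    for members in groups.values():
        a, b = fit_line([r["x"] for r in members], [r["y"] for r in members])
        squared_errors += [(r["y"] - (a + b * r["x"])) ** 2 for r in members]
    return sqrt(sum(squared_errors) / len(squared_errors))


def select_attribute(rows, cat_keys):
    """Pick the categorical attribute whose split gives the lowest RMSE."""
    scores = {key: expansion_rmse(rows, key) for key in cat_keys}
    return min(scores, key=scores.get), scores


# Hypothetical mixed data: machine type changes the x-y relationship, shift does not.
rows = [
    {"machine": "A", "shift": "day", "x": 1, "y": 2},
    {"machine": "A", "shift": "night", "x": 2, "y": 4},
    {"machine": "A", "shift": "day", "x": 3, "y": 6},
    {"machine": "B", "shift": "night", "x": 1, "y": 6},
    {"machine": "B", "shift": "day", "x": 2, "y": 7},
    {"machine": "B", "shift": "night", "x": 3, "y": 8},
]

best, scores = select_attribute(rows, ["machine", "shift"])
print("level-0 RMSE:", round(expansion_rmse(rows), 3))
print("per-attribute RMSE:", {k: round(v, 3) for k, v in scores.items()})
print("expand on:", best)
```

Splitting on `machine` fits this toy data exactly, so that attribute would be chosen for the first expansion level. Splitting on `shift` lowers the in-sample RMSE only slightly relative to the single level-0 model. A fuller version would recurse into each chosen subset to build deeper levels.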
Files in this item:
| File | Size | Format |
|---|---|---|
| U0001-2406202117124900.pdf (not authorized for public access) | 2.78 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
