NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76884
Full metadata record
DC Field | Value | Language
dc.contributor.advisor: 吳政鴻 (Cheng-Hung Wu)
dc.contributor.author: Yu-Hsin Chang (en)
dc.contributor.author: 張鈺欣 (zh_TW)
dc.date.accessioned: 2021-07-10T21:39:24Z
dc.date.available: 2021-07-10T21:39:24Z
dc.date.copyright: 2020-08-24
dc.date.issued: 2020
dc.date.submitted: 2020-08-12
dc.identifier.citation:
Ahmed, M. M., Abdel-Aty, M. (2013). Application of stochastic gradient boosting technique to enhance reliability of real-time risk assessment: use of automatic vehicle identification and remote traffic microwave sensor data. Transportation Research Record, 2386(1), 26-34.
Ari, B., Güvenir, H. A. (2002). Clustered linear regression. Knowledge-Based Systems, 15(3), 169-175.
Backus, P., Janakiram, M., Mowzoon, S., Runger, C., Bhargava, A. (2006). Factory cycle-time prediction with a data-mining approach. IEEE Transactions on Semiconductor Manufacturing, 19(2), 252-258.
Breiman, L. (1996). Bagging predictors. Machine learning, 24(2), 123-140.
Broelemann, K., Kasneci, G. (2018). A gradient-based split criterion for highly accurate and transparent model trees. arXiv preprint arXiv:1809.09703.
Cadenas, J. M., Garrido, M. C., MartíNez, R. (2013). Feature subset selection filter–wrapper based on low quality data. Expert systems with applications, 40(16), 6241-6252.
Chen, K., Chen, H., Liu, L., Chen, S. (2019). Prediction of weld bead geometry of MAG welding based on XGBoost algorithm. The International Journal of Advanced Manufacturing Technology, 101(9-12), 2283-2295.
Chen, T., Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Paper presented at the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Cheng, S., Zhang, S., Li, L., Zhang, D. (2018). Water quality monitoring method based on TLD 3D fish tracking and XGBoost. Mathematical Problems in Engineering, 2018.
Choi, D.-K. (2019). Data-Driven Materials Modeling with XGBoost Algorithm and Statistical Inference Analysis for Prediction of Fatigue Strength of Steels. International Journal of Precision Engineering and Manufacturing, 20(1), 129-138.
Chung, Y.-S. (2013). Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees. Accident Analysis Prevention, 61, 107-118.
Divina, F., Gilson, A., Goméz-Vela, F., García Torres, M., Torres, J. F. (2018). Stacking ensemble learning for short-term electricity consumption forecasting. Energies, 11(4), 949.
Došilović, F. K., Brčić, M., Hlupić, N. (2018). Explainable artificial intelligence: A survey. Paper presented at the 2018 41st International convention on information and communication technology, electronics and microelectronics (MIPRO).
Dumitrescu, D., Sârbu, C., Pop, H. (1994). A fuzzy divisive hierarchical clustering algorithm for the optimal choice of sets of solvent systems. Analytical Letters, 27(5), 1031-1054.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Paper presented at the Kdd.
Fauzan, M. A., Murfi, H. (2018). The accuracy of XGBoost for insurance claim prediction. Int. J. Adv. Soft Comput. Appl, 10(2).
Feng, L., Qiu, M.-H., Wang, Y.-X., Xiang, Q.-L., Yang, Y.-F., Liu, K. (2010). A fast divisive clustering algorithm using an improved discrete particle swarm optimizer. Pattern Recognition Letters, 31(11), 1216-1225.
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing, 20(1), 270-281.
Freitas, A. A. (2004). A critical review of multi-objective optimization in data mining: a position paper. ACM SIGKDD Explorations Newsletter, 6(2), 77-86.
Garson, G. D. (2013). Hierarchical linear modeling: Guide and applications: Sage.
Geng, N., Jiang, Z., Chen, F. (2009). Stochastic programming based capacity planning for semiconductor wafer fab with uncertain demand and capacity. European Journal of Operational Research, 198(3), 899-908.
Gislason, P. O., Benediktsson, J. A., Sveinsson, J. R. (2004). Random forest classification of multisource remote sensing and geographic data. Paper presented at the IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium.
Goia, A., May, C., Fusai, G. (2010). Functional clustering and linear regression for peak load forecasting. International Journal of Forecasting, 26(4), 700-711.
Goncalves, M., Netto, M., Costa, J., Zullo Junior, J. (2008). An unsupervised method of classifying remotely sensed images using Kohonen self‐organizing maps and agglomerative hierarchical clustering methods. International Journal of Remote Sensing, 29(11), 3171-3207.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM computing surveys (CSUR), 51(5), 1-42.
Ham, J., Chen, Y., Crawford, M. M., Ghosh, J. (2005). Investigation of the random forest framework for classification of hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(3), 492-501.
Hastie, T., James, G., Witten, D., Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
Hira, Z. M., Gillies, D. F. (2015). A review of feature selection and feature extraction methods applied on microarray data. Advances in bioinformatics, 2015.
Huang, C.-L., Tsai, C.-Y. (2009). A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting. Expert systems with applications, 36(2), 1529-1539.
Kira, K., Rendell, L. A. (1992). A practical approach to feature selection Machine Learning Proceedings 1992 (pp. 249-256): Elsevier.
Kohavi, R., John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1-2), 273-324.
Koller, D., Sahami, M. (1996). Toward optimal feature selection. Retrieved from
Kravtsov, S., Kondrashov, D., Ghil, M. (2005). Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. Journal of Climate, 18(21), 4404-4424.
Lakkaraju, H., Bach, S. H., Leskovec, J. (2016). Interpretable decision sets: A joint framework for description and prediction. Paper presented at the Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining.
Leung, K.-M., Elashoff, R. M., Rees, K. S., Hasan, M. M., Legorreta, A. P. (1998). Hospital-and patient-related characteristics determining maternity length of stay: a hierarchical linear model approach. American journal of public health, 88(3), 377-381.
Li, F., Wu, J., Dong, F., Lin, J., Sun, G., Chen, H., Shen, J. (2018). Ensemble Machine Learning Systems for the Estimation of Steel Quality Control. Paper presented at the 2018 IEEE International Conference on Big Data (Big Data).
Lingitz, L., Gallina, V., Ansari, F., Gyulai, D., Pfeiffer, A., Sihn, W., Monostori, L. (2018). Lead time prediction using machine learning algorithms: A case study by a semiconductor manufacturer. PROCEDIA CIRP, 72, 1051-1056.
Liu, H., Motoda, H. (2012). Feature selection for knowledge discovery and data mining (Vol. 454): Springer Science Business Media.
Maldonado, S., López, J. (2018). Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification. Applied Soft Computing, 67, 94-105.
Mangal, A., Kumar, N. (2016). Using big data to enhance the bosch production line performance: A kaggle challenge. Paper presented at the 2016 IEEE International Conference on Big Data (Big Data).
Park, Y. W., Jiang, Y., Klabjan, D., Williams, L. (2017). Algorithms for generalized clusterwise linear regression. INFORMS Journal on Computing, 29(2), 301-317.
Pavlyshenko, B. (2016). Machine learning, linear and Bayesian models for logistic regression in failure detection problems. Paper presented at the 2016 IEEE International Conference on Big Data (Big Data).
Petrocelli, J. V. (2003). Hierarchical multiple regression in counseling research: Common problems and possible remedies. Measurement and evaluation in counseling and development, 36(1), 9-22.
Putatunda, S., Rama, K. (2018). A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost. Paper presented at the Proceedings of the 2018 International Conference on Signal Processing and Machine Learning.
Raedt, L. D., Kersting, K., Natarajan, S., Poole, D. (2016). Statistical relational artificial intelligence: Logic, probability, and computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 10(2), 1-189.
Raudenbush, S. W., Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1): sage.
Raudenbush, S. W., Chan, W.-S. (1993). Application of a hierarchical linear model to the study of adolescent deviance in an overlapping cohort design. Journal of consulting and clinical psychology, 61(6), 941.
Saeys, Y., Inza, I., Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. bioinformatics, 23(19), 2507-2517.
Salvador, S., Chan, P. (2004). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. Paper presented at the 16th IEEE international conference on tools with artificial intelligence.
Serrano‐Megías, M., López‐Nicolás, J. M. (2006). Application of agglomerative hierarchical clustering to identify consumer tomato preferences: influence of physicochemical and sensory characteristics on consumer response. Journal of the Science of Food and Agriculture, 86(4), 493-499.
Sha, D., Storch, R., Liu, C.-H. (2007). Development of a regression-based method with case-based tuning to solve the due date assignment problem. International Journal of Production Research, 45(1), 65-82.
Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., Shao, L. (2016). Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE transactions on image processing, 25(12), 5933-5942.
Su, L. (2011). Multivariate local polynomial regression with application to Shenzhen component index. Discrete Dynamics in Nature and Society, 2011.
Su, P., Liu, Y., Song, X. (2018). Research on intrusion detection method based on improved smote and XGBoost. Paper presented at the Proceedings of the 8th International Conference on Communication and Network Security.
Torlay, L., Perrone-Bertolotti, M., Thomas, E., Baciu, M. (2017). Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain informatics, 4(3), 159-169.
Tseng, Y.-H., Durbin, P., Tzeng, G.-H. (2001). Using a fuzzy piecewise regression analysis to predict the nonlinear time-series of turbulent flows with automatic change-point detection. Flow, Turbulence and Combustion, 67(2), 81-106.
Van Dijck, G., Van Hulle, M. M. (2006). Speeding up the wrapper feature subset selection in regression by mutual information relevance and redundancy analysis. Paper presented at the International Conference on Artificial Neural Networks.
Vaughn, B. K. (2008). Data analysis using regression and multilevel/hierarchical models: JSTOR.
Vollmer, S., Mateen, B. A., Bohner, G., Király, F. J., Ghani, R., Jonsson, P., . . . Myles, P. (2018). Machine learning and AI research for patient benefit: 20 critical questions on transparency, replicability, ethics and effectiveness. arXiv preprint arXiv:1812.10404.
Wang, Y. (2011). Prediction of weather impacted airport capacity using ensemble learning. Paper presented at the 2011 IEEE/AIAA 30th Digital Avionics Systems Conference.
Wang, Y., Wang, D., Geng, N., Wang, Y., Yin, Y., Jin, Y. (2019). Stacking-based ensemble learning of decision trees for interpretable prostate cancer detection. Applied Soft Computing, 77, 188-204.
Wolpert, D. H. (1992). Stacked generalization. Neural networks, 5(2), 241-259.
Woltman, H., Feldstain, A., MacKay, J. C., Rocchi, M. (2012). An introduction to hierarchical linear modeling. Tutorials in quantitative methods for psychology, 8(1), 52-69.
XingFen, W., Xiangbin, Y., Yangchun, M. (2018). Research on user consumption behavior prediction based on improved XGBoost algorithm. Paper presented at the 2018 IEEE International Conference on Big Data (Big Data).
You, M., Liu, J., Li, G.-Z., Chen, Y. (2012). Embedded feature selection for multi-label classification of music emotions. International Journal of Computational Intelligence Systems, 5(4), 668-678.
Yuan-Fu, Y. (2019). A Deep Learning Model for Identification of Defect Patterns in Semiconductor Wafer Map. Paper presented at the 2019 30th Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC).
Zhang, Y., Haghani, A. (2015). A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies, 58, 308-324.
Zhong, J., Sun, Y., Peng, W., Xie, M., Yang, J., Tang, X. (2018). XGBFEMF: An XGBoost-based framework for essential protein prediction. IEEE transactions on nanobioscience, 17(3), 243-250.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76884
dc.description.abstract: This study develops an efficient, interpretable, automated hierarchical prediction method. When multiple categorical and numerical variables interact in a high-dimensional, nonlinear way, the method fully automates hierarchical modeling, model selection, and prediction over the levels of a single categorical variable. It achieves high accuracy while allowing deeper analysis of model interpretability, and it examines how dataset size affects model-selection and prediction performance.
In mixed datasets from industrial and business applications, handling complex interactions between categorical and numerical variables is a common problem. In a manufacturing system, for example, production rates are jointly influenced by different categorical and numerical variables, such as complex interactions among machine types, product-category attributes, and their numerical variables. This study applies machine learning methods to improve prediction performance on mixed datasets with complex interactions, and uses model selection to examine model transparency and interpretability. Compared with previous hierarchical regression or clustering methods, the proposed approach requires less training data and computational cost. The dataset is partitioned into subsets with different combinations of categorical attributes, multiple prediction models are generated, training- and validation-set metrics serve as the basis for model selection, and one-stage and two-stage model selection methods choose the most robust prediction model. The model-selection results are further used to study interpretability and the effect of dataset size on prediction performance. Numerical results on a mixed dataset from semiconductor assembly and testing show that, compared with regression models, the hierarchical-combination model selection method reduces root mean square error by more than 30%; cross-validation tests show a 7.5% improvement in prediction accuracy over an XGBoost model with tuned hyperparameters. Moreover, the proposed model selection method is compatible with other regression or ML prediction methods and can be used to improve the transparency and interpretability of existing methods for predicting mixed datasets. (zh_TW)
dc.description.abstract: Mixed datasets with complex interactions between categorical and numerical attributes are common in engineering and business applications. For example, production rates in manufacturing systems are jointly influenced by several categorical and numerical attributes, such as machine and product types and their numerical attributes. This study aims to improve the prediction performance and transparency of models for mixed datasets with complex interactions using machine learning (ML) methods. The proposed method requires less data and computational effort than existing hierarchical or clustering regression methods. Multiple prediction models are generated by partitioning a dataset into subsets with different categorical attribute combinations. One-stage and two-stage model selection methods are proposed that use the training and validation datasets to select the better models among all the prediction models. Numerical results demonstrate the potential of the model selection approach on a mixed dataset from a semiconductor manufacturer. Compared with regression models, the proposed approach reduces root mean square error by more than 30%. Cross-validation tests also show a 7.5% improvement in accuracy over properly tuned XGBoost models. Moreover, the proposed model selection approach is compatible with other regression or ML prediction methods and can be used to improve the transparency of any existing method on mixed datasets. (en)
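The abstract's partial-combination idea — partition the mixed dataset by subsets of its categorical attributes, fit one model per subset, and keep the grouping whose validation metric is best — can be sketched as follows. This is an illustrative reading, not the thesis's exact algorithm: the column names, toy data, and per-cell linear models are hypothetical stand-ins (the thesis pairs the scheme with regression and XGBoost models).

```python
# Minimal sketch of partial-combination model selection on a mixed dataset.
# All names and data here are hypothetical illustrations.
from itertools import combinations
from math import sqrt

def fit_linear(rows):
    """Least-squares fit of y = a*x + b over (x, y) pairs (closed form)."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    denom = n * sxx - sx * sx
    if denom == 0:                      # constant feature: fall back to the mean
        return (0.0, sy / n)
    a = (n * sxy - sx * sy) / denom
    return (a, (sy - a * sx) / n)

def rmse(models, key_of, rows, default):
    errs = [(a * x + b - y) ** 2
            for rec, x, y in rows
            for a, b in [models.get(key_of(rec), default)]]
    return sqrt(sum(errs) / len(errs))

def select_partial_combination(train, valid, cat_names):
    """Score every subset of categorical attributes as a partition key."""
    default = fit_linear([(x, y) for _, x, y in train])  # fallback for unseen cells
    best = None
    for r in range(len(cat_names) + 1):
        for combo in combinations(cat_names, r):
            key_of = lambda rec, c=combo: tuple(rec[name] for name in c)
            groups = {}
            for rec, x, y in train:
                groups.setdefault(key_of(rec), []).append((x, y))
            models = {k: fit_linear(v) for k, v in groups.items()}
            score = rmse(models, key_of, valid, default)
            if best is None or score < best[0]:
                best = (score, combo)
    return best

# Hypothetical toy data: (categorical record, numeric feature, target);
# the machine type switches the true slope, so grouping on it should help.
train = [({"machine": m, "product": p}, x, (2 if m == "A" else 5) * x + i % 3)
         for i, (m, p, x) in enumerate(
             (m, p, x) for m in "AB" for p in "PQ" for x in range(1, 9))]
valid = train[::3]  # held-out slice (a real split would be disjoint)
score, combo = select_partial_combination(train, valid, ["machine", "product"])
print(combo)  # the selected grouping includes "machine"
```

The selection step generalizes directly: any regressor can replace `fit_linear`, which matches the abstract's claim that the approach is compatible with other regression or ML prediction methods.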
dc.description.provenance: Made available in DSpace on 2021-07-10T21:39:24Z (GMT). No. of bitstreams: 1. U0001-1108202023464000.pdf: 2691067 bytes, checksum: bd90f839907c4bd1c88add58352a5843 (MD5). Previous issue date: 2020 (en)
dc.description.tableofcontents:
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research background and motivation
1.1.1 The mixed-dataset prediction problem
1.1.2 Low-volume, high-variety product characteristics
1.1.3 Applications of hierarchical modeling
1.1.4 Transparency and interpretability of machine learning models
1.1.5 Summary of background and motivation
1.2 Research objectives
1.3 Research methods and workflow
Chapter 2 Literature Review
2.1 Hierarchical clustering / hierarchical regression methods and their applications
2.1.1 Polynomial regression / piecewise regression
2.1.2 Hierarchical linear models
2.1.3 Hierarchical clustering
2.2 Feature selection methods and their applications
2.2.1 Introduction to feature selection
2.2.2 Drawbacks of feature selection
2.3 Ensemble learning methods and their applications
2.3.1 Bagging
2.3.2 Boosting
2.3.3 Stacking
2.3.4 The XGBoost algorithm
2.4 Transparency and interpretability of machine learning models
2.5 Literature review summary
Chapter 3 Combination Datasets and Model Definitions
3.1 Raw dataset description and data preprocessing
3.2 Definitions of base, partial, and full combinations and their corresponding datasets
3.2.1 Base categorical-attribute combinations and corresponding datasets
3.2.2 Partial categorical-attribute combinations and corresponding datasets
3.2.3 Full categorical-attribute combinations and corresponding datasets
3.3 Definitions of base, partial, and full combination models
3.3.1 Base categorical-attribute combination prediction models
3.3.2 Partial categorical-attribute combination prediction models
3.3.3 Full categorical-attribute combination prediction models
3.4 Training workflow for base, partial, and full combination models
3.5 XGBoost model hyperparameter settings
Chapter 4 Prediction Model Selection Methods
4.1 Defining model-selection metrics
4.2 One-stage and two-stage model selection methods
4.2.1 One-stage model selection method (A, B, n)
4.2.2 Two-stage model selection method (A, B, n, C, m)
4.3 Evaluation metrics for one-stage and two-stage model selection methods
Chapter 5 Dataset Validation and Numerical Analysis
5.1 Dataset description
5.2 Validation of the one-stage and two-stage model selection algorithms
5.2.1 Model selection performance comparison on the semiconductor dataset
5.2.2 Model selection performance comparison on the diamond dataset
5.3 Model transparency and interpretability
5.4 Effect of dataset size on model-selection prediction performance
5.5 Numerical analysis summary
Chapter 6 Conclusions and Future Research Directions
6.1 Conclusions
6.2 Future research directions
References
dc.language.iso: zh-TW
dc.subject: 分層分群 [hierarchical grouping] (zh_TW)
dc.subject: 機器學習 [machine learning] (zh_TW)
dc.subject: 預測方法 [prediction methods] (zh_TW)
dc.subject: 分層方法 [hierarchical method] (zh_TW)
dc.subject: Manufacturing (en)
dc.subject: Hierarchical method (en)
dc.subject: Expert systems (en)
dc.subject: Hierarchical Clustering (en)
dc.subject: Prediction methods (en)
dc.subject: Regression analysis (en)
dc.title: 混合資料集之分層組合預測模型 [Hierarchical combination prediction models for mixed datasets] (zh_TW)
dc.title: Using Partial Combination Prediction Models for Mixed Datasets (en)
dc.type: Thesis
dc.date.schoolyear: 108-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 喻奉天, 胡毓仁, 余承叡
dc.subject.keyword: 分層方法, 分層分群, 預測方法, 機器學習 [hierarchical method, hierarchical grouping, prediction methods, machine learning] (zh_TW)
dc.subject.keyword: Hierarchical method, Hierarchical Clustering, Prediction methods, Regression analysis, Manufacturing, Expert systems (en)
dc.relation.page: 61
dc.identifier.doi: 10.6342/NTU202003028
dc.rights.note: 未授權 [Not authorized]
dc.date.accepted: 2020-08-12
dc.contributor.author-college: 工學院 [College of Engineering] (zh_TW)
dc.contributor.author-dept: 工業工程學研究所 [Institute of Industrial Engineering] (zh_TW)
Appears in Collections: 工業工程學研究所 [Institute of Industrial Engineering]

Files in This Item:
File | Size | Format
U0001-1108202023464000.pdf (Restricted Access) | 2.63 MB | Adobe PDF

