NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91250
Full metadata record
dc.contributor.advisor [zh_TW]: 吳政鴻
dc.contributor.advisor [en]: Cheng-Hung Wu
dc.contributor.author [zh_TW]: 張鈺鑫
dc.contributor.author [en]: Yu-Hsin Chang
dc.date.accessioned: 2023-12-12T16:24:16Z
dc.date.available: 2023-12-13
dc.date.copyright: 2023-12-12
dc.date.issued: 2023
dc.date.submitted: 2023-10-13
dc.identifier.citation:
Chang, Y.-H. (2020). Using partial combination prediction models for mixed datasets (Master's thesis). Graduate Institute of Industrial Engineering, College of Engineering, National Taiwan University.
Chang, K.-W. (2021). Hierarchical expansion approach for ensemble learning and prediction in mixed dataset (Master's thesis). Graduate Institute of Industrial Engineering, College of Engineering, National Taiwan University.
Pawlak, Z. (1982). Rough sets. International journal of computer & information sciences, 11(5), 341-356.
Hall, M. A. (2000). Correlation-based feature selection of discrete and numeric class machine learning.
Hu, Q., Liu, J., & Yu, D. (2008). Mixed feature selection based on granulation and approximation. Knowledge-Based Systems, 21(4), 294-304.
Paul, J., & Dupont, P. (2015). Kernel methods for heterogeneous feature selection. Neurocomputing, 169, 187-195.
Jiang, S. Y., & Wang, L. X. (2016). Efficient feature selection based on correlation measure between continuous and discrete features. Information Processing Letters, 116(2), 203-215.
Wang, F., & Liang, J. (2016). An efficient feature selection algorithm for hybrid data. Neurocomputing, 193, 33-41.
Zhang, X., Mei, C., Chen, D., & Li, J. (2016). Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy. Pattern Recognition, 56, 1-15.
Solorio-Fernández, S., Martínez-Trinidad, J. F., & Carrasco-Ochoa, J. A. (2017). A new unsupervised spectral feature selection method for mixed data: a filter approach. Pattern Recognition, 72, 314-326.
Lee, J., Jeong, J. Y., & Jun, C. H. (2020). Markov blanket-based universal feature selection for classification and regression of mixed-type data. Expert Systems with Applications, 158, 113398.
Solorio-Fernandez, S., Carrasco-Ochoa, J. A., & Martinez-Trinidad, J. F. (2022). A survey on feature selection methods for mixed data. Artificial Intelligence Review, 55(4), 2821-2846. https://doi.org/10.1007/s10462-021-10072-6
Kim, K. J., & Jun, C. H. (2018). Rough set model based feature selection for mixed-type data with feature space decomposition. Expert Systems with Applications, 103, 196-205.
Chandra, B., & Gupta, M. (2011). An efficient statistical feature selection approach for classification of gene expression data. Journal of biomedical informatics, 44(4), 529-535.
Tang, W., & Mao, K. (2005, May). Feature selection algorithm for data with both nominal and continuous features. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 683-688). Springer, Berlin, Heidelberg.
Tang, W., & Mao, K. Z. (2007). Feature selection algorithm for mixed data with both nominal and continuous features. Pattern Recognition Letters, 28(5), 563-571.
Sang, B., Chen, H., Li, T., Xu, W., & Yu, H. (2020). Incremental approaches for heterogeneous feature selection in dynamic ordered data. Information Sciences, 541, 475-501.
Greco, S., Matarazzo, B., & Slowinski, R. (2001). Rough sets theory for multicriteria decision analysis. European journal of operational research, 129(1), 1-47.
Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Wiley Series in Telecommunications and Signal Processing. Wiley.
Doquire, G., & Verleysen, M. (2011, October). An Hybrid Approach to Feature Selection for Mixed Categorical and Continuous Data. In KDIR (pp. 394-401).
Doquire, G., & Verleysen, M. (2011). Mutual information based feature selection for mixed data. In 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2011) (pp. 333-338).
Liu, H., Wei, R., & Jiang, G. (2013). A hybrid feature selection scheme for mixed attributes data. Computational and Applied Mathematics, 32(1), 145-161.
Solorio-Fernández, S., Martínez-Trinidad, J. F., & Carrasco-Ochoa, J. A. (2020). A supervised filter feature selection method for mixed data based on spectral feature selection and information-theory redundancy analysis. Pattern Recognition Letters, 138, 321-328.
Zhao, Z. A., & Liu, H. (2012). Spectral feature selection for data mining. Taylor & Francis.
Yu, L., & Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. The Journal of Machine Learning Research, 5, 1205-1224.
Coelho, F., Braga, A. P., & Verleysen, M. (2016). A mutual information estimator for continuous and discrete variables applied to feature selection and classification problems. International Journal of Computational Intelligence Systems, 9(4), 726-733.
Kraskov, A., Stögbauer, H., & Grassberger, P. (2004). Estimating mutual information. Physical review E, 69(6), 066138.
Dash, M., Liu, H., & Yao, J. (1997, November). Dimensionality reduction of unsupervised data. In Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence (pp. 532-539). IEEE.
Liu, H., & Setiono, R. (1995, November). Chi2: Feature selection and discretization of numeric attributes. In Proceedings of 7th IEEE international conference on tools with artificial intelligence (pp. 388-391). IEEE.
Dash, M., & Liu, H. (2000, April). Feature selection for clustering. In Pacific-Asia Conference on knowledge discovery and data mining (pp. 110-121). Springer, Berlin, Heidelberg.
Zhao, Z., & Liu, H. (2007, June). Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th international conference on Machine learning (pp. 1151-1157).
Chaudhuri, A., Samanta, D., & Sarma, M. (2021). Two-stage approach to feature set optimization for unsupervised dataset with heterogeneous attributes. Expert Systems with Applications, 172, 114563.
Huang, Z. (1997, February). Clustering large data sets with mixed numeric and categorical values. In Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (pp. 21-34).
Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery, 2(3), 283-304.
Dutta, D., Dutta, P., & Sil, J. (2014). Simultaneous feature selection and clustering with mixed features by multi objective genetic algorithm. International Journal of Hybrid Intelligent Systems, 11(1), 41-54.
Garg, V. K., Rudin, C., & Jaakkola, T. (2016, May). CRAFT: Cluster-specific assorted feature selection. In Artificial intelligence and statistics (pp. 305-313). PMLR.
Marbac, M., & Sedki, M. (2017). Variable selection for mixed data clustering: a model-based approach. arXiv preprint arXiv:1703.02293.
Marbac, M., & Sedki, M. (2019). VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values. Bioinformatics, 35(7), 1255-1257.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22.
Storlie, C. B., Myers, S. M., Katusic, S. K., Weaver, A. L., Voigt, R. G., Croarkin, P. E., ... & Port, J. D. (2018). Clustering and variable selection in the presence of mixed variable types and missing data. Statistics in medicine, 37(19), 2884-2899.
Hancock, J. T., & Khoshgoftaar, T. M. (2020). Survey on categorical data for neural networks. Journal of Big Data, 7(1), 28. https://doi.org/10.1186/s40537-020-00305-w
Gniazdowski, Z., & Grabowski, M. (2016). Numerical coding of nominal data. arXiv preprint arXiv:1601.01966.
Ruiz-Shulcloper, J. (2008). Pattern recognition with mixed and incomplete data. Pattern Recognition and Image Analysis, 18(4), 563-576.
Kurgan, L. A., & Cios, K. J. (2004). CAIM discretization algorithm. IEEE transactions on Knowledge and Data Engineering, 16(2), 145-153.
Kerber, R. (1992, July). Chimerge: Discretization of numeric attributes. In Proceedings of the tenth national conference on Artificial intelligence (pp. 123-128).
Fayyad, U., & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning.
Wong, A. K., & Chiu, D. K. (1987). Synthesizing statistical knowledge from incomplete mixed-mode data. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6), 796-805.
Hua, H., & Zhao, H. (2009, November). A discretization algorithm of continuous attributes based on supervised clustering. In 2009 Chinese conference on pattern recognition (pp. 1-5). IEEE.
Foss, A. H., Markatou, M., & Ray, B. (2019). Distance metrics and clustering methods for mixed‐type data. International Statistical Review, 87(1), 80-109.
Cantú-Paz, E. (2001). Supervised and unsupervised discretization methods for evolutionary algorithms (No. UCRL-JC-142243). Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States).
Hartemink, A., & Gifford, D. K. (2001). Principled computational methods for the validation and discovery of genetic regulatory networks (Doctoral dissertation, Massachusetts Institute of Technology).
Dash, R., Paramguru, R. L., & Dash, R. (2011). Comparative analysis of supervised and unsupervised discretization techniques. International Journal of Advances in Science and Technology, 2(3), 29-37.
De Leon, A. R., & Chough, K. C. (Eds.). (2013). Analysis of mixed data: methods & applications. CRC Press.
Wilson, D. R., & Martinez, T. R. (1997). Improved heterogeneous distance functions. Journal of artificial intelligence research, 6, 1-34.
Lim, M., & Hastie, T. (2015). Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics, 24(3), 627-654.
Sorokina, D., Caruana, R., Riedewald, M., & Fink, D. (2008, July). Detecting statistical interactions with additive groves of trees. In Proceedings of the 25th international conference on Machine learning (pp. 1000-1007).
Loh, W. Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12, 361-386.
Lou, Y., Caruana, R., Gehrke, J., & Hooker, G. (2013, August). Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 623-631).
Henelius, A., Puolamäki, K., Boström, H., Asker, L., & Papapetrou, P. (2014). A peek into the black box: exploring classifiers by randomization. Data mining and knowledge discovery, 28(5), 1503-1529.
Henelius, A., Puolamäki, K., Karlsson, I., Zhao, J., Asker, L., Boström, H., & Papapetrou, P. (2015, April). Goldeneye++: A closer look into the black box. In International symposium on statistical learning and data sciences (pp. 96-105). Springer, Cham.
Oh, S. (2019). Feature interaction in terms of prediction performance. Applied Sciences, 9(23), 5191.
Zhang, X., Zhang, H., Zhu, J., & Li, Z. (2022). Revealing the structure of prediction models through feature interaction detection. Knowledge-Based Systems, 236, 107737.
Ahmad, A., & Khan, S. S. (2019). Survey of state-of-the-art mixed data clustering algorithms. IEEE Access, 7, 31883-31902. https://doi.org/10.1109/Access.2019.2903568
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297).
Huang, J. Z., Ng, M. K., Rong, H., & Li, Z. (2005). Automated variable weighting in k-means type clustering. IEEE transactions on pattern analysis and machine intelligence, 27(5), 657-668.
Lu, Y., Lu, S., Fotouhi, F., Deng, Y., & Brown, S. J. (2004, March). FGKA: A fast genetic k-means clustering algorithm. In Proceedings of the 2004 ACM symposium on Applied computing (pp. 622-623).
Chiodi, M. (1990). A partition type method for clustering mixed data. Rivista di statistica applicata, 2, 135-147.
Ezugwu, A. E., Ikotun, A. M., Oyelade, O. O., Abualigah, L., Agushaka, J. O., Eke, C. I., & Akinyelu, A. A. (2022). A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence, 110, 104743. https://doi.org/10.1016/j.engappai.2022.104743
D'Urso, P., & Massari, R. (2019). Fuzzy clustering of mixed data. Information Sciences, 505, 513-534. https://doi.org/10.1016/j.ins.2019.07.100
Behzadi, S., Muller, N. S., Plant, C., & Bohm, C. (2020). Clustering of mixed-type data considering concept hierarchies: problem specification and algorithm. International Journal of Data Science and Analytics, 10(3), 233-248. https://doi.org/10.1007/s41060-020-00216-2
Tran, L., Fan, L., & Shahabi, C. (2021, April). Clustering Mixed-Type Data with Correlation-Preserving Embedding. In International Conference on Database Systems for Advanced Applications (pp. 342-358). Springer, Cham.
Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P. A. (2008, July). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on Machine learning (pp. 1096-1103).
Yin, S., Gan, G. J., Valdez, E. A., & Vadiveloo, J. (2021). Applications of clustering with mixed type data in life insurance. Risks, 9(3), 47. https://doi.org/10.3390/risks9030047
Saxena, A., Prasad, M., Gupta, A., Bharill, N., Patel, O. P., Tiwari, A., Er, M. J., Ding, W. P., & Lin, C. T. (2017). A review of clustering techniques and developments. Neurocomputing, 267, 664-681. https://doi.org/10.1016/j.neucom.2017.06.053
Sisodia, D., Sisodia, S., & Saxena, K. (2012). Clustering techniques: A brief survey of different clustering algorithms. International Journal of Latest Trends in Engineering and Technology, 1(3), 82-87.
Moschidis, O., Markos, A., & Chadjipadelis, T. (2022). Hierarchical clustering of mixed-type data based on barycentric coding. Behaviormetrika, 1-25.
Melnykov, V., & Maitra, R. (2010). Finite mixture models and model-based clustering. Statistics Surveys, 4, 80-116.
McParland, D., & Gormley, I. C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification, 10(2), 155-169. https://doi.org/10.1007/s11634-016-0238-x
Marbac, M., Biernacki, C., & Vandewalle, V. (2017). Model-based clustering of Gaussian copulas for mixed data. Communications in Statistics-Theory and Methods, 46(23), 11635-11656. https://doi.org/10.1080/03610926.2016.1277753
Tekumalla, L. S., Rajan, V., & Bhattacharyya, C. (2017). Vine copulas for mixed data: Multi-view clustering for mixed data beyond meta-Gaussian dependencies. Machine Learning, 106(9-10), 1331-1357. https://doi.org/10.1007/s10994-016-5624-2
McParland, D., Phillips, C. M., Brennan, L., Roche, H. M., & Gormley, I. C. (2017). Clustering high-dimensional mixed data to uncover sub-phenotypes: joint analysis of phenotypic and genotypic data. Statistics in Medicine, 36(28), 4548-4569. https://doi.org/10.1002/sim.7371
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69.
Lam, D., Wei, M. Z., & Wunsch, D. (2015). Clustering data of mixed categorical and numerical type with unsupervised feature learning. IEEE Access, 3, 1605-1613. https://doi.org/10.1109/Access.2015.2477216
Shi, C., & Li, X. (2022). Research on clustering algorithm based on improved SOM neural network. Computational Intelligence and Neuroscience, 2022.
Kumar, S., Kaur, P., & Gosain, A. (2022, April). A Comprehensive Survey on Ensemble Methods. In 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (pp. 1-7). IEEE.
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119-139.
Rätsch, G., Warmuth, M. K., & Shawe-Taylor, J. (2005). Efficient Margin Maximizing with Boosting. Journal of Machine Learning Research, 6(12).
Vezhnevets, A., & Vezhnevets, V. (2005, September). Modest AdaBoost-teaching AdaBoost to generalize better. In Graphicon (Vol. 12, No. 5, pp. 987-997).
Haque, A., Parker, B., & Khan, L. (2013, June). Labeling instances in evolving data streams with mapreduce. In 2013 IEEE International Congress on Big Data (pp. 387-394). IEEE.
Indriani, A. F., & Muslim, M. A. (2019). SVM optimization based on PSO and AdaBoost to increasing accuracy of CKD diagnosis. Lontar Komputer: Jurnal Ilmiah Teknologi Informasi, 119-127.
Breiman, L. (1999). Using adaptive bagging to debias regressions (p. 16). Technical Report 547, Statistics Dept. UCB.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE transactions on pattern analysis and machine intelligence, 20(8), 832-844.
Bernard, S., Adam, S., & Heutte, L. (2007, September). Using random forests for handwritten digit recognition. In Ninth international conference on document analysis and recognition (ICDAR 2007) (Vol. 2, pp. 1043-1047). IEEE.
Hoens, T. R., Polikar, R., & Chawla, N. V. (2012). Learning from streaming data with concept drift and imbalance: an overview. Progress in Artificial Intelligence, 1(1), 89-101.
Zagajewski, B., Kluczek, M., Raczko, E., Njegovec, A., Dabija, A., & Kycko, M. (2021). Comparison of random forest, support vector machines, and neural networks for post-disaster forest species mapping of the Krkonoše/Karkonosze transboundary biosphere reserve. Remote Sensing, 13(13), 2581.
Ayyadevara, V. K. (2018). Gradient boosting machine. In Pro machine learning algorithms (pp. 117-134). Apress, Berkeley, CA.
Lawrence, R., Bunn, A., Powell, S., & Zambon, M. (2004). Classification of remotely sensed imagery using stochastic gradient boosting as a refinement of classification tree analysis. Remote sensing of environment, 90(3), 331-336.
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., & Chen, K. (2015). Xgboost: extreme gradient boosting. R package version 0.4-2, 1(4), 1-4.
Freitas, A. A. (2014). Comprehensible classification models: a position paper. ACM SIGKDD explorations newsletter, 15(1), 1-10.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91250
dc.description.abstract [zh_TW]: Real-world data are becoming increasingly complex. Rather than carrying features of a single type, they are mixed data in which categorical and numerical attributes coexist, and complex interactions often exist between these two attribute types. In manufacturing, data are not only mixed in feature type; high-mix, low-volume production driven by customer demand makes the computational load even larger.
In the textile industry, for example, the actual processing time of fabric dyeing is jointly affected, to varying degrees, by categorical and numerical variables such as color family, fabric type, dyeing-vat category, and batch size. These high-dimensional, nonlinear combinations make the actual dyeing time harder to predict, which in turn complicates machine-assignment scheduling and makes product delivery dates difficult to commit.
To overcome these difficulties, this study proceeds in two stages. The first stage hierarchically partitions the categorical attributes within the mixed, high-dimensional dataset, identifying the key influential variables step by step. The second stage, carried out as the hierarchy expands, aggregates the different category values under the same variable, so that even as the dataset gains dimensions or features over time, the number of trained models does not grow substantially, yielding accurate models that remain highly interpretable.
dc.description.abstract [en]: The majority of datasets found in practice combine categorical and numerical attributes. For prediction and decision-making tasks, mixed datasets are considerably more challenging than purely numerical ones, because complex interactions exist between the two attribute types. For example, in a semiconductor dataset, the throughput rate of a chip is affected by factors such as the machine type, the material, and the number of wires, yet the impact of the number of wires can vary depending on the specific combination of machine and product. Traditional machine learning methods often convert categorical data into numerical data to simplify prediction, but this practice can create artificial orderings among nominal attributes or generate high-dimensional data that lengthens computation time.
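The encoding pitfalls mentioned above are easy to reproduce. The following is a minimal Python sketch, not taken from the thesis; the column names ("machine", "wires") are invented for illustration:

import pandas as pd

# Toy mixed dataset; the column names here are hypothetical.
df = pd.DataFrame({"machine": ["A", "B", "C", "A"],
                   "wires": [3, 5, 4, 6]})

# Label encoding imposes an artificial ordering A < B < C on a nominal attribute:
df["machine_code"] = df["machine"].astype("category").cat.codes

# One-hot encoding avoids the fake ordering but inflates dimensionality:
# a categorical attribute with k levels turns into k binary columns.
onehot = pd.get_dummies(df["machine"], prefix="machine")
print(onehot.shape)  # (4, 3): three columns from a single 3-level attribute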
This study introduces a hierarchical expansion method that segments data hierarchically based on categorical attributes, addressing these limitations. The method also examines the levels within each categorical attribute to determine whether they affect the response variable in a similar way. Applied to mixed datasets with complex interactions between variables, it reduces computational effort while increasing the accuracy of prediction models and improving their interpretability. The results indicate that the method is promising on mixed datasets from a semiconductor manufacturer, and that it achieves higher accuracy than the partial combination prediction models proposed by Chang (2020) and the hierarchical expansion method proposed by Chang (2021).
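The abstract describes the hierarchical level-aggregating idea only in prose. As a rough illustration of the two steps it names, hierarchical expansion over categorical attributes and aggregation of levels that affect the response similarly, the following is a minimal Python sketch. It is a sketch under stated assumptions, not the thesis's algorithm: every function and variable name is hypothetical, and the merging criterion (adjacent sorted per-level response means within a relative tolerance) is a simple stand-in for whatever criterion the thesis actually uses in Chapter 3.

import pandas as pd

def aggregate_levels(df: pd.DataFrame, cat_col: str, target: str,
                     tol: float = 0.1) -> pd.Series:
    """Relabel cat_col so that levels with similar mean responses share a label.
    NOTE: this mean-gap rule is a hypothetical stand-in, not the thesis's criterion."""
    means = df.groupby(cat_col)[target].mean().sort_values()
    label_of, run, prev = {}, [], None
    for level, m in means.items():
        # start a new merged group when the gap between adjacent means is large
        if prev is not None and abs(m - prev) > tol * max(abs(prev), 1e-12):
            for lv in run:
                label_of[lv] = run[0]
            run = []
        run.append(level)
        prev = m
    for lv in run:                      # flush the final group
        label_of[lv] = run[0]
    return df[cat_col].map(label_of)

def hierarchical_expand(df, cat_cols, target, depth=0, max_depth=2):
    """Recursively split on the categorical attribute whose (aggregated)
    levels separate the response most; return the leaf datasets."""
    if depth >= max_depth or not cat_cols or len(df) < 2:
        return [df]
    def spread(c):                      # how strongly c's merged levels separate the target
        return df.groupby(aggregate_levels(df, c, target))[target].mean().std(ddof=0)
    best = max(cat_cols, key=spread)
    merged = aggregate_levels(df, best, target)
    rest = [c for c in cat_cols if c != best]
    leaves = []
    for _, sub in df.groupby(merged):
        leaves.extend(hierarchical_expand(sub, rest, target, depth + 1, max_depth))
    return leaves

Each leaf dataset returned by hierarchical_expand would then receive its own small prediction model, which is what keeps the per-model dimensionality low and the overall structure interpretable, as the abstract claims.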
dc.description.provenance [en]: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-12-12T16:24:15Z. No. of bitstreams: 0
dc.description.provenance [en]: Made available in DSpace on 2023-12-12T16:24:16Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents [zh_TW]:
Acknowledgements i
Chinese Abstract ii
Abstract iii
Table of Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.1.1 Challenges in Predicting Mixed Datasets 1
1.1.2 Feature Selection for Mixed Datasets 1
1.1.3 Interpretability and Transparency of Machine Learning Models 1
1.1.4 Summary of Research Background and Motivation 2
1.2 Research Objectives 2
1.3 Research Methods and Procedure 2
Chapter 2 Literature Review 4
2.1 Feature Selection Methods for Mixed Datasets 4
2.1.1 Supervised Feature Selection Methods for Mixed Data 4
2.1.2 Unsupervised Feature Selection Methods for Mixed Data 9
2.1.3 Preprocessing Strategies for Feature Selection on Mixed Data 12
2.2 Methods for Handling Interactions between Variables 13
2.3 Clustering Methods for Mixed Datasets 16
2.4 Ensemble Learning Methods for Mixed Datasets 19
2.5 Summary of the Literature Review 20
Chapter 3 Hierarchical Level-Aggregating Algorithm 22
3.1 Description of the Original Dataset and Data Preprocessing 22
3.2 Definition of Each Level and Its Corresponding Datasets 23
3.2.1 Definition of Level 0 and Its Corresponding Datasets 24
3.2.2 Definition of Level 1 and Its Corresponding Datasets 24
3.2.3 Definition of Level N and Its Corresponding Datasets 26
3.3 Definition of the Prediction Models at Each Level 28
3.4 Training of the Prediction Models 29
3.4.1 Attributes of the Prediction Models 29
3.4.2 Hyperparameter Settings of the Models 30
3.5 Hierarchical Level-Aggregating Method 31
3.6 Validation Method for the Prediction Models 42
Chapter 4 Validation and Numerical Analysis on Datasets 43
4.1 Description of the Datasets 43
4.2 Validation of Model Selection 44
4.2.1 Prediction Performance Validation on the Semiconductor Dataset 44
4.2.2 Prediction Performance Validation on the Car Dataset 46
4.2.3 Prediction Performance Validation on the New York Airbnb Dataset 48
4.2.4 Model Transparency and Interpretability 50
4.3 Summary of Dataset Validation 51
Chapter 5 Conclusions and Future Research Directions 52
5.1 Conclusions 52
5.2 Future Research Directions 52
References 53
Appendix 61
dc.language.iso: zh_TW
dc.subject [zh_TW]: 預測模型 (prediction model)
dc.subject [zh_TW]: 類別屬性值合併 (aggregation of categorical attribute values)
dc.subject [zh_TW]: 機器學習 (machine learning)
dc.subject [zh_TW]: 集成學習 (ensemble learning)
dc.subject [zh_TW]: 分群 (clustering)
dc.subject [en]: aggregate attributes of categorical variables
dc.subject [en]: mixed data
dc.subject [en]: prediction model
dc.subject [en]: ensemble learning
dc.subject [en]: machine learning
dc.title [zh_TW]: 應用於混合資料集的階層式層內聚合預測方法
dc.title [en]: Hierarchical Level-Aggregating Prediction Method for Mixed Datasets
dc.type: Thesis
dc.date.schoolyear: 112-1
dc.description.degree: Master (碩士)
dc.contributor.oralexamcommittee [zh_TW]: 陳文智; 黃道宏
dc.contributor.oralexamcommittee [en]: Wen-Chih Chen; Dao-Hong Huang
dc.subject.keyword [zh_TW]: 分群, 預測模型, 集成學習, 機器學習, 類別屬性值合併
dc.subject.keyword [en]: mixed data, prediction model, ensemble learning, machine learning, aggregate attributes of categorical variables
dc.relation.page: 68
dc.identifier.doi: 10.6342/NTU202304279
dc.rights.note: Authorized (open access worldwide)
dc.date.accepted: 2023-10-13
dc.contributor.author-college: College of Engineering (工學院)
dc.contributor.author-dept: Graduate Institute of Industrial Engineering (工業工程學研究所)
dc.date.embargo-lift: 2028-10-01
Appears in collections: Graduate Institute of Industrial Engineering

Files in this item:
File: ntu-112-1.pdf (available online after 2028-10-01)
Size: 2.29 MB
Format: Adobe PDF


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
