NTU Theses and Dissertations Repository
Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036
Full metadata record (DC field / value / language)
dc.contributor.advisor (zh_TW): 張明中
dc.contributor.advisor (en): Ming-Chung Chang
dc.contributor.author (zh_TW): 郭晉良
dc.contributor.author (en): Chin-Liang Kuo
dc.date.accessioned: 2024-07-12T16:23:12Z
dc.date.available: 2024-07-13
dc.date.copyright: 2024-07-12
dc.date.issued: 2024
dc.date.submitted: 2024-07-08
dc.identifier.citation:
T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881–892, 2002.
Omer Sagi and Lior Rokach. Ensemble learning: A survey. WIREs Data Mining and Knowledge Discovery, 8(4):e1249, 2018.
Oludare Isaac Abiodun, Aman Jantan, Abiodun Esther Omolara, Kemi Victoria Dada, Nachaat AbdElatif Mohamed, and Humaira Arshad. State-of-the-art in artificial neural network applications: A survey. Heliyon, 4(11):e00938, 2018.
Wolfgang Härdle. Applied nonparametric regression. Number 19. Cambridge University Press, 1990.
James N. Morgan and John A. Sonquist. Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58(302):415–434, 1963.
Badr Hssina, Abdelkarim Merbouha, Hanane Ezzikouri, and Mohammed Erritali. A comparative study of decision tree ID3 and C4.5. International Journal of Advanced Computer Science and Applications (IJACSA), Special Issue on Advances in Vehicular Ad Hoc Networking and Applications 2014, 4(2), 2014.
L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and Regression Trees. Taylor & Francis, 1984.
F. Esposito, D. Malerba, G. Semeraro, and J. Kay. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):476–491, 1997.
V. Roshan Joseph and Simon Mak. Supervised compression of big data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 14(3):217–229, 2021.
Partitioning Estimates. In: A Distribution-Free Theory of Nonparametric Regression, pages 52–69. Springer New York, New York, NY, 2002.
J. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1:281–297, 1967.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036
dc.description.abstract (zh_TW): 本文研究了一種基於分割法的無母數迴歸技術,旨在提升迴歸問題的預測精度。本文首先介紹了使用機器學習處理分類及迴歸問題的基本概念,再針對處理迴歸問題的常見方法進行探討,最後聚焦在本文所使用的無母數迴歸方法。在本文的核心研究中,提出了一種新的演算法 PE-Kmeans,該演算法在非監督式學習中的 K-means 演算法的基礎上進行改進,形成二階段的分群方法。第一階段在輸出空間進行 K-means 分群,第二階段則在每個母群中進行輸入空間的再次 K-means 分群,這種方法充分考慮了輸出變量的信息,使得它成為一個監督式學習模型,可以用來處理迴歸問題。本文以 Supervised Compression 及著名的 Regression Tree 作為比較模型,前項方法通過選擇性以輸入空間或輸出空間作為分割中心點,逐步將輸入空間分割成不規則的 Voronoi region 子區域,後項方法則是透過二元分類將輸入空間分割為長方形。通過對模擬資料和真實世界資料的實驗,本文驗證了前述三種方法在不同情境下的性能,實驗結果表明,PE-Kmeans 在處理相對不平滑的函數及真實世界資料時,能夠更有效地進行預測。
dc.description.abstract (en): This study investigates a non-parametric regression technique based on partitioning, aimed at improving prediction accuracy in regression problems. The thesis first introduces the basic concepts of using machine learning for classification and regression, then reviews common methods for regression, and finally focuses on the non-parametric regression method employed in this study. At its core, a new algorithm called PE-Kmeans is proposed. The algorithm builds on the K-means algorithm from unsupervised learning to form a two-stage clustering method: in the first stage, K-means clustering is performed in the output space; in the second stage, K-means clustering is performed again in the input space within each parent cluster. Because this procedure incorporates information from the output variable, it becomes a supervised learning model suitable for regression problems. The study compares the proposed method with Supervised Compression and the well-known Regression Tree. The former selectively uses points in the input space or the output space as partition centers, gradually dividing the input space into irregular Voronoi sub-regions; the latter partitions the input space into rectangles through recursive binary splits. Through experiments on simulated and real-world data, the study evaluates the performance of the three methods under different scenarios. The results show that PE-Kmeans predicts more effectively on relatively non-smooth functions and on real-world data.
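The abstract above only describes PE-Kmeans in words, so the following Python sketch illustrates what such a two-stage partitioning estimate could look like. It is a minimal illustration rather than the thesis's actual implementation: the class name PEKMeans, the hyperparameters k1 and k2, the use of scikit-learn's KMeans, and the nearest-centre, cell-mean prediction rule are all assumptions made here for concreteness.

import numpy as np
from sklearn.cluster import KMeans

class PEKMeans:
    """Illustrative two-stage partitioning estimate (assumed details; see note above).

    Stage 1: K-means on the responses y, giving k1 parent clusters.
    Stage 2: K-means on the inputs X within each parent cluster.
    Prediction: assign x to the nearest final input-space centre and return
    the mean response of the training points in that cell.
    """

    def __init__(self, k1=4, k2=10, random_state=0):
        self.k1 = k1                      # output-space clusters (stage 1)
        self.k2 = k2                      # input-space clusters per parent (stage 2)
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        stage1 = KMeans(n_clusters=self.k1, n_init=10,
                        random_state=self.random_state).fit(y.reshape(-1, 1))
        centres, means = [], []
        for parent in range(self.k1):
            Xg, yg = X[stage1.labels_ == parent], y[stage1.labels_ == parent]
            k2 = min(self.k2, len(Xg))    # guard against small parent clusters
            stage2 = KMeans(n_clusters=k2, n_init=10,
                            random_state=self.random_state).fit(Xg)
            for sub in range(k2):
                centres.append(stage2.cluster_centers_[sub])
                means.append(yg[stage2.labels_ == sub].mean())  # cell mean of y
        self.centres_, self.means_ = np.vstack(centres), np.array(means)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # distance from each query point to every final centre; take the nearest
        d = np.linalg.norm(X[:, None, :] - self.centres_[None, :, :], axis=2)
        return self.means_[d.argmin(axis=1)]

if __name__ == "__main__":
    # Toy check on the 2-D Michalewicz function, one of the simulation settings
    # listed in the table of contents below.
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, np.pi, size=(2000, 2))
    y = -(np.sin(X[:, 0]) * np.sin(1 * X[:, 0] ** 2 / np.pi) ** 20
          + np.sin(X[:, 1]) * np.sin(2 * X[:, 1] ** 2 / np.pi) ** 20)
    model = PEKMeans(k1=4, k2=10).fit(X[:1500], y[:1500])
    mse = np.mean((model.predict(X[1500:]) - y[1500:]) ** 2)
    print(f"held-out MSE: {mse:.4f}")

The cell-mean rule used here is the standard partitioning-estimate predictor; the thesis may define the cells, the number of clusters per stage, or the prediction rule differently, and a regression-tree baseline such as sklearn.tree.DecisionTreeRegressor could be fitted on the same split for comparison.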
dc.description.provenance (en): Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-07-12T16:23:12Z. No. of bitstreams: 0
dc.description.provenance (en): Made available in DSpace on 2024-07-12T16:23:12Z (GMT). No. of bitstreams: 0
dc.description.tableofcontents:
Verification Letter from the Oral Examination Committee   i
Acknowledgements   ii
摘要 (Chinese Abstract)   iii
Abstract   iv
Contents   v
List of Figures   viii
List of Tables   x
Denotation   xi
Chapter 1 Introduction   1
1.1 Supervised Learning   1
1.2 Common Methods for Regression Problems   2
1.2.1 Linear Models   2
1.2.2 Ensemble Learning   4
1.2.3 Neural Networks   4
1.3 Non-parametric Regression   5
1.4 Research Objectives   6
1.5 Thesis Organization   6
Chapter 2 Literature Review   7
2.1 Regression Tree   7
2.2 Supervised Compression   8
2.3 Partitioning Estimate   9
Chapter 3 Methodology   10
3.1 Models   10
3.1.1 Partitioning Estimate via Supervised Compression   10
3.1.2 Partitioning Estimate via PE-Kmeans   11
3.1.3 Model Hyperparameter Settings   12
3.2 Comparison Metrics   13
3.2.1 Prediction Error   13
3.2.2 Between-Cluster and Within-Cluster Residual Sum of Squares   13
3.2.3 Running Time   14
3.2.4 Visualization   14
Chapter 4 Simulated Data Analysis   15
4.1 Dataset Settings   15
4.2 Simulations   15
4.2.1 Two-Dimensional Michalewicz Function   15
4.2.1.1 Visualization   21
4.2.2 Dropwave Function   26
4.2.2.1 Visualization   31
4.2.3 OTL Circuit Function   36
4.2.4 Piston Function   41
4.2.5 Borehole Function   44
4.2.6 Summary of Function Simulation Experiments   46
Chapter 5 Real-World Data Analysis   48
5.1 Dataset Settings   49
5.2 Prediction Error   49
5.3 Within-Cluster and Between-Cluster Variation Analysis   50
5.3.1 Within-Cluster Variation   50
5.3.2 Between-Cluster Variation   50
5.4 Running Time   51
Chapter 6 Conclusions and Future Work   52
References   54
dc.language.iso: zh_TW
dc.subject (zh_TW): 機器學習
dc.subject (zh_TW): 無母數迴歸
dc.subject (zh_TW): 監督式學習
dc.subject (zh_TW): 分割法
dc.subject (zh_TW): 資料壓縮
dc.subject (zh_TW): 迴歸樹
dc.subject (zh_TW): 群集分析
dc.subject (en): Segmentation Method
dc.subject (en): Non-parametric Regression
dc.subject (en): Supervised Learning
dc.subject (en): Cluster Analysis
dc.subject (en): Regression Tree
dc.subject (en): Data Compression
dc.subject (en): Machine Learning
dc.title (zh_TW): 基於分割法的無母數迴歸
dc.title (en): Non-parametric Regression Using Partitioning Methods
dc.type: Thesis
dc.date.schoolyear: 112-2
dc.description.degree: 碩士 (Master's)
dc.contributor.coadvisor (zh_TW): 楊鈞澔
dc.contributor.coadvisor (en): Chun-Hao Yang
dc.contributor.oralexamcommittee (zh_TW): 紀建名;黃學涵
dc.contributor.oralexamcommittee (en): Chien-Ming Chi; Hsueh-Han Huang
dc.subject.keyword (zh_TW): 機器學習, 無母數迴歸, 監督式學習, 分割法, 資料壓縮, 迴歸樹, 群集分析
dc.subject.keyword (en): Machine Learning, Non-parametric Regression, Supervised Learning, Segmentation Method, Data Compression, Regression Tree, Cluster Analysis
dc.relation.page: 55
dc.identifier.doi: 10.6342/NTU202401551
dc.rights.note: Authorization granted (open access worldwide)
dc.date.accepted: 2024-07-09
dc.contributor.author-college: 電機資訊學院 (College of Electrical Engineering and Computer Science)
dc.contributor.author-dept: 資料科學學位學程 (Data Science Degree Program)
Appears in Collections: 資料科學學位學程 (Data Science Degree Program)

Files in this item:
ntu-112-2.pdf (1.43 MB, Adobe PDF)


All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
