Non-parametric Regression Using Partitioning Methods

Chin-Liang Kuo (郭晉良)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036

Title:	Non-parametric Regression Using Partitioning Methods (基於分割法的無母數迴歸)
Author:	Chin-Liang Kuo (郭晉良)
Advisor:	Ming-Chung Chang (張明中)
Co-advisor:	Chun-Hao Yang (楊鈞澔)
Keywords:	Machine Learning, Non-parametric Regression, Supervised Learning, Segmentation Method, Data Compression, Regression Tree, Cluster Analysis
Year of publication:	2024
Degree:	Master's
Abstract:	This study investigates a non-parametric regression technique based on partitioning, with the aim of improving prediction accuracy in regression problems. It first introduces the basic concepts of using machine learning for classification and regression, then reviews common approaches to regression, and finally focuses on the non-parametric regression methods used in this work. The core contribution is a new algorithm, PE-Kmeans, which builds on the K-means algorithm from unsupervised learning to form a two-stage clustering method. In the first stage, K-means clustering is performed in the output space; in the second stage, K-means clustering is performed again in the input space within each parent cluster. Because the method makes use of the information in the output variable, it is a supervised learning model and can be applied to regression problems. PE-Kmeans is compared with Supervised Compression and the well-known Regression Tree. The former chooses partition centers selectively from either the input space or the output space and progressively divides the input space into irregular Voronoi regions; the latter divides the input space into rectangles through binary splits. Experiments on simulated and real-world data evaluate the three methods under different scenarios, and the results indicate that PE-Kmeans predicts more effectively on relatively non-smooth functions and on real-world data.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036
DOI:	10.6342/NTU202401551
Full-text authorization:	Authorized (open to the public worldwide)
Appears in department:	Data Science Degree Program
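
The two-stage clustering described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names (`kmeans`, `fit_two_stage`, `predict`), the parameters `k_out` and `k_in`, and in particular the prediction rule (return the mean training output of the cell whose input-space centroid is nearest) are assumptions, since the record does not state how PE-Kmeans produces predictions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def fit_two_stage(X, y, k_out, k_in, seed=0):
    """Stage 1: K-means on the outputs. Stage 2: K-means on the inputs
    inside each output cluster. Returns a list of (input centroid, mean y)."""
    _, out_labels = kmeans([(v,) for v in y], k_out, seed=seed)
    cells = []
    for g in sorted(set(out_labels)):
        idx = [i for i, lab in enumerate(out_labels) if lab == g]
        cents, in_labels = kmeans([X[i] for i in idx], k_in, seed=seed)
        for j, c in enumerate(cents):
            ys = [y[i] for i, lab in zip(idx, in_labels) if lab == j]
            if ys:
                cells.append((c, sum(ys) / len(ys)))
    return cells


def predict(cells, x):
    """Assumed rule: mean output of the cell with the nearest input centroid."""
    return min(cells, key=lambda cell: sq_dist(x, cell[0]))[1]


# Toy step function on [0, 1): a deliberately non-smooth target.
X = [(i / 20,) for i in range(20)]
y = [0.0 if x[0] < 0.5 else 1.0 for x in X]
cells = fit_two_stage(X, y, k_out=2, k_in=1)
print(predict(cells, (0.1,)), predict(cells, (0.9,)))
```

Clustering the outputs first means that inputs with very different responses are never placed in the same cell, which is one plausible reading of why the abstract reports an advantage on non-smooth functions.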

Files in this item:

File	Size	Format
ntu-112-2.pdf	1.43 MB	Adobe PDF

Unless a specific copyright statement indicates otherwise, items in this system are protected by copyright, with all rights reserved.
