Please use this identifier to cite or link to this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036
Title: | 基於分割法的無母數迴歸 Non-parametric Regression Using Partitioning Methods |
Authors: | 郭晉良 Chin-Liang Kuo |
Advisor: | 張明中 Ming-Chung Chang |
Co-Advisor: | 楊鈞澔 Chun-Hao Yang |
Keyword: | Machine Learning, Non-parametric Regression, Supervised Learning, Segmentation Method, Data Compression, Regression Tree, Cluster Analysis |
Publication Year: | 2024 |
Degree: | Master's |
Abstract: | This thesis studies a non-parametric regression technique based on partitioning, with the aim of improving prediction accuracy on regression problems. It first introduces the basic concepts of using machine learning for classification and regression, then reviews common approaches to regression, and finally focuses on the non-parametric regression methods used in this work. The core contribution is a new algorithm, PE-Kmeans, which builds on the unsupervised K-means algorithm to form a two-stage clustering method. In the first stage, K-means clustering is performed in the output space. In the second stage, K-means clustering is performed again in the input space within each parent cluster. Because the procedure uses the information in the output variable, it is a supervised learning model that can be applied to regression problems. PE-Kmeans is compared with Supervised Compression and the well-known Regression Tree. The former selectively takes partition centers from either the input space or the output space and progressively divides the input space into irregular Voronoi regions. The latter divides the input space into rectangles through binary splits. Experiments on simulated and real-world data evaluate the three methods under different scenarios. The results indicate that PE-Kmeans predicts more effectively on relatively non-smooth functions and on real-world data. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93036 |
DOI: | 10.6342/NTU202401551 |
Fulltext Rights: | Authorization granted (open access worldwide) |
Appears in Collections: | Data Science Degree Program |
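The two-stage clustering described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`kmeans`, `fit_two_stage`, `predict`), the parameters `k_out` and `k_in`, and the random initialization are all hypothetical. The abstract also does not say how a prediction is formed, so the sketch assumes one plausible rule: return the mean output of the cell whose input-space centroid is nearest to the query point.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on a list of tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = rng.sample(points, k)  # random initial centers (an assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Recompute each center; an empty cluster keeps its old center.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def fit_two_stage(X, y, k_out, k_in, seed=0):
    """Stage 1 clusters the outputs; stage 2 clusters the inputs inside
    each output cluster. Returns a list of (input_centroid, mean_output)."""
    _, out_labels = kmeans([(v,) for v in y], k_out, seed=seed)
    cells = []
    for g in sorted(set(out_labels)):
        idx = [i for i, lab in enumerate(out_labels) if lab == g]
        _, in_labels = kmeans([X[i] for i in idx], k_in, seed=seed)
        for c in sorted(set(in_labels)):
            members = [i for i, lab in zip(idx, in_labels) if lab == c]
            centroid = tuple(sum(X[i][d] for i in members) / len(members)
                             for d in range(len(X[0])))
            mean_y = sum(y[i] for i in members) / len(members)
            cells.append((centroid, mean_y))
    return cells


def predict(cells, x):
    """Assumed rule: mean output of the cell with the nearest input centroid."""
    return min(cells, key=lambda cell: sq_dist(x, cell[0]))[1]
```

As a usage example, a step function such as `y = 0` for `x < 0.5` and `y = 10` otherwise can be fitted with `cells = fit_two_stage(X, y, k_out=2, k_in=2)`, where `X` is a list of 1-tuples. Stage 1 separates the two output levels, so every cell lies on one side of the jump, which is the kind of non-smooth target the abstract singles out.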
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-112-2.pdf | 1.43 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.