Enhancement of LARS algorithm for high dimensional variable selection

余信宏; Hsin-Hung Yu

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100218

Title:	Enhancement of LARS algorithm for high dimensional variable selection (original title: 用於高維變數選擇的最小角迴歸演算法之研究)
Author:	余信宏 Hsin-Hung Yu
Advisor:	陳正剛 Argon Chen
Keywords:	Variable Selection, Least Angle Regression, Forward Stagewise Selection, Forward Stepwise Selection, Comprehensive Relative Importance, High-Dimensional Data Analysis, Small-Sample Problem, Rank Deficiency, LASSO, Elastic Net
Year of Publication:	2025
Degree:	Master's
Abstract:	Least Angle Regression (LARS) is a highly efficient algorithm for regression and feature selection whose idea originates from the slower Forward Stagewise method. Rather than taking many tiny steps, LARS computes exactly the largest step it can take and proceeds along a unique path that remains equiangular to multiple variables. This elegant geometric path not only greatly improves computational efficiency but can also be quickly modified to generate exactly the same variable selection path as the LASSO method, so the order in which variables enter the model can be observed clearly. High-dimensional variable selection, however, poses a significant challenge for traditional algorithms such as LARS, which fundamentally cannot select more predictors than there are observations. This thesis introduces and evaluates several novel modifications to overcome this constraint. We first propose hybrid methods that dynamically combine LARS with Forward Stepwise Selection and correlation-based heuristics.
Our primary contribution, however, is a repetition modification: a general procedure that, by iteratively fitting residuals, allows sequential selection methods to operate effectively in rank-deficient settings where the number of predictors exceeds the number of observations (p>n). We evaluated these new methods against established techniques, including LASSO, Elastic Net, and CRI, through extensive simulations and on real datasets. The results show that the repetition modification improves the variable selection performance of LARS and LARS-LASSO, allowing them to identify more relevant predictors than the standard methods in high-dimensional settings, and that it achieves competitive accuracy and area under the receiver operating characteristic curve (AUC) in real-data case studies. This work confirms that the proposed repetition modification is a valuable and effective strategy for extending the utility of sequential selection algorithms.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100218
DOI:	10.6342/NTU202504031
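Full-Text Authorization:	Authorized (open to the public worldwide)
Electronic Full-Text Release Date:	2030-08-01
Appears in Collections:	Master's Program in Statistics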
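
As background to the abstract's description of the equiangular path and of the limit on the number of selectable predictors, the standard LARS step can be written out as below. This is a sketch in the usual textbook notation, not taken from the thesis itself; the symbols (\hat\mu for the current fit, \mathcal{A} for the active set, s_j for the correlation signs) are assumptions introduced here for illustration.

```latex
% Current correlations of the predictors with the residual y - \hat\mu
\hat c = X^\top (y - \hat\mu), \qquad
\hat C = \max_j |\hat c_j|, \qquad
\mathcal{A} = \{\, j : |\hat c_j| = \hat C \,\}, \qquad
s_j = \operatorname{sign}(\hat c_j)

% Signed active matrix and equiangular unit vector
X_{\mathcal A} = (\, s_j x_j \,)_{j \in \mathcal A}, \qquad
G_{\mathcal A} = X_{\mathcal A}^\top X_{\mathcal A}, \qquad
A_{\mathcal A} = \bigl( \mathbf 1^\top G_{\mathcal A}^{-1} \mathbf 1 \bigr)^{-1/2}

u_{\mathcal A} = X_{\mathcal A}\, A_{\mathcal A}\, G_{\mathcal A}^{-1} \mathbf 1
\quad\Longrightarrow\quad
X_{\mathcal A}^\top u_{\mathcal A} = A_{\mathcal A} \mathbf 1, \qquad
\lVert u_{\mathcal A} \rVert = 1

% Largest step before a new predictor ties with the active set
a = X^\top u_{\mathcal A}, \qquad
\hat\gamma = \min_{j \notin \mathcal A}{}^{+}
\left\{ \frac{\hat C - \hat c_j}{A_{\mathcal A} - a_j},\;
        \frac{\hat C + \hat c_j}{A_{\mathcal A} + a_j} \right\}, \qquad
\hat\mu \leftarrow \hat\mu + \hat\gamma\, u_{\mathcal A}
```

Here \min^{+} means the minimum over positive candidates only. The step requires G_{\mathcal A} to be invertible, so the active columns must be linearly independent. The active set therefore cannot grow beyond the rank of X, which is at most n (n - 1 once the columns are centered). This is the p>n limitation the abstract refers to.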

Files in This Item:

File	Size	Format
ntu-113-2.pdf (available online after 2030-08-01)	6.52 MB	Adobe PDF