NTU Theses and Dissertations Repository › Center for General Education › Master's Program in Statistics

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100218

Full metadata record (DC field: value (language))
dc.contributor.advisor: 陳正剛 (zh_TW)
dc.contributor.advisor: Argon Chen (en)
dc.contributor.author: 余信宏 (zh_TW)
dc.contributor.author: Hsin-Hung Yu (en)
dc.date.accessioned: 2025-09-24T16:53:29Z
dc.date.available: 2025-09-25
dc.date.copyright: 2025-09-24
dc.date.issued: 2025
dc.date.submitted: 2025-08-12
dc.identifier.citation:
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.
Weisberg, S. (2005). Applied Linear Regression (Vol. 528). John Wiley & Sons.
Johnson, R. M. (1966). The minimal transformation to orthonormality. Psychometrika, 31(1), 61–66.
Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35(1), 1–19.
Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226–1238.
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Shen, Z., & Chen, A. (2020). Comprehensive relative importance analysis and its applications to high dimensional gene expression data analysis. Knowledge-Based Systems, 203, 106120.
Chang, T., & Chen, A. (2025). Understanding and using the relative importance measures based on orthonormality transformation. Manuscript submitted for publication.
Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., Sturla, L. M., Angelo, M., McLaughlin, M. E., ... & Golub, T. R. (2002). Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870), 436–442.
Freije, W. A., Castro-Vargas, F. E., Fang, Z., Horvath, S., Cloughesy, T., Liau, L. M., ... & Nelson, S. F. (2004). Gene expression profiling of gliomas strongly predicts survival. Cancer Research, 64(18), 6503–6510.
Borovecki, F., Lovrecic, L., Zhou, J., Jeong, H., Then, F., Rosas, H. D., ... & Krainc, D. (2005). Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease. Proceedings of the National Academy of Sciences, 102(31), 11023–11028.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., ... & Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550.
Singh, D., Febbo, P. G., Ross, K., Jackson, D. G., Manola, J., Ladd, C., ... & Sellers, W. R. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), 203–209.
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96(12), 6745–6750.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301–320.
Zuber, V., & Strimmer, K. (2011). High-dimensional regression and variable selection using CAR scores. Statistical Applications in Genetics and Molecular Biology, 10(1).
Waller, N. G. (2018). Generating correlation matrices with specified eigenvalues using the method of alternating projections. The American Statistician, 74(1), 21–28. https://doi.org/10.1080/00031305.2017.1401960
Hastie, T., Tibshirani, R., & Tibshirani, R. (2020). Best subset, forward stepwise or lasso? Analysis and recommendations based on extensive comparisons. Statistical Science, 35(4), 579–592.
Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society, Series B, 70(5), 849–911.
Hocking, R. R., & Leslie, R. N. (1967). Selection of the best subset in regression analysis. Technometrics, 9(4), 531–540.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100218
dc.description.abstract: [Translated from Chinese] Least Angle Regression (LARS) is a highly efficient algorithm for regression and feature selection whose idea originates from the slower Forward Stagewise method. Instead of taking many tiny steps, LARS computes exactly the largest step it can take, advancing along a unique path that stays equiangular to several variables at once. This elegant geometric path not only greatly improves computational efficiency but can also quickly reproduce exactly the variable-selection path of LASSO, making the order in which variables enter the model easy to observe.
In high-dimensional data, however, variable selection is a major challenge for traditional algorithms such as LARS, because they are fundamentally limited to selecting no more predictors than there are observations. This thesis introduces and evaluates several novel modifications proposed to overcome this limitation.
We first propose hybrid methods that dynamically combine LARS, Forward Stepwise Selection, and correlation-based strategies. Our main contribution, however, is the repetition modification: a general procedure that, by iteratively fitting residuals, allows sequential selection methods to handle rank-deficient settings in which the predictors outnumber the observations (p > n).
Through extensive simulations and real data sets, we evaluated these new methods against existing techniques, including LASSO, Elastic Net, and CRI. The results show that the repetition modification successfully improves the variable-selection ability of LARS and LARS-LASSO, enabling them to identify more relevant predictors than standard methods in high-dimensional settings while achieving competitive accuracy and area under the receiver operating characteristic curve (AUC) in real case studies. This study confirms that the proposed repetition modification is a valuable and effective strategy for extending the practicality of sequential selection algorithms. (zh_TW)
dc.description.abstract: Least Angle Regression (LARS) is a highly efficient algorithm for regression and feature selection that grew out of the slower Forward Stagewise method. Rather than taking many tiny steps, LARS computes exactly the largest step it can take, proceeding along a unique path that remains equiangular to all currently active variables. This elegant geometric path not only drastically improves computational efficiency but can also be modified to reproduce exactly the LASSO variable-selection path, making the order in which variables enter the model easy to observe.
High-dimensional variable selection, however, poses a significant challenge for traditional algorithms such as LARS, which are fundamentally limited to selecting fewer predictors than there are observations. This thesis introduces and evaluates several novel modifications that overcome this constraint. We first propose hybrid methods that dynamically combine LARS with Forward Stepwise Selection and correlation-based heuristics. Our primary contribution, however, is a repetition modification: a general procedure that, by iteratively fitting residuals, allows sequential selection methods to operate effectively in rank-deficient p > n scenarios (more predictors than observations).
We evaluated these new methods against established techniques, including LASSO, Elastic Net, and CRI, through extensive simulations and on real data sets. The results demonstrate that the repetition modification successfully enhances the performance of LARS and LARS-LASSO, allowing them to identify more relevant predictors than standard methods in high-dimensional settings and to achieve competitive accuracy and area under the receiver operating characteristic curve (AUC) in real-data case studies. This work confirms that the proposed repetition framework is a valuable and effective strategy for extending the utility of sequential selection algorithms. (en)
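The equiangular-step behavior summarized in the abstract can be seen by tracing the order in which variables enter a LARS path. The sketch below uses scikit-learn's `Lars` estimator on simulated data; it is an illustration only, not the thesis's own code, and the data and parameter choices are invented for the example.

```python
# Illustrative sketch (not the thesis's code): trace the order in which
# LARS admits variables, using scikit-learn's Lars estimator.
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 2.0]          # only the first three predictors matter
y = X @ beta + 0.05 * rng.standard_normal(n)

# Run the full LARS path; active_ lists variables in the order they entered.
model = Lars(n_nonzero_coefs=p, fit_intercept=False).fit(X, y)
print("entry order:", model.active_)
```

With a signal this strong, the three true predictors are admitted before the noise variables, so the entry order itself already performs variable selection, which is the property the thesis builds on.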
dc.description.provenance: Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-09-24T16:53:29Z. No. of bitstreams: 0 (en)
dc.description.provenance: Made available in DSpace on 2025-09-24T16:53:29Z (GMT). No. of bitstreams: 0 (en)
dc.description.tableofcontents:
Thesis Oral Defense Committee Certification I
Abstract (Chinese) II
ABSTRACT III
Table of Contents V
List of Figures VII
List of Tables VIII
CHAPTER 1 INTRODUCTION 1
1.1 VARIABLE SELECTION USING LARS IN A MULTIPLE LINEAR REGRESSION MODEL 1
1.2 LIMITATIONS IN HIGH-DIMENSIONAL DATA 1
CHAPTER 2 LITERATURE REVIEW 3
2.1 CHOOSING VARIABLES BY THE ORDER IN WHICH PREDICTORS ENTER THE REGRESSION MODEL 3
2.2 THE LASSO AND THE LASSO MODIFICATION OF LARS (LARS-LASSO) 11
CHAPTER 3 ENHANCEMENT OF LARS 14
3.1 HYBRID OF LARS AND FORWARD STEPWISE SELECTION 15
3.1.1 LARSStepPotential 16
3.1.2 LARSStepAngle 19
3.2 LARS AND CORRELATION 23
3.2.1 LARSCor 23
3.3 THE REPETITION MODIFICATION BASED ON THE RANK OF THE DESIGN MATRIX 27
CHAPTER 4 SIMULATION 31
4.1 SIMULATION DESIGN 31
4.2 COMPARISON METHODS 33
4.3 DIFFERENT PARAMETERS FOR LARSSTEPPOTENTIAL AND LARSSTEPANGLE 35
4.4 COMPARISON OF ALL METHODS 37
CHAPTER 5 REAL CASE COMPARISON 39
5.1 EXPERIMENT PROCEDURE 40
5.2 EXPERIMENT RESULTS 41
5.2.1 Real Cases Where Our Methods Outperform 42
5.2.2 Real Cases Where Our Methods Are Less Competitive 44
CHAPTER 6 CONCLUSION 46
REFERENCES 49
APPENDIX A1: PLOTS FOR OVERALL COMPARISON ON THE LOW-DIMENSIONAL DATASET (N, P, S) = (100, 10, 5) 52
APPENDIX A2: PLOTS FOR OVERALL COMPARISON ON THE HIGH-DIMENSIONAL DATASET (N, P, S) = (50, 1000, 5) 56
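The repetition modification that the abstract summarizes (and Chapter 3.3 develops) can be sketched as an iterate-on-residuals loop: a single LARS pass on p > n data can select at most rank-many predictors, so LARS is rerun on the residuals of a least-squares refit to the variables selected so far. Everything below (the function name, round and step sizes, and the simulated data) is invented for illustration; the thesis's exact procedure may differ.

```python
# Hedged sketch of the repetition idea: rerun LARS on residuals so the
# accumulated selection can exceed what one pass allows when p > n.
import numpy as np
from sklearn.linear_model import Lars

def lars_with_repetition(X, y, rounds=3, per_round=4):
    """Accumulate LARS selections over repeated passes on residuals."""
    selected = []
    resid = y.astype(float)
    for _ in range(rounds):
        step = Lars(n_nonzero_coefs=per_round, fit_intercept=False).fit(X, resid)
        new = [j for j in step.active_ if j not in selected]
        if not new:
            break
        selected += new
        # Refit least squares on everything selected so far; the residual
        # carries the still-unexplained signal into the next LARS pass.
        coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        resid = y - X[:, selected] @ coef
    return selected

rng = np.random.default_rng(1)
n, p = 30, 200                       # rank-deficient: far more predictors than rows
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:8] = rng.uniform(1.0, 2.0, 8)  # eight truly active predictors
y = X @ beta + 0.1 * rng.standard_normal(n)

sel = lars_with_repetition(X, y)
print("selected", len(sel), "variables")
```

A single pass with `per_round=4` would stop at four variables; because each refit makes the residual orthogonal to the already-selected columns, later passes admit new variables, letting the accumulated selection grow past one pass's limit, which is the point of the modification in p > n problems.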
dc.language.iso: en
dc.subject: 變數選擇 [Variable Selection] (zh_TW)
dc.subject: 最小角迴歸 [Least Angle Regression] (zh_TW)
dc.subject: 前向分段選擇法 [Forward Stagewise Selection] (zh_TW)
dc.subject: 前向逐步選擇法 [Forward Stepwise Selection] (zh_TW)
dc.subject: 普適型相對重要性 [Comprehensive Relative Importance] (zh_TW)
dc.subject: 高維數據分析 [High-Dimensional Data Analysis] (zh_TW)
dc.subject: 小樣本問題 [Small-Sample Problem] (zh_TW)
dc.subject: 非滿秩 [Rank Deficiency] (zh_TW)
dc.subject: LASSO (en)
dc.subject: Least Angle Regression (en)
dc.subject: Forward Stagewise Selection (en)
dc.subject: Forward Stepwise Selection (en)
dc.subject: Comprehensive Relative Importance (en)
dc.subject: Elastic Net (en)
dc.subject: Variable Selection (en)
dc.title: 用於高維變數選擇的最小角迴歸演算法之研究 [A study of the Least Angle Regression algorithm for high-dimensional variable selection] (zh_TW)
dc.title: Enhancement of LARS algorithm for high dimensional variable selection (en)
dc.type: Thesis
dc.date.schoolyear: 113-2
dc.description.degree: 碩士 [Master's]
dc.contributor.oralexamcommittee: 胡明哲; 藍俊宏; 黃奎隆 (zh_TW)
dc.contributor.oralexamcommittee: Ming-Che Hu; Jakey Blue; Kweilong Huang (en)
dc.subject.keyword: 變數選擇 [Variable Selection], 最小角迴歸 [Least Angle Regression], 前向分段選擇法 [Forward Stagewise Selection], 前向逐步選擇法 [Forward Stepwise Selection], 普適型相對重要性 [Comprehensive Relative Importance], 高維數據分析 [High-Dimensional Data Analysis], 小樣本問題 [Small-Sample Problem], 非滿秩 [Rank Deficiency] (zh_TW)
dc.subject.keyword: Variable Selection, Least Angle Regression, Forward Stagewise Selection, Forward Stepwise Selection, Comprehensive Relative Importance, LASSO, Elastic Net (en)
dc.relation.page: 59
dc.identifier.doi: 10.6342/NTU202504031
dc.rights.note: 同意授權(全球公開) [Authorization granted (worldwide open access)]
dc.date.accepted: 2025-08-14
dc.contributor.author-college: 共同教育中心 [Center for General Education]
dc.contributor.author-dept: 統計碩士學位學程 [Master's Program in Statistics]
dc.date.embargo-lift: 2030-08-01
Appears in Collections: Master's Program in Statistics

Files in This Item:
File | Size | Format
ntu-113-2.pdf (available online after 2030-08-01) | 6.52 MB | Adobe PDF


Unless otherwise noted for a specific item, all items in this repository are protected by copyright, with all rights reserved.
