Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21388
Title: 正則化變數選取方法應用於高度相關資料之比較
Comparison of Regularization Methods for Variable Selection in Highly Correlated Data
Authors: Ching-Hsuan Chang
張靜萱
Advisor: 周呈霙
Keyword: variable selection, regularization method, grouping effect, healthcare data, regression data
Publication Year: 2019
Degree: Master's
Abstract: Regularization methods can perform variable selection on collinear data. For example, the elastic net has been proven to have a grouping effect: it selects all or none of a group of highly correlated variables. This study compares the selection behavior of four regularization methods, namely LASSO, elastic net, empirical Bayesian LASSO (EBLASSO), and empirical Bayesian elastic net (EBENet), under different levels of correlation. Through simulation studies at a fixed sample size and number of variables, I found that:
(1) When high correlation exists only among the true variables, the elastic net is the better choice because it selects variables, estimates coefficients, and predicts more accurately.
(2) When correlations exist between true variables and irrelevant variables, EBLASSO and EBENet are good choices for selection, coefficient estimation, and prediction, regardless of the level of correlation.
(3) In general, as the correlations decrease, all four regularization methods improve in variable selection and coefficient estimation.
Finally, EBLASSO outperformed the other three regularization methods on a real health examination dataset. Because correlations may exist between true variables and irrelevant variables in such data, this result is consistent with finding (2) above. In addition, for several groups of highly correlated variables in the dataset, such as height, weight, and lean body mass, the elastic net selected every variable in the group, which matches the grouping effect theorem. Because the simulation results and the real data analysis agree, the simulation findings may serve as a reference when choosing a variable selection method for real data.
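The grouping effect mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's code or simulation design. It assumes a glmnet-style objective, (1/(2n))·||y − Xβ||² + λ·(α·||β||₁ + (1 − α)/2·||β||²), fitted by plain coordinate descent, and uses a tiny noise-free toy dataset in which two true predictors are exact duplicates (the extreme case of high correlation) and a third predictor is irrelevant. The function names, data, and penalty values are all hypothetical, and EBLASSO and EBENet are not covered.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardize(column):
    """Center a column and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    scale = (sum(v * v for v in centered) / n) ** 0.5
    return [v / scale for v in centered]


def elastic_net_cd(columns, y, lam, alpha, sweeps=500):
    """Coordinate descent for the assumed glmnet-style elastic net objective.

    alpha=1.0 gives the LASSO; 0 < alpha < 1 mixes in a ridge penalty.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # residual = y - X @ beta, kept up to date
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            z = sum(v * v for v in x) / n
            # Correlation of x_j with the partial residual (x_j's own fit added back).
            rho = sum(xi * (ri + xi * beta[j]) for xi, ri in zip(x, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - beta[j]
            residual = [ri - xi * delta for xi, ri in zip(x, residual)]
            beta[j] = new
    return beta


# Toy data (hypothetical): x1 and x2 are identical true predictors,
# x3 is an irrelevant predictor orthogonal to both.
x1 = standardize([-3, -2, -1, 0, 0, 1, 2, 3])
x2 = list(x1)
x3 = standardize([1, -1, 1, -1, -1, 1, -1, 1])
y = [a + b for a, b in zip(x1, x2)]

lasso = elastic_net_cd([x1, x2, x3], y, lam=0.5, alpha=1.0)
enet = elastic_net_cd([x1, x2, x3], y, lam=0.5, alpha=0.5)

print("LASSO coefficients:      ", [round(b, 4) for b in lasso])
print("Elastic net coefficients:", [round(b, 4) for b in enet])
```

In this toy setting the LASSO puts essentially all of the weight on whichever duplicate is updated first (about 1.5 on x1 and about 0 on x2), while the elastic net splits the weight evenly between the two duplicates (about 0.78 each). Both drop the irrelevant x3. With real, noisy data the contrast is less clean, so this should be read only as an illustration of the mechanism.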
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/21388
DOI: 10.6342/NTU201902881
Fulltext Rights: Not authorized
Appears in Collections: 統計碩士學位學程 (Master's Program in Statistics)

Files in This Item:
File: ntu-108-1.pdf (Restricted Access)
Size: 923.91 kB
Format: Adobe PDF