Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93452
Title: | A Comparative Study of Spike-and-Slab and Horseshoe Prior for Robustness in High Dimension |
Author: | 練政楷 Cheng-Kai Lien |
Advisor: | 楊鈞澔 Chun-Hao YANG |
Keywords: | Bayesian variable selection, Spike-and-Slab prior, Horseshoe prior, robustness, high-dimensional data |
Publication year: | 2024 |
Degree: | Master's |
Abstract: | When the number of observations (n) exceeds the number of variables (p), variable selection is usually carried out with classical frequentist methods, which perform well in most situations. When the data are sparse and high-dimensional, however, meaning that the number of variables (p) greatly exceeds the number of observations (n), these traditional frequentist methods can run into difficulties. This study uses Bayesian methods as an alternative approach to variable selection. By incorporating prior distributions, Bayesian methods offer a flexible way of handling sparse, high-dimensional data. The study first compares two Bayesian priors, the Spike-and-Slab prior and the Horseshoe prior, each of which has strengths and weaknesses on sparse, high-dimensional data, and examines how their performance differs across a variety of settings. It then selects the more robust of the two priors and compares variable selection based on the "posterior winsorized mean" with variable selection based on the "posterior mean". The posterior winsorized mean applies an established robust method to the previously selected robust prior, so that the winsorization is principled rather than an arbitrary adjustment.
The performance of each method is analyzed across different datasets and parameter settings to determine which one is more robust to outliers in the data. The aim of this study is to provide a more effective and robust Bayesian approach to variable selection in sparse, high-dimensional data, along with practical guidance for applications to real data. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/93452 |
DOI: | 10.6342/NTU202402347 |
Full-text license: | Authorized (open access worldwide) |
Department: | Institute of Statistics and Data Science |
Files in this item:
File | Size | Format |
---|---|---|
ntu-112-2.pdf | 569.17 kB | Adobe PDF |
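The abstract contrasts the posterior mean with a posterior winsorized mean. As a minimal sketch, assuming posterior draws of a single regression coefficient are already available (for example from an MCMC sampler under the chosen prior) and using a fixed, hypothetical winsorizing proportion `prop` in place of the robust adjustment the thesis uses but the abstract does not specify, the two posterior summaries can be compared in plain Python:

```python
import math
from statistics import mean


def posterior_winsorized_mean(draws, prop=0.1):
    """Winsorized mean of posterior draws for one coefficient.

    The lowest and highest floor(prop * n) draws are clamped to the nearest
    remaining order statistics before averaging. `prop` is a hypothetical
    fixed choice, not the data-driven adjustment described in the thesis.
    """
    if not 0 <= prop < 0.5:
        raise ValueError("prop must lie in [0, 0.5)")
    x = sorted(draws)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    k = math.floor(prop * n)        # number of draws clamped in each tail
    lo, hi = x[k], x[n - 1 - k]     # clamping bounds (order statistics)
    return mean(min(max(v, lo), hi) for v in x)


# Hypothetical MCMC draws for one coefficient, including one extreme draw.
draws = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98, 1.02, 25.0]

print(round(mean(draws), 3))                       # posterior mean
print(round(posterior_winsorized_mean(draws), 3))  # posterior winsorized mean
```

With the single extreme draw, the posterior mean is pulled up to about 3.4, while the winsorized mean stays close to 1. The function name `posterior_winsorized_mean` and the 10% default are illustrative choices, not taken from the thesis; setting `prop=0` recovers the ordinary posterior mean.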
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.