Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84158
Title: 基於變數置換以及隨機森林之變數挑選方法
A Permutation-Based Feature Selection (PBFS) Approach Using Random Forests
Authors: Yi-Jhen Wu
吳宜珍
Advisor: 蔡政安(Chen-An Tsai)
Keyword: 隨機森林, 基於變數置換的特徵挑選, 共現網絡分析, 變數關係視覺化, 變數共線性
Random Forest, Permutation-Based Feature Selection, co-occurrence analysis, variable relationship visualization, variable multicollinearity
Publication Year: 2022
Degree: Master's
Abstract: 在過去的幾十年中,變數挑選經常被使用於降維,根據特定的閥值從原始特徵集中選擇合適的特徵子集,在高維數據中選擇顯著的變量來優化模型識別和分類非常重要,因此 在許多研究和應用領域,數據挖掘技術非常依賴變數挑選,尤其是在機器學習算法中。 在本文中,我們提出了一種新的特徵選擇方法PBFS,PBFS使用隨機森林模型同時控制FDR來進行變數挑選,與其他現有的變數挑選方法相比,我們使用兩個真實數據集和四個模擬數據來評估我們提出的方法的有效性,發現多重共線性可能對所選變量產生很大影響。 一般來說,PBFS方法比其他四種特徵選擇方法具有優勢;此外,我們通過共現網絡分析的PBFS中引導聚合決策樹結果可視化變量之間的關係。
Over the past decades, feature selection has been used for dimensionality reduction, selecting suitable feature subsets from the original feature set according to certain criteria. Choosing significant variables is especially important in high-dimensional data, where it improves model identification and classification accuracy. In many research and application areas, data mining techniques rely heavily on feature selection methods, especially in machine learning algorithms. This thesis proposes a new feature selection approach, Permutation-Based Feature Selection (PBFS), which uses a random forest model while controlling the false discovery rate (FDR). Two real datasets and four simulation studies are used to evaluate the effectiveness of the proposed approach against other well-known feature selection methods. Multicollinearity was found to have a potentially large impact on the selected variables. In general, PBFS showed advantages over the other four feature selection methods. In addition, the relationships among variables are visualized through co-occurrence network analysis of the bagged decision tree results from PBFS.
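The abstract does not spell out the PBFS procedure, so the following is only a minimal sketch of the general idea it names: score each feature, build a null distribution of scores by permutation, and select features while controlling the FDR. Everything here is an assumption made for illustration: the function names are hypothetical, the null is built by permuting the response (the thesis may permute the variables instead), the Benjamini-Hochberg step-up rule stands in for whatever FDR control the thesis uses, and the absolute Pearson correlation replaces the random forest importance so that the example runs with the standard library alone.

```python
import random
from statistics import mean


def abs_correlation(X, y):
    """Stand-in importance score (NOT a random forest importance):
    absolute Pearson correlation of each column of X with y."""
    y_bar = mean(y)
    sy = sum((v - y_bar) ** 2 for v in y) ** 0.5
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_bar = mean(col)
        sx = sum((v - c_bar) ** 2 for v in col) ** 0.5
        cov = sum((a - c_bar) * (b - y_bar) for a, b in zip(col, y))
        scores.append(abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0)
    return scores


def permutation_p_values(X, y, importance=abs_correlation, n_perm=500, seed=0):
    """Empirical p-value per feature: how often the score computed on a
    permuted response reaches the score computed on the real response."""
    rng = random.Random(seed)
    observed = importance(X, y)
    exceed = [0] * len(observed)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real association with y
        null_scores = importance(X, y_perm)
        for j, (obs, nul) in enumerate(zip(observed, null_scores)):
            if nul >= obs:
                exceed[j] += 1
    # The +1 terms keep the p-values strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]


def benjamini_hochberg(p_values, q=0.05):
    """Indices of features selected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank / m * q:
            k = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k])


# Synthetic data (hypothetical): only feature 0 drives the response.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]

p_values = permutation_p_values(X, y, n_perm=500, seed=1)
selected = benjamini_hochberg(p_values, q=0.05)
print(selected)
```

To move closer to the setting described in the abstract, the `importance` argument could be replaced by a function that fits a random forest and returns its per-feature importances; the permutation and FDR steps would stay the same.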
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/84158
DOI: 10.6342/NTU202104467
Fulltext Rights: Authorized (open on campus only)
Embargo Lift Date: 2022-07-12
Appears in Collections: 統計碩士學位學程

Files in This Item:
File: U0001-0711202113092500.pdf (access limited to the NTU IP range)
Size: 837.35 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.