Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749
Title: | Behavior of CID under Different Association Types and Binning Processes (探討變數型態以及分群方式對本質相關係數估值之影響) |
Author: | Miao-Shan Yen (顏妙珊) |
Advisor: | Li-Yu Daisy Liu (劉力瑜) |
Keywords: | pattern recognition, feature selection, coefficient of intrinsic dependence, binning process, hierarchical allocation, quantile allocation |
Year of publication: | 2010 |
Degree: | Master's |
Abstract: | Pattern recognition is widely used in information retrieval to establish classification criteria for identifying unknown data. It typically begins with feature selection, which picks the subset of features that best represents the data under a given evaluation criterion, so that a classification rule can be built and applied to future cases. Rapid developments in bioinformatics have produced large volumes of high-throughput data. The high dimensionality of these data makes pattern recognition difficult, and feature selection has become an important step because it effectively reduces the dimensionality of the data. The coefficient of intrinsic dependence (CID) has recently been proposed for feature selection. This study examines the behavior of CID under different strengths of association between variables. It also examines how the binning process (hierarchical or quantile allocation) and the number of bins affect hypothesis tests based on CID. The simulation results show that CID can distinguish different levels or types of association, and that the null distributions of the permutation test differ between hierarchical allocation and quantile allocation. In addition, a smaller number of bins yields higher testing power regardless of the strength or type of association. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749 |
Full-text authorization: | Not authorized |
Appears in collections: | Department of Agronomy |
Files in this item:
File | Size | Format | |
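The quantile allocation and permutation test mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the CID formula, so the statistic below is a stand-in (the share of the variance of `y` explained by bins of `x`), not CID itself. All function names, the default of 4 bins, and the 500 permutations are assumptions made for illustration only.

```python
import random


def quantile_bins(x, n_bins):
    """Assign each value of x to one of n_bins roughly equal-count bins by rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(x)
    return labels


def between_bin_share(x, y, n_bins):
    """Stand-in statistic (NOT CID): share of Var(y) explained by quantile bins of x."""
    labels = quantile_bins(x, n_bins)
    mean_y = sum(y) / len(y)
    total = sum((v - mean_y) ** 2 for v in y)
    if total == 0:
        return 0.0
    between = 0.0
    for b in range(n_bins):
        group = [y[i] for i in range(len(y)) if labels[i] == b]
        if group:
            group_mean = sum(group) / len(group)
            between += len(group) * (group_mean - mean_y) ** 2
    return between / total


def permutation_p_value(x, y, n_bins=4, n_perm=500, seed=0):
    """Estimate a p-value by shuffling y to break any association with x."""
    rng = random.Random(seed)
    observed = between_bin_share(x, y, n_bins)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if between_bin_share(x, y_perm, n_bins) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [float(i) for i in range(40)]
    y = [2.0 * v for v in x]  # strongly associated with x
    print(permutation_p_value(x, y, n_bins=4))
```

Rerunning `permutation_p_value` with different `n_bins` values on simulated data is one way to explore how the number of bins affects the test, which is the kind of comparison the abstract describes.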
---|---|---|---|
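ntu-99-1.pdf Currently not authorized for public access | 594.45 kB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in this system are protected by copyright, with all rights reserved.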