Title: Classification Methods and Their Evaluation for Kernel Fisher Discriminant Analysis (KFD分析的分類方法與評估研究)
Authors: Hsuan-Hsien Chou (周軒賢)
Advisor: Argon Chen (陳正剛)
Keywords: kernel Fisher discriminant, nearest center classifier, Mahalanobis distance, sensitivity analysis
Publication Year: 2005
Degree: Master's
Abstract: Fisher's Linear Discriminant (FLD) analysis is an efficient method for feature extraction. Its objective is to find the direction on which the linear projections of the training instances, called "discriminant scores," provide the maximal separability between classes. Kernel Fisher Discriminant (KFD) analysis maps the training instances into a higher-dimensional space and performs FLD analysis there; this nonlinear approach becomes feasible by reformulating the FLD problem entirely in terms of inner products and applying the kernel trick.
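For context, the criterion referred to above can be written out explicitly; this is the standard textbook formulation of FLD and its kernelized form, not notation taken from the thesis itself:

```latex
% Fisher's criterion: choose the direction w that maximizes the
% between-class scatter S_B relative to the within-class scatter S_W
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_B\, \mathbf{w}}{\mathbf{w}^{\top} S_W\, \mathbf{w}}

% KFD expands w over the mapped training instances,
%   \mathbf{w} = \sum_{i=1}^{n} \alpha_i\, \phi(\mathbf{x}_i),
% so the criterion depends on the data only through kernel values
% k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle,
% with M and N the corresponding kernelized scatter matrices:
J(\boldsymbol{\alpha}) = \frac{\boldsymbol{\alpha}^{\top} M\, \boldsymbol{\alpha}}{\boldsymbol{\alpha}^{\top} N\, \boldsymbol{\alpha}}
```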
Based on the discriminant scores, we can use various classifiers to classify testing instances. The nearest center classifier is usually combined with FLD and KFD, but when the score distribution is skewed, using either the Euclidean or the Mahalanobis distance leads to poor classification performance. We propose two classifiers to improve the classification results in this situation. Both use only a subset of the training instances to compute the covariance matrix: one method selects instances based on the included angle defined by the testing and training instances, while the other selects them according to the hyperplanes determined by the paired class centers. The advantage of these classifiers is that, by excluding the training instances deemed irrelevant, we avoid a loose estimate of the covariance matrix.

Another issue we address is how sensitive a classifier is to changes in its environment. The changes we consider include changes in parameter values, in training dataset size, and in training dataset sampling. We also discuss combined sensitivity, i.e., how a classifier's sensitivity to one type of environment change is affected by the other types.

Finally, we demonstrate the proposed methods on two real-world datasets in a case study. We compare the performance of various classifiers on these datasets and perform sensitivity analysis on them. The results indicate that the performance of the proposed methods is equivalent, and in some cases superior, to that of other KFD-based classifiers. We also find that one of the proposed methods is more sensitive to the different environment changes than the other classifiers.
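To make the baseline concrete, below is a minimal sketch (in Python with NumPy) of the nearest center classifier over discriminant scores using Mahalanobis distance, i.e., the classifier the proposed methods are compared against. All function and variable names are illustrative assumptions; the thesis's angle-based and hyperplane-based instance selection rules are not reproduced here.

```python
import numpy as np

def nearest_center_mahalanobis(train_scores, train_labels, test_scores):
    """Assign each testing instance to the class whose center of
    discriminant scores is nearest in Mahalanobis distance.

    The covariance here is estimated from *all* training scores of a
    class; the thesis's point is that this estimate becomes loose when
    the score distribution is skewed, which its two subset-selecting
    classifiers are designed to avoid.
    """
    classes = np.unique(train_labels)
    centers, inv_covs = {}, {}
    for c in classes:
        sc = train_scores[train_labels == c]
        centers[c] = sc.mean(axis=0)
        cov = np.atleast_2d(np.cov(sc, rowvar=False))
        cov += 1e-6 * np.eye(cov.shape[0])  # ridge term keeps cov invertible
        inv_covs[c] = np.linalg.inv(cov)

    preds = []
    for z in test_scores:
        # squared Mahalanobis distance from z to each class center
        dists = {c: float((z - centers[c]) @ inv_covs[c] @ (z - centers[c]))
                 for c in classes}
        preds.append(min(dists, key=dists.get))
    return np.array(preds)
```

Under the thesis's proposals, the line that gathers all training scores of a class would instead gather only a selected subset (chosen by the included angle or by the paired-center hyperplanes) before the covariance is estimated.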
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35199
Fulltext Rights: Paid-access authorization (有償授權)
Appears in Collections: Graduate Institute of Industrial Engineering (工業工程學研究所)
Files in This Item:
File | Size | Format
---|---|---
ntu-94-1.pdf (Restricted Access) | 1.41 MB | Adobe PDF