Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302
Title: | Support Vector Machines: Classification with Coding and Regression for Gene Selection (支撐向量機制:以編碼處理分類問題並利用迴歸模式進行基因選取) |
Author: | Pei-Chun Chen 陳佩君 |
Advisor: | 陳素雲 |
Keywords: | coding, gene selection, kernel, linear discriminant subspace, machine learning, microarray data analysis, support vector machine, support vector regression |
Year of publication: | 2008 |
Degree: | Doctoral |
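Abstract: | This thesis has two main parts. The first part focuses on using coding to find a low-dimensional linear discriminant feature subspace and examines the equivalence between different codes. Coding converts class labels into multivariate responses (multiresponse). These responses are regressed on kernelized data, and the regression coefficients are then used to obtain a low-dimensional linear discriminant subspace. This subspace can be combined with any linear classification method, which keeps the computation simple and fast. This part also proves that the multiresponses produced by any coding generate the same low-dimensional linear discriminant subspace, so any linear classification method gives the same classification result whichever coding is used. Results on real data show that, compared with LIBSVM, the proposed classification method achieves similar accuracy but requires less classification time.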
In the second part, this thesis proposes a gene selection method based on support vector regression. Existing gene selection methods based on microarray data treat every microarray chip as equal. However, the chips may come from patients in different disease states, so their association with the disease is not necessarily the same. Chips should therefore be given different weights that represent the association between each chip and the disease. These weights can be estimated by support vector regression. The values obtained by summing the weighted expressions can be used to decide which genes are significant. We analyze leukemia and colon cancer data and compare the resulting accuracy with that of other gene selection methods. The results show that the proposed gene selection method can identify significant genes.
This thesis has two major themes: multiclass support vector machines, and support vector regression for gene selection. In the first part, we propose a regression approach to multiclass support vector classification. We bring several existing coding schemes into support vector classification by coding the class labels as multivariate responses. Regressing these multivariate responses on kernelized input data extracts a low-dimensional feature subspace for discrimination. We unify the coding schemes by showing that they are equivalent in the sense that they lead to the same low-dimensional discriminant feature subspace. Classification is then carried out in this subspace using a linear discriminant algorithm, which can be any reasonable choice. Combining the regression approach for extracting a low-dimensional discriminant subspace with a user-specified linear algorithm gives a simple yet powerful toolkit for multiclass support vector classification. Issues of encoding and decoding, and notions of equivalence of codes, are discussed. Experimental results, including prediction ability and CPU time, show that our approach is a competent alternative for the multiclass support vector machine problem. In the second part, we propose a support vector regression approach to gene selection and use the selected genes for disease classification. Current gene selection methods based on microarray data treat each individual subject with equal weight with respect to the disease of interest.
However, tissues collected from different patients may come from different disease stages and may differ in their strength of association with the disease. To reflect this, our proposed method accounts for subject variation by assigning different weights to subjects. The weights are calculated via support vector regression. Significant genes are then selected based on the cumulative sum of weighted expressions. The proposed gene selection procedure is illustrated and evaluated on acute leukemia and colon cancer data. The results and performance are compared with those of four other approaches in terms of classification accuracy. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302 |
Full-text license: | Paid license |
Appears in collections: | Institute of Epidemiology and Preventive Medicine |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-97-1.pdf (not currently authorized for public access) | 483.27 kB | Adobe PDF |
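Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.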