Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302
Full metadata record
(Each row below lists: DC field: value (language))
dc.contributor.advisor: 陳素雲
dc.contributor.author: Pei-Chun Chen (en)
dc.contributor.author: 陳佩君 (zh_TW)
dc.date.accessioned: 2021-06-12T18:00:37Z
dc.date.available: 2009-02-20
dc.date.copyright: 2008-02-20
dc.date.issued: 2008
dc.date.submitted: 2008-01-27
dc.identifier.citation:
[1] A. A. Alizadeh, M. B. Eisen, R. E. Davis, C. Ma, I. S. Lossos, A. Rosenwald, J. C. Boldrick, H. Sabet, T. Tran, X. Yu, J. I. Powell, L. Yang, G. E. Marti, T. Moore, J. Hudson Jr., L. Lu, D. B. Lewis, R. Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P. O. Brown and L. M. Staudt. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403:503–511, 2000.
[2] E. L. Allwein, R. E. Schapire and Y. Singer. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 1:113–141, 2000.
[3] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences, 96:6745–6750, 1999.
[4] T. W. Anderson. An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, New York, 2003.
[5] J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman and M. Song. Dimensionality reduction via sparse support vector machines. Journal of Machine Learning Research, 3:1229–1243, 2003.
[6] A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2):245–271, 1997.
[7] M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares Jr. and D. Haussler. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences, 97(1):262–267, 2000.
[8] E. Bredensteiner and K. P. Bennett. Multicategory classification by support vector machines. Computational Optimization and Applications, 12:53–79, 1999.
[9] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.
[10] C. C. Chang and C. J. Lin. LIBSVM: a library for support vector machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm
[11] O. Chapelle. Training a support vector machine in the primal. Neural Computation, 19(5):1155–1178, 2007.
[12] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
[13] K. Crammer and Y. Singer. Improved output coding for classification using continuous relaxation. In Proceedings of the Thirteenth Annual Conference on Neural Information Processing Systems, 2000.
[14] K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. In Computational Learning Theory, pages 35–46, 2000.
[15] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
[16] K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. Machine Learning, 47:201–233, 2002.
[17] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, 2000.
[18] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.
[19] S. Dudoit, J. Fridlyand and T. P. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97:77–87, 2002.
[20] K. T. Fang and Y. Wang. Number-Theoretic Methods in Statistics. Chapman & Hall, London, 1994.
[21] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188, 1936.
[22] J. Friedman. Multivariate adaptive regression splines (with discussion). Annals of Statistics, 19:1–141, 1991.
[23] G. M. Fung and O. L. Mangasarian. Proximal support vector machine classifiers. In F. Provost and R. Srikant (Eds.), Proceedings KDD-2001: Knowledge Discovery and Data Mining, pages 77–86, 2001.
[24] G. M. Fung and O. L. Mangasarian. Multicategory proximal support vector machine classifiers. Machine Learning, 59:77–97, 2005.
[25] T. Furey, N. Cristianini, N. Duffy, D. Bednarski, M. Schummer and D. Haussler. Support vector machine classification and validation of cancer tissue samples using microarray data. Bioinformatics, 16:906–914, 2000.
[26] J. Fürnkranz. Round robin classification. Journal of Machine Learning Research, 2:721–747, 2002.
[27] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield and E. S. Lander. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
[28] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
[29] I. Guyon, J. Weston, S. Barnhill and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389–422, 2002.
[30] T. Hastie, A. Buja and R. Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23:73–102, 1995.
[31] T. Hastie, R. Tibshirani and A. Buja. Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89:1255–1270, 1994.
[32] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer-Verlag, New York, 2001.
[33] C. W. Hsu and C. J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13:415–425, 2002.
[34] C. M. Huang, Y. J. Lee, D. K. J. Lin and S. Y. Huang. Model selection for support vector machines via uniform design. Computational Statistics and Data Analysis, 52(1):335–346, 2007.
[35] R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
[36] D. Koller and M. Sahami. Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 284–292, 1996.
[37] K. E. Lee, N. Sha, E. R. Dougherty, M. Vannucci and B. K. Mallick. Gene selection: a Bayesian variable selection approach. Bioinformatics, 19(1):90–97, 2003.
[38] Y. J. Lee, C. C. Chang and C. H. Chao. Feature selection for microarray gene expression data. Technical report, 2007. http://dmlab1.csie.ntust.edu.tw/downloads/papers/
[39] Y. Lee, Y. Lin and G. Wahba. Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99:67–81, 2004.
[40] Y. J. Lee, W. F. Hsieh and C. M. Huang. ε-SSVR: a smooth support vector machine for ε-insensitive regression. IEEE Transactions on Knowledge and Data Engineering, 17:678–685, 2005.
[41] Y. J. Lee and S. Y. Huang. Reduced support vector machines: a statistical theory. IEEE Transactions on Neural Networks, 18:1–13, 2007.
[42] Y. J. Lee and O. L. Mangasarian. RSVM: reduced support vector machines. In First SIAM International Conference on Data Mining, Chicago, 2001.
[43] Y. J. Lee and O. L. Mangasarian. SSVM: a smooth support vector machine. Computational Optimization and Applications, 20:5–22, 2001.
[44] Y. Lin. Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6:259–275, 2002.
[45] O. L. Mangasarian and D. R. Musicant. Lagrangian support vector machines. Journal of Machine Learning Research, 1:161–177, 2001.
[46] D. Michie, D. J. Spiegelhalter and C. C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994. ftp.ncc.up.pt/pub/statlog/
[47] S. Mukherjee, P. Tamayo, D. Slonim, A. Verri, T. Golub, J. Mesirov and T. Poggio. Support vector machine classification of microarray data. Technical Report AI Memo/CBCL Paper #1677/#182, MIT AI Lab and CBCL.
[48] D. J. Newman, S. Hettich, C. L. Blake and C. J. Merz. UCI Repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html
[49] D. V. Nguyen and D. M. Rocke. Tumor classification by partial least squares using microarray gene expression data. Bioinformatics, 18:39–50, 2002.
[50] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1992.
[51] J. C. Platt, N. Cristianini and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12, pages 547–553. MIT Press, Cambridge, MA, 2000.
[52] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101–141, 2004.
[53] Y. Saeys, I. Inza and P. Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007.
[54] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[55] N. Sha, M. Vannucci, M. G. Tadesse, P. J. Brown, I. Dragoni, N. Davies, T. C. Roberts, A. Contestabile, M. Salmon, C. Buckley and F. Falciani. Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60:812–819, 2004.
[56] D. Slonim, P. Tamayo, T. Golub and E. Lander. Class prediction and discovery using gene expression data. In Fourth Annual International Conference on Computational Molecular Biology, pages 263–272, 2000.
[57] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14:199–222, 2004.
[58] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, New Jersey, 2002.
[59] J. A. K. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9:293–300, 1999.
[60] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.
[61] V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
[62] J. Weston and C. Watkins. Multi-class support vector machines. Technical Report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, 1998.
[63] H. M. Wu. Kernel sliced inverse regression with application. http://idv.sinica.edu.tw/hmwu/Publications/index.htm
[64] K. Y. Yeung, R. E. Bumgarner and A. E. Raftery. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.
[65] L. Yu and H. Liu. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 5:1205–1224, 2004.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302
dc.description.abstract: This thesis consists of two parts. The first part focuses on a coding-based method for extracting a low-dimensional linear discriminant feature subspace and examines the equivalence among different coding schemes. Coding converts class labels into multiresponses; regressing these multiresponses on the kernelized data and using the regression coefficients then yields a low-dimensional linear discriminant subspace. This subspace can be paired with any linear discriminant method, which keeps the computation simple and fast. This part also proves that the multiresponses produced by any coding scheme generate the same low-dimensional discriminant subspace, so every linear discriminant method yields the same classification result. Results on real data show that, compared with LIBSVM, the proposed method achieves comparable accuracy while requiring less classification time.
In the second part, we propose a gene selection method based on support vector regression. Existing gene selection methods for microarray data treat every chip as identical. However, chips may come from patients at different disease stages and are therefore not equally associated with the disease. Chips should accordingly be given different weights reflecting their association with the disease, and these weights can be estimated by support vector regression. The sums of the resulting weighted expressions are then used to decide which genes are significant. We analyze leukemia and colon cancer data and compare the resulting accuracies with those of other gene selection methods. The results show that the proposed method can identify significant genes. (zh_TW)
dc.description.abstract: This thesis contains two major themes: multiclass support vector machines, and support vector regression for gene selection. In the first part, we propose a regression approach for multiclass support vector classification. We introduce some existing coding schemes into support vector classification by coding the class labels into multivariate responses. Regression of these multivariate responses on the kernelized input data is used to extract a low-dimensional feature subspace for discriminant purposes. We unify these coding schemes by showing that they are equivalent in the sense of leading to the same low-dimensional discriminant feature subspace. Classification is then carried out in this low-dimensional subspace using a linear discriminant algorithm, which can be any reasonable choice. The regression approach for extracting the low-dimensional discriminant subspace, combined with a user-specified linear discriminant algorithm, forms a simple yet powerful toolkit for multiclass support vector classification. Issues of encoding and decoding and the notion of equivalence of codes are discussed. Experimental results, including prediction ability and CPU time, show that our approach is a competent alternative for the multiclass support vector machine problem.
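To make the equivalence claim concrete, here is a minimal sketch in our own notation (an illustration, not the thesis's derivation), assuming a ridge-type regularized least-squares fit and two codes related by an invertible linear recoding. Let $Y$ be the $n \times k$ multiresponse matrix produced by one coding of the class labels, $K$ the kernelized data matrix, and $\lambda > 0$ the regularization parameter, so the coefficient matrix is

\[ B = (K^\top K + \lambda I)^{-1} K^\top Y. \]

If a second code produces $\tilde{Y} = Y A$ for an invertible $k \times k$ matrix $A$, then

\[ \tilde{B} = (K^\top K + \lambda I)^{-1} K^\top Y A = B A, \]

so the fitted score matrices satisfy $K\tilde{B} = (KB)A$ and span the same column space. Any linear discriminant algorithm run in that common subspace therefore returns the same classification, which is the sense in which such codes are equivalent.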
In the second part, we propose a support vector regression
approach for gene selection and use the selected genes for disease classification. Current gene selection methods based on microarray data have treated each individual subject with equal weight to the disease of interest. However, tissues collected from different patients can be from different disease stages and may have different strength of association with the disease. To reflect
this circumstance, our proposed method will take into account the subject variation by assigning different weights to subjects. The weights are calculated via support vector regression. Then significant genes are selected based on the cumulative sum of weighted expressions. The proposed gene selection procedure is
illustrated and evaluated using the acute leukemia and colon cancer data. The results and performance are compared with four other approaches in terms of classification accuracies.
en
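As a rough illustration of the weighting idea in the second part, the following Python sketch (ours, not the thesis's code) assigns subject weights via a linear support vector regression of the disease labels on the expression profiles and then ranks genes by a weighted-expression score. The specific weight definition (absolute dual coefficients), the class-contrast score, and the use of scikit-learn's SVR are illustrative assumptions, not the thesis's exact procedure.

    # Sketch of SVR-derived subject weights for gene ranking (illustrative only).
    import numpy as np
    from sklearn.svm import SVR

    def select_genes(X, y, n_genes=50):
        """X: (n_subjects, n_genes) expression matrix; y: labels coded +1/-1."""
        # Fit a linear SVR of the labels on the expression profiles; its dual
        # coefficients provide one possible subject-level weighting (assumption).
        svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
        w = np.zeros(X.shape[0])
        w[svr.support_] = np.abs(svr.dual_coef_.ravel())
        # Score each gene by contrasting its sums of weighted expressions
        # between the two disease classes.
        score = np.abs((w * (y == 1)) @ X - (w * (y == -1)) @ X)
        # Return indices of the top-scoring ("significant") genes.
        return np.argsort(score)[::-1][:n_genes]

    # Toy usage on random data (illustration only):
    # X = np.random.randn(62, 2000)
    # y = np.where(np.random.rand(62) < 0.5, 1.0, -1.0)
    # top = select_genes(X, y, n_genes=20)

Subjects lying off the SVR epsilon-tube (the support vectors) receive nonzero weight here, which is one plausible way to let atypical disease stages influence the gene ranking more or less strongly.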
dc.description.provenance: Made available in DSpace on 2021-06-12T18:00:37Z (GMT). No. of bitstreams: 1. ntu-97-D93842005-1.pdf: 494866 bytes, checksum: a13aaa4b7173905b36455acd3f8067a3 (MD5). Previous issue date: 2008. (en)
dc.description.tableofcontents:
1 Introduction 1
2 Preliminaries: Support Vector Machines 3
2.1 Linearly separable case 3
2.2 Linearly non-separable case 6
2.3 Nonlinear extension by kernel trick 8
2.4 Smooth support vector machine 9
2.5 Extension to multiclass classification problem 11
2.5.1 One-against-rest and one-against-one 12
2.5.2 Single machine approach 13
2.6 Support vector regression 18
2.7 Software 21
2.7.1 LIBSVM 21
2.7.2 SSVM toolbox 21
3 Classification by Coding and Multiresponse Regression 23
3.1 Regression framework: linear and kernel generalization 24
3.2 Regularized least-squares support vector regression 27
3.3 Decoding and classification rules 28
3.4 Encoding and equivalence class of codes 30
3.4.1 Coding and scoring schemes 30
3.4.2 Equivalence class of codes 33
4 Application to Benchmark Data Sets 37
4.1 Benchmark data sets 37
4.2 Comparisons of coding schemes 42
4.3 Conclusions 45
5 Gene Selection with Support Vector Regression 51
5.1 Introduction 52
5.2 Current methods 53
5.2.1 SVM-based recursive feature elimination 54
5.2.2 Incremental forward feature selection 55
5.2.3 Bayesian variable selection 55
5.2.4 Bayesian model average 56
5.3 Proposed gene selection method 57
5.4 Empirical data analysis 61
5.5 Discussions 67
6 Discussions and Future Directions 69
References 75
dc.language.iso: en
dc.subject: 微陣列 (zh_TW)
dc.subject: 支撐向量迴歸 (zh_TW)
dc.subject: 支撐向量機制 (zh_TW)
dc.subject: 資料 (zh_TW)
dc.subject: 編碼 (zh_TW)
dc.subject: 基因選取 (zh_TW)
dc.subject: 核化 (zh_TW)
dc.subject: 線性分類 (zh_TW)
dc.subject: 子空間 (zh_TW)
dc.subject: gene selection (en)
dc.subject: coding (en)
dc.subject: support vector regression (en)
dc.subject: support vector machine (en)
dc.subject: machine learning (en)
dc.subject: linear discriminant subspace (en)
dc.subject: kernel (en)
dc.subject: microarray data analysis (en)
dc.title: 支撐向量機制:以編碼處理分類問題並利用迴歸模式進行基因選取 (zh_TW)
dc.title: Support Vector Machines: Classification with Coding and Regression for Gene Selection (en)
dc.type: Thesis
dc.date.schoolyear: 96-1
dc.description.degree: 博士 [doctoral]
dc.contributor.coadvisor: 蕭朱杏
dc.contributor.oralexamcommittee: 黃啟瑞, 陳珍信, 樊采虹, 李育杰, 陳為堅, 李文宗
dc.subject.keyword: 編碼, 基因選取, 核化, 線性分類, 子空間, 微陣列, 資料, 支撐向量機制, 支撐向量迴歸 (zh_TW)
dc.subject.keyword: coding, gene selection, kernel, linear discriminant subspace, machine learning, microarray data analysis, support vector machine, support vector regression (en)
dc.relation.page: 84
dc.rights.note: 有償授權 [paid authorization]
dc.date.accepted: 2008-01-28
dc.contributor.author-college: 公共衛生學院 [College of Public Health] (zh_TW)
dc.contributor.author-dept: 流行病學研究所 [Graduate Institute of Epidemiology] (zh_TW)
Appears in collections: 流行病學與預防醫學研究所 [Institute of Epidemiology and Preventive Medicine]

Files in this item:
File | Size | Format
ntu-97-1.pdf (not authorized for public access) | 483.27 kB | Adobe PDF