Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳素雲 | |
dc.contributor.author | Pei-Chun Chen | en |
dc.contributor.author | 陳佩君 | zh_TW |
dc.date.accessioned | 2021-06-12T18:00:37Z | - |
dc.date.available | 2009-02-20 | |
dc.date.copyright | 2008-02-20 | |
dc.date.issued | 2008 | |
dc.date.submitted | 2008-01-27 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/27302 | - |
dc.description.abstract | This thesis is divided into two main parts. The first part focuses on a coding-based method for finding a low-dimensional linear discriminant feature subspace and investigates the equivalence among different coding schemes. Through coding, class labels are converted into multiresponses; regressing these multiresponses on the kernelized data and then using the regression coefficients yields a low-dimensional linear discriminant subspace. This subspace can be combined with any linear classification method, making the computation simple and fast. This part also proves that the multiresponses produced by any coding scheme generate the same low-dimensional discriminant subspace, so any linear classification method gives the same classification result. Results on real data show that, compared with LIBSVM, the proposed classification method achieves comparable accuracy while requiring less classification time. | zh_TW |
dc.description.abstract | The second part proposes a gene selection method based on support vector regression. Existing gene selection methods based on microarray data treat every biochip as identical. However, the chips may come from patients at different disease stages, so their associations with the disease are not all the same. The chips should therefore be given different weights to represent their association with the disease, and these weights can be estimated by support vector regression. The sums of these weighted expressions can then be used to decide which genes are significant. We analyze the leukemia and colon cancer data and compare the accuracy with that of other gene selection methods. The results show that the proposed gene selection method can identify significant genes. | zh_TW |
dc.description.abstract | This thesis contains two major themes: multiclass support vector machines, and support vector regression for gene selection. In the first part, we propose a regression approach for multiclass support vector classification. We introduce some existing coding schemes into support vector classification by coding the class labels into multivariate responses. Regression of these multivariate responses on kernelized input data is used to extract a low-dimensional feature subspace for discriminant purposes. We unify these coding schemes by showing that they are equivalent in the sense of leading to the same low-dimensional discriminant feature subspace. Classification is then carried out in this low-dimensional subspace using a linear discriminant algorithm, which can be any reasonable choice. The regression approach for extracting a low-dimensional discriminant subspace, combined with a user-specified linear algorithm, forms a simple yet powerful toolkit for multiclass support vector classification. Issues of encoding and decoding and the notion of equivalence of codes are discussed. Experimental results, including prediction ability and CPU time, show that our approach is a competent alternative for the multiclass support vector machine problem. In the second part, we propose a support vector regression approach for gene selection and use the selected genes for disease classification. Current gene selection methods based on microarray data treat each individual subject with equal weight with respect to the disease of interest. However, tissues collected from different patients can come from different disease stages and may have different strengths of association with the disease. To reflect this circumstance, the proposed method takes subject variation into account by assigning different weights to subjects. The weights are calculated via support vector regression, and significant genes are then selected based on the cumulative sum of weighted expressions. The proposed gene selection procedure is illustrated and evaluated on the acute leukemia and colon cancer data, and its performance is compared with four other approaches in terms of classification accuracy. | en |
dc.description.provenance | Made available in DSpace on 2021-06-12T18:00:37Z (GMT). No. of bitstreams: 1 ntu-97-D93842005-1.pdf: 494866 bytes, checksum: a13aaa4b7173905b36455acd3f8067a3 (MD5) Previous issue date: 2008 | en |
dc.description.tableofcontents | 1 Introduction 1
2 Preliminaries: Support Vector Machines 3
2.1 Linearly separable case 3
2.2 Linearly non-separable case 6
2.3 Nonlinear extension by kernel trick 8
2.4 Smooth support vector machine 9
2.5 Extension to multiclass classification problem 11
2.5.1 One-against-rest and one-against-one 12
2.5.2 Single machine approach 13
2.6 Support vector regression 18
2.7 Software 21
2.7.1 LIBSVM 21
2.7.2 SSVM toolbox 21
3 Classification by Coding and Multiresponse Regression 23
3.1 Regression framework: linear and kernel generalization 24
3.2 Regularized least-squares support vector regression 27
3.3 Decoding and classification rules 28
3.4 Encoding and equivalence class of codes 30
3.4.1 Coding and scoring schemes 30
3.4.2 Equivalence class of codes 33
4 Application to Benchmark Data Sets 37
4.1 Benchmark data sets 37
4.2 Comparisons of coding schemes 42
4.3 Conclusions 45
5 Gene Selection with Support Vector Regression 51
5.1 Introduction 52
5.2 Current methods 53
5.2.1 SVM-based recursive feature elimination 54
5.2.2 Incremental forward feature selection 55
5.2.3 Bayesian variable selection 55
5.2.4 Bayesian model average 56
5.3 Proposed gene selection method 57
5.4 Empirical data analysis 61
5.5 Discussions 67
6 Discussions and Future Directions 69
References 75 | |
dc.language.iso | en | |
dc.title | 支撐向量機制:以編碼處理分類問題並利用迴歸模式進行基因選取 | zh_TW |
dc.title | Support Vector Machines: Classification with Coding and Regression for Gene Selection | en |
dc.type | Thesis | |
dc.date.schoolyear | 96-1 | |
dc.description.degree | Ph.D. | |
dc.contributor.coadvisor | 蕭朱杏 | |
dc.contributor.oralexamcommittee | 黃啟瑞,陳珍信,樊采虹,李育杰,陳為堅,李文宗 | |
dc.subject.keyword | 編碼,基因選取,核化,線性分類子空間,微陣列資料,支撐向量機制,支撐向量迴歸, | zh_TW |
dc.subject.keyword | coding,gene selection,kernel,linear discriminant subspace,machine learning,microarray data analysis,support vector machine,support vector regression, | en |
dc.relation.page | 84 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2008-01-28 | |
dc.contributor.author-college | College of Public Health | zh_TW |
dc.contributor.author-dept | Institute of Epidemiology | zh_TW |
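The coding-and-regression approach described in the abstract can be sketched briefly. The sketch below is a minimal illustration under stated assumptions, not the thesis's exact procedure: it uses an RBF kernel, one-hot (one-against-rest) coding of the class labels, plain ridge-regularized least squares in place of the regularized least-squares support vector regression, and a nearest-class-mean rule as the user-specified linear discriminant algorithm; all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gram matrix with entries exp(-gamma * ||x_i - z_j||^2).
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_coded_regression(X, y, n_classes, lam=1e-2, gamma=1.0):
    # Code class labels into multivariate responses (one-hot coding here).
    Y = np.eye(n_classes)[y]                     # (n, k) code matrix
    K = rbf_kernel(X, X, gamma)                  # kernelized input data
    # Ridge-regularized least squares: solve (K + lam*I) A = Y.
    A = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    return A

def project(X_train, X_new, A, gamma=1.0):
    # Coordinates of new points in the low-dimensional discriminant subspace.
    return rbf_kernel(X_new, X_train, gamma) @ A

def predict(X_train, X_new, A, class_means, gamma=1.0):
    # Nearest-class-mean rule in the discriminant subspace; any other
    # linear discriminant algorithm could be substituted here.
    Z = project(X_train, X_new, A, gamma)
    d2 = ((Z[:, None, :] - class_means[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

For small regularization the fitted projections of the training points cluster near their classes' code words, so any reasonable linear classifier in this subspace recovers the labels; the equivalence result of the thesis says that switching to another coding scheme leaves this discriminant subspace, and hence the classification, unchanged.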
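The scoring step of the gene selection method in the second part can likewise be sketched. In this illustration the subject weights are an input (in the thesis they are calculated via support vector regression), and the function name `select_genes` is hypothetical; genes are ranked by the magnitude of the cumulative sum of weighted expressions.

```python
import numpy as np

def select_genes(X, w, n_genes=10):
    """Rank genes by the magnitude of the cumulative weighted expression.

    X : (n_subjects, n_genes) microarray expression matrix.
    w : (n_subjects,) subject weights reflecting each subject's strength of
        association with the disease (estimated elsewhere, e.g. by SVR).
    """
    scores = np.abs(w @ X)            # one cumulative score per gene
    order = np.argsort(scores)[::-1]  # most significant genes first
    return order[:n_genes], scores
```

As a quick sanity check, with crude stand-in weights such as centered class labels, a gene whose expression tracks the disease labels accumulates a large score, while a pure-noise gene's weighted contributions largely cancel.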
Appears in Collections: | Institute of Epidemiology and Preventive Medicine
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-97-1.pdf (currently not publicly accessible) | 483.27 kB | Adobe PDF |
All items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.