Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48424
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 歐陽彥正 | |
dc.contributor.author | Chih-Hung Hsieh | en |
dc.contributor.author | 謝志宏 | zh_TW |
dc.date.accessioned | 2021-06-15T06:56:16Z | - |
dc.date.available | 2013-02-20 | |
dc.date.copyright | 2011-02-20 | |
dc.date.issued | 2011 | |
dc.date.submitted | 2011-02-09 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/48424 | - |
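dc.description.abstract | Among common classification algorithms, logic-based classifiers mainly offer users interpretable classification models, whereas kernel-based classifiers mainly pursue accurate predictions. In this thesis, we propose a density estimation algorithm based on generalized Gaussian components (hereafter the Gaussian density estimation algorithm, G2DE), which builds a classification model from a small number of generalized, unrestricted Gaussian components in order to narrow the gap between logic-based and kernel-based classifiers in prediction performance and interpretability. The most notable advantage of a classification model built by G2DE is that, by analyzing the eigenvectors and eigenvalues of the covariance matrices associated with its Gaussian components, users can readily see the characteristics and overall shape of the data distribution, which supports deeper data analysis. We also show how to parameterize the learning process of the G2DE classification model as a parameter optimization problem, and we solve for the best model with an efficient rank-based adaptive mutation evolutionary approach. Experiments applying the G2DE classification model to synthetic and real-world test data show that, using only a few Gaussian components as kernels, it gives more accurate predictions than conventional logic-based classifiers and the expectation-maximization-based classifier. The experimental results also indicate that the model does help users better understand the characteristics and overall shape of the data distribution. Finally, we extend the proposed algorithm into a two-stage G2DE framework and apply it to the problem of identifying microRNAs; comparative experiments against several common classification algorithms show that, for microRNA classification, the proposed method achieves both 1) accurate prediction and 2) a valuable, interpretable classification model. | zh_TW |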
dc.description.abstract | In this thesis, aiming to close the gap between the interpretability of logic-based classifiers and the prediction accuracy of kernel-based classifiers, we propose a generalized Gaussian Density Estimation algorithm (G2DE) that carries out density estimation with a mixture model composed of a limited number of generalized Gaussian components. One of the most distinctive features of a classifier constructed with the proposed approach is that users can easily obtain an overall picture of the distributions in the data set by examining the eigenvectors and eigenvalues of the covariance matrices associated with the generalized Gaussian components. The learning process of the proposed method is parameterized and modeled as a large-parameter optimization problem, which is solved by an efficient rank-based adaptive mutation evolutionary approach. Experiments on standard benchmarks and synthesized data show that the proposed G2DE, with just a few kernels, outperforms conventional logic-based classifiers and the EM (Expectation Maximization) based classifier in terms of prediction accuracy. Furthermore, the proposed classifier has the major advantage of providing users with an overall picture of the underlying distributions. An application of the proposed method to identifying microRNA precursors among pseudo hairpins is also demonstrated in this study. By developing a two-stage framework of G2DE, it is shown that the proposed G2DE provides i) prediction accuracy comparable to that of several well-known classifiers and ii) valuable information for interpreting the data. | en |
dc.description.provenance | Made available in DSpace on 2021-06-15T06:56:16Z (GMT). No. of bitstreams: 1 ntu-100-D95922003-1.pdf: 1006027 bytes, checksum: 0a9751c0e323d495cfa42dbc11337ab0 (MD5) Previous issue date: 2011 | en |
dc.description.tableofcontents | Table of Contents
Acknowledgements
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1. Motivation
1.2. Generalized Gaussian Components based Density Estimation Algorithms
1.3. Organization
Chapter 2. Background
2.1. Supervised Machine Learning
2.2. Logic Based Algorithms
2.2.1. Decision Tree
2.2.2. Rule based Classifier
2.3. Kernel Based Algorithms
2.3.1. Support Vector Machine
2.3.2. Gaussian-Mixture-Model based Expectation Maximization Approach
2.4. Rank-based Adaptive Mutation Evolutionary Approach
2.5. Introduction to Prediction of MicroRNA Precursor
Chapter 3. Generalized Gaussian Components based Density Estimation Algorithm
3.1. The Decision Model of the G2DE Classifier
3.2. The Learning Process
3.3. Adopting RAME Approach to Solving the Learning Process of G2DE
3.4. Evaluation of G2DE
3.4.1. Evaluation of prediction accuracy
3.4.2. Interpretability of G2DE
3.5. Summary of G2DE
Chapter 4. An Application of the Proposed G2DE: Prediction of MicroRNA Precursors
4.1. Applied G2DE to MicroRNA Precursor Prediction
4.2. The materials and the used feature set
4.2.1. Data set
4.2.2. Feature set
4.3. Using G2DE to analyze pre-miRNAs
4.4. The prediction performance of G2DE
4.5. Interpretability of G2DE
4.6. Conclusion of this application
Chapter 5. Conclusion and Future Work
5.1. Modify the Original Decision model of G2DE Algorithm to Construct a Regression Tool
5.2. Include Feature Selection Mechanism into the learning Process of G2DE
Reference | |
dc.language.iso | en | |
dc.title | 以廣義高斯函數成份為基礎之密度估計演算法及其相關應用 | zh_TW |
dc.title | A Generalized Gaussian Components based Density Estimation Algorithm and its Applications | en |
dc.type | Thesis | |
dc.date.schoolyear | 99-1 | |
dc.description.degree | Doctoral | |
dc.contributor.coadvisor | 張天豪 | |
dc.contributor.oralexamcommittee | 賴飛羆,高成炎,趙坤茂,林軒田,陳倩瑜,蔡懷寬 | |
dc.subject.keyword | Machine learning, Gaussian function, mixture model, classification algorithm, evolutionary computation, density estimation algorithm, microRNA | zh_TW |
dc.subject.keyword | Machine Learning, Gaussian Mixture Model, Classification Algorithm, Evolutionary Computing, Kernel Density Estimation Algorithm, MicroRNA | en |
dc.relation.page | 67 | |
dc.rights.note | Paid authorization | |
dc.date.accepted | 2011-02-09 | |
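dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
Appears in collections: | Department of Computer Science and Information Engineering |
Files in this item:
File | Size | Format | |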
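
The abstract describes a classifier built from a few Gaussian components with unrestricted covariance matrices, interpreted through the eigenvectors and eigenvalues of those matrices. The following is a minimal two-dimensional sketch of that idea, not the thesis's implementation: the function names, the hand-set component parameters, and the class priors are all hypothetical, and the rank-based adaptive mutation evolutionary search that the thesis uses to learn the parameters is not shown.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with a full (unrestricted) covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) via the 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def mixture_density(x, components):
    """components: list of (weight, mean, cov); weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf_2d(x, m, c) for w, m, c in components)

def classify(x, class_models, priors):
    """Pick the class with the largest prior times class-conditional density."""
    return max(class_models,
               key=lambda k: priors[k] * mixture_density(x, class_models[k]))

def principal_axes_2d(cov):
    """Closed-form eigenvalues (largest first) of a symmetric 2x2 matrix,
    plus the unit eigenvector belonging to the largest eigenvalue."""
    (a, b), (_, d) = cov
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle of the major axis
    return (centre + radius, centre - radius), (math.cos(theta), math.sin(theta))

# Hypothetical, hand-set models: one component for class "A", two for class "B".
class_models = {
    "A": [(1.0, (0.0, 0.0), ((2.0, 1.0), (1.0, 2.0)))],
    "B": [(0.5, (3.0, 3.0), ((1.0, 0.0), (0.0, 1.0))),
          (0.5, (5.0, 1.0), ((1.0, -0.5), (-0.5, 1.0)))],
}
priors = {"A": 0.5, "B": 0.5}

label = classify((0.2, -0.1), class_models, priors)
eigenvalues, major_axis = principal_axes_2d(class_models["A"][0][2])
print(label, eigenvalues, major_axis)
```

In this sketch the major axis of a component shows the direction along which that component's data spread most, and the eigenvalues give the variance along each axis, which is the kind of summary the abstract refers to as an overall picture of the distribution.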
---|---|---|---|
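ntu-100-1.pdf (not currently authorized for public access) | 982.45 kB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.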