Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉力瑜(Li-Yu Daisy Liu) | |
dc.contributor.author | Miao-Shan Yen | en |
dc.contributor.author | 顏妙珊 | zh_TW |
dc.date.accessioned | 2021-06-08T04:26:48Z | - |
dc.date.copyright | 2010-02-24 | |
dc.date.issued | 2010 | |
dc.date.submitted | 2010-02-10 | |
dc.identifier.citation | Bellman, R. E. (2003). Dynamic Programming. Courier Dover Publications, Mineola, NY. Brown, C. and Davis, H. T. (2006). Receiver operating characteristics curves and related decision measures: A tutorial. Chemometrics and Intelligent Laboratory Systems, 80:24–38. Butte, A. (2002). The use and analysis of microarray data. Nature Reviews Drug Discovery, 1:951–960. D’Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-fit techniques. M. Dekker. Devroye, L., Györfi, L., and Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer. Dougherty, E. R., Kim, S., and Chen, Y. (2000). Coefficient of determination in nonlinear signal processing. Signal Processing, 80:2219–2235. Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. Wiley-Interscience. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874. François, D., Wertz, V., and Verleysen, M. (2006). The permutation test for feature selection by mutual information. In European Symposium on Artificial Neural Networks, Bruges (Belgium). Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19:1–141. Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182. Hsing, T., Liu, L.-Y., Brun, M., and Dougherty, E. R. (2005). The coefficient of intrinsic dependence (feature selection using el CID). Pattern Recognition, 38:623–636. Huang, J.-J., Cai, Y.-Z., and Xu, X.-M. (2008). A parameterless feature ranking algorithm based on MI. Neurocomputing, 71:1656–1668. Huang, S.-Y., Lee, M.-H., and Hsiao, C. K. (2009). Nonlinear measures of association with kernel canonical correlation analysis and applications. Journal of Statistical Planning and Inference, 139:2162–2174. Jain, A. and Zongker, D. (1997). Feature selection: Evaluation, application, and small sample performance. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19:153–158. Jain, A. K., Duin, R. P., and Mao, J. (1999). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:4–37. Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97:273–324. Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69:066138. Lance, G. and Williams, W. T. (1967). A general theory of classificatory sorting strategies. The Computer Journal, 9(4):373–380. Liu, L.-Y. D. (2005). Coefficient of intrinsic dependence: a new measure of association. PhD thesis, Texas A&M University. Liu, L.-Y. D., Chen, C.-Y., Chen, M.-J. M., Tsai, M.-S., Lee, C.-H. S., Phang, T. L., Chang, L.-Y., Kuo, W.-H., Hwa, H.-L., Lien, H.-C., Jung, S.-M., Lin, Y.-S., Chang, K.-J., and Hsieh, F.-J. (2009). Statistical identification of gene association by CID in application of constructing ER regulatory network. BMC Bioinformatics, 10:85–98. Lopes, F. M., Jr, D. C. M., and Jr, R. M. C. (2008). Feature selection environment for genomic applications. BMC Bioinformatics, 9:451–459. Ma, T.-h. (2008). Identification of the gene signatures in microarray data by CID. Master’s thesis, National Taiwan University. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., 1:281–297. Pudil, P., Novovičová, J., and Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15:1119–1125. Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research, 3:1371–1382. Rizzo, M. L. (2008). Statistical Computing with R. Chapman & Hall/CRC, London, UK. Theodoridis, S. and Koutroumbas, K. (2006). Pattern Recognition. Academic Press. van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536. Whitney, A. W. (1971). A direct method of nonparametric measurement selection. IEEE Transactions on Computers, 9:1100–1103. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749 | - |
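dc.description.abstract | Pattern recognition is often applied in information retrieval; its purpose is to establish a classification criterion for identifying unknown data. Feature selection is commonly used as an approach to pattern recognition: features that adequately represent the data are selected from a data set to establish a classification criterion, so that new data can be predicted later. With the development of bioinformatics, the output of biological data grows ever larger, and high-dimensional data make pattern recognition difficult to carry out; because feature selection effectively reduces the dimensionality of the data, it has become an important part of pattern recognition. In recent years, the coefficient of intrinsic dependence has been proposed for feature selection problems. The purpose of this thesis is to examine the behavior of the coefficient of intrinsic dependence when variables are associated to different degrees, and to study how using hierarchical allocation or quantile allocation as the binning process affects hypothesis testing with the coefficient. We also investigate the effect of the number of bins on the estimate of the coefficient. The simulation results show that the coefficient of intrinsic dependence can be used to distinguish variables with different degrees or different forms of association, and that the null distributions of the permutation test differ between hierarchical allocation and quantile allocation. In addition, regardless of the strength of association, a smaller number of bins yields better test power. | zh_TW |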
dc.description.abstract | Pattern recognition is often used in information retrieval to establish a classification criterion for identifying unknown data. Typically, pattern recognition begins with feature selection, which aims to select the subset of features that performs best under a given evaluation criterion and to use it to predict future cases. Owing to the rapid development of bioinformatics, large volumes of high-throughput data have been released. The high dimensionality of these data makes analysis difficult and increases the need for feature selection to effectively reduce the complexity of the data. The coefficient of intrinsic dependence (CID) has recently been proposed to deal with feature selection issues. The goal of this study is to survey the behavior of CID under different strengths of association between variables. In addition, we carefully examine how the binning process (hierarchical or quantile allocation) and the number of bins affect the hypothesis test of CID. The simulation results demonstrate that CID is capable of identifying different levels or types of association. Moreover, the null distributions of hierarchical and quantile allocation are similar. We also observe that a smaller number of subgroups has higher prediction power regardless of the strength or type of association. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T04:26:48Z (GMT). No. of bitstreams: 1 ntu-99-R96621211-1.pdf: 608713 bytes, checksum: 13033f8bd580e67da8c197974e301587 (MD5) Previous issue date: 2010 | en |
dc.description.tableofcontents | Oral Examination Committee Certification i; Acknowledgements ii; Abstract (in Chinese) iii; Abstract iv; 1 INTRODUCTION 1; 2 METHOD 4; 2.1 The Coefficient of Intrinsic Dependence 4; 2.2 Quantile versus Hierarchical Allocation 6; 2.3 The Permutation Test 8; 2.4 Goodness of Fit Test 10; 2.5 Receiver Operating Characteristics Analysis 11; 3 Numerical Study 13; 3.0.1 Data Generation 13; 3.0.2 A simple function of ten variables 13; 3.0.3 Combinations of simulation 14; 3.1 Results 15; 3.1.1 Comparison between quantile and hierarchical allocation 15; 3.1.2 Choice of bin number 21; 4 Conclusion 26; 4.1 Summary 26; 4.2 Discussion and Conclusion 26; 4.3 Future Research 31; References 32 | |
dc.language.iso | en | |
dc.title | Effects of Variable Type and Binning Method on Estimates of the Coefficient of Intrinsic Dependence | zh_TW |
dc.title | Behavior of CID under Different Association Types and Binning Processes | en |
dc.type | Thesis | |
dc.date.schoolyear | 98-1 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 彭雲明(Yun-Ming Pong),陳倩瑜(Chien-Yu Chen) | |
dc.subject.keyword | pattern recognition, feature selection, coefficient of intrinsic dependence, binning process, hierarchical allocation, quantile allocation | zh_TW |
dc.subject.keyword | pattern recognition, feature selection, coefficient of intrinsic dependence, binning process, hierarchical allocation, quantile allocation | en |
dc.relation.page | 34 | |
dc.rights.note | Not authorized | |
dc.date.accepted | 2010-02-10 | |
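dc.contributor.author-college | College of Bioresources and Agriculture | zh_TW |
dc.contributor.author-dept | Graduate Institute of Agronomy | zh_TW |
Appears in collections: | Department of Agronomy |
Files in this item:
File | Size | Format | |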
---|---|---|---|
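ntu-99-1.pdf (not currently authorized for public access) | 594.45 kB | Adobe PDF |
Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.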