Behavior of CID under Different Association Types and Binning Processes

Miao-Shan Yen; 顏妙珊

Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉力瑜(Li-Yu Daisy Liu)
dc.contributor.author	Miao-Shan Yen	en
dc.contributor.author	顏妙珊	zh_TW
dc.date.accessioned	2021-06-08T04:26:48Z	-
dc.date.copyright	2010-02-24
dc.date.issued	2010
dc.date.submitted	2010-02-10
dc.identifier.citation	Bellman, R. E. (2003). Dynamic Programming. Courier Dover Publications, Mineola NY. Brown, C. and Davis, H. T. (2006). Receiver operating characteristics curves and related decision measures: A tutorial. Chemometrics and Intelligent Laboratory Systems, 80:24–38. Butte, A. (2002). The use and analysis of microarray data. Nature Reviews Drug Discovery, 1:951–960. D’Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-fit techniques. M. Dekker. Devroye, L., Györfi, L., and Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer. Dougherty, E. R., Kim, S., and Chen, Y. (2000). Coefficient of determination in nonlinear signal processing. Signal Processing, 80:2219–2235. Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. Wiley-Interscience. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874. François, D., Wertz, V., and Verleysen, M. (2006). The permutation test for feature selection by mutual information. In European Symposium on Artificial Neural Networks, Bruges (Belgium). Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19:1–141. Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182. Hsing, T., Liu, L.-Y., Brun, M., and Dougherty, E. R. (2005). The coefficient of intrinsic dependence (feature selection using el CID). Pattern Recognition, 38:623–636. Huang, J.-J., Cai, Y.-Z., and Xu, X.-M. (2008). A parameterless feature ranking algorithm based on MI. Neurocomputing, 71:1656–1668. Huang, S.-Y., Lee, M.-H., and Hsiao, C. K. (2009). Nonlinear measures of association with kernel canonical correlation analysis and applications. Journal of Statistical Planning and Inference, 139:2162–2174. Jain, A. and Zongker, D. (1997). Feature selection: Evaluation, application, and small sample performance. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 19:153–158. Jain, A. K., Duin, R. P., and Mao, J. (1999). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:4–37. Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97:273–324. Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information. Physical Review E, 69:066138. Lance, G. and Williams, W. T. (1967). A general theory of classificatory sorting strategies. The Computer Journal, 9(4):373–380. Liu, L.-Y. D. (2005). Coefficient of intrinsic dependence: a new measure of association. PhD thesis, Texas A&M University. Liu, L.-Y. D., Chen, C.-Y., Chen, M.-J. M., Tsai, M.-S., Lee, C.-H. S., Phang, T. L., Chang, L.-Y., Kuo, W.-H., Hwa, H.-L., Lien, H.-C., Jung, S.-M., Lin, Y.-S., Chang, K.-J., and Hsieh, F.-J. (2009). Statistical identification of gene association by CID in application of constructing ER regulatory network. BMC Bioinformatics, 10:85–98. Lopes, F. M., Jr, D. C. M., and Jr, R. M. C. (2008). Feature selection environment for genomic applications. BMC Bioinformatics, 9:451–459. Ma, T.-h. (2008). Identification of the gene signatures in microarray data by CID. Master’s thesis, National Taiwan University. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., 1:281–297. Pudil, P., Novovičová, J., and Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15:1119–1125. Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods. Journal of Machine Learning Research, 3:1371–1382. Rizzo, M. L. (2008). Statistical Computing with R. Chapman & Hall/CRC, London, UK. Theodoridis, S. and Koutroumbas, K. (2006). Pattern Recognition. Academic Press. van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415:530–536. Whitney, A. W. (1971). A direct method of nonparametric measurement selection. IEEE Transactions on Computers, 9:1100–1103.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749	-
dc.description.abstract	樣式辨認(pattern recognition) 經常運用於資訊檢索上, 其目的為建立一個分類的標準以辨識未知的數據。通常使用特徵選取(feature selection) 作為樣式辨認的方法, 意即在一群資料中選出足以表現此筆資料的特徵進而建立資料分類的標準, 以便日後預測新的資料。由於現今生物資訊學的發展使得生物資料產出日益龐大, 高維度的資料使得樣式辨認變得難以進行, 而特徵選取因為能夠有效減少資料的維度故成為在樣式辨認中重要的一環。近年來, 本質相關係數已被提出可運用到特徵選取的課題上。本論文的目的在於檢視本質相關係數在變數間有不同相關程度下的表現, 以及研究利用階層式分群法或分位數分群法作為分群過程(binning process) 對本質相關係數在假說檢定上的影響。我們亦探討分群數量對本質相關係數估值的影響。經由模擬的結果可以發現, 本質相關係數可被運用在辨別不同程度或者是不同形式相關的變數, 並且使用階層式分群法(hierarchical allocation) 或使用分位數分群法(quantile allocation) 做分群, 其排列檢定(permutation test) 的虛無假設分布不相同。另外, 不論相關性強弱與否, 較少的分群數會有較佳的檢定力。	zh_TW
dc.description.abstract	Pattern recognition is often used in information retrieval to establish a classification criterion for identifying unknown data. It typically begins with feature selection, which aims to select the subset of features that performs best under a given evaluation system so that future cases can be predicted. Owing to rapid developments in bioinformatics, large volumes of high-throughput data have been released. The high dimensionality of these data makes analysis difficult and increases the need for feature selection to reduce the complexity of the data effectively. The coefficient of intrinsic dependence (CID) has recently been proposed for feature selection. The goal of this study is to survey the behavior of CID under different strengths of association between variables. In addition, we carefully examine how the binning process (hierarchical or quantile allocation) and the number of bins affect the hypothesis test of CID. The simulation results demonstrate that CID is capable of identifying different levels or types of association. Besides, the null distributions of hierarchical and quantile allocation are similar. We also observe that a smaller number of subgroups has higher prediction power regardless of the strength or type of association.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T04:26:48Z (GMT). No. of bitstreams: 1 ntu-99-R96621211-1.pdf: 608713 bytes, checksum: 13033f8bd580e67da8c197974e301587 (MD5) Previous issue date: 2010	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract; 1 Introduction; 2 Method; 2.1 The Coefficient of Intrinsic Dependence; 2.2 Quantile versus Hierarchical Allocation; 2.3 The Permutation Test; 2.4 Goodness of Fit Test; 2.5 Receiver Operating Characteristics Analysis; 3 Numerical Study; 3.0.1 Data Generation; 3.0.2 A simple function of ten variables; 3.0.3 Combinations of simulation; 3.1 Results; 3.1.1 Comparison between quantile and hierarchical allocation; 3.1.2 Choice of bin number; 4 Conclusion; 4.1 Summary; 4.2 Discussion and Conclusion; 4.3 Future Research; References
dc.language.iso	en
dc.subject	特徵選取	zh_TW
dc.subject	本質相關係數	zh_TW
dc.subject	分群過程	zh_TW
dc.subject	階層式分群法	zh_TW
dc.subject	分位數分群法	zh_TW
dc.subject	樣式辨認	zh_TW
dc.subject	binning process	en
dc.subject	hierarchical allocation	en
dc.subject	coefficient of intrinsic dependence	en
dc.subject	feature selection	en
dc.subject	pattern recognition	en
dc.subject	quantile allocation	en
dc.title	探討變數型態以及分群方式對本質相關係數估值之影響	zh_TW
dc.title	Behavior of CID under Different Association Types and Binning Processes	en
dc.type	Thesis
dc.date.schoolyear	98-1
dc.description.degree	Master
dc.contributor.oralexamcommittee	彭雲明(Yun-Ming Pong),陳倩瑜(Chien-Yu Chen)
dc.subject.keyword	樣式辨認,特徵選取,本質相關係數,分群過程,階層式分群法,分位數分群法	zh_TW
dc.subject.keyword	pattern recognition,feature selection,coefficient of intrinsic dependence,binning process,hierarchical allocation,quantile allocation	en
dc.relation.page	34
dc.rights.note	Not authorized
dc.date.accepted	2010-02-10
dc.contributor.author-college	生物資源暨農學院	zh_TW
dc.contributor.author-dept	農藝學研究所	zh_TW
Appears in Collections:	農藝學系

Files in This Item:

File	Size	Format
ntu-99-1.pdf Restricted Access	594.45 kB	Adobe PDF
