Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749
Full metadata record
DC Field: Value [Language]
dc.contributor.advisor: 劉力瑜 (Li-Yu Daisy Liu)
dc.contributor.author: Miao-Shan Yen [en]
dc.contributor.author: 顏妙珊 [zh_TW]
dc.date.accessioned: 2021-06-08T04:26:48Z
dc.date.copyright: 2010-02-24
dc.date.issued: 2010
dc.date.submitted: 2010-02-10
dc.identifier.citation:
Bellman, R. E. (2003). Dynamic Programming. Courier Dover Publications, Mineola, NY.
Brown, C. and Davis, H. T. (2006). Receiver operating characteristics curves and
related decision measures: A tutorial. Chemometrics and intelligent laboratory
systems, 80:24–38.
Butte, A. (2002). The use and analysis of microarray data. Nature Reviews Drug
Discovery, 1:951–960.
D’Agostino, R. B. and Stephens, M. A. (1986). Goodness-of-fit techniques. M.
Dekker.
Devroye, L., Györfi, L., and Lugosi, G. (1996). A probabilistic theory of pattern
recognition. Springer.
Dougherty, E. R., Kim, S., and Chen, Y. (2000). Coefficient of determination in
nonlinear signal processing. Signal Processing, 80:2219–2235.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification. Wiley-
Interscience.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters,
27:861–874.
François, D., Wertz, V., and Verleysen, M. (2006). The permutation test for feature
selection by mutual information. In European Symposium on Artificial Neural
Networks Bruges (Belgium).
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of
Statistics, 19:1–141.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection.
Journal of Machine Learning Research, 3:1157–1182.
Hsing, T., Liu, L.-Y., Brun, M., and Dougherty, E. R. (2005). The coefficient of
intrinsic dependence (feature selection using el CID). Pattern Recognition, 38:623–
636.
Huang, J.-J., Cai, Y.-Z., and Xu, X.-M. (2008). A parameterless feature ranking
algorithm based on MI. Neurocomputing, 71:1656–1668.
Huang, S.-Y., Lee, M.-H., and Hsiao, C. K. (2009). Nonlinear measures of association with kernel canonical correlation analysis
and applications. Journal of Statistical Planning and Inference,
139:2162–2174.
Jain, A. and Zongker, D. (1997). Feature selection: Evaluation, application, and
small sample performance. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 19:153–158.
Jain, A. K., Duin, R. P., and Mao, J. (1999). Statistical pattern recognition: A
review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:4–
37.
Kohavi, R. and John, G. H. (1997). Wrappers for feature subset selection. Artificial
Intelligence, 97:273–324.
Kraskov, A., Stögbauer, H., and Grassberger, P. (2004). Estimating mutual information.
Physical Review E, 69:066138.
Lance, G. and Williams, W. T. (1967). A general theory of classificatory sorting
strategies. The Computer Journal, 9(4):373–380.
Liu, L.-Y. D. (2005). Coefficient of intrinsic dependence: a new measure of association.
PhD thesis, Texas A&M University.
Liu, L.-Y. D., Chen, C.-Y., Chen, M.-J. M., Tsai, M.-S., Lee, C.-H. S., Phang,
T. L., Chang, L.-Y., Kuo, W.-H., Hwa, H.-L., Lien, H.-C., Jung, S.-M., Lin,
Y.-S., Chang, K.-J., and Hsieh, F.-J. (2009). Statistical identification of gene
association by CID in application of constructing ER regulatory network. BMC
Bioinformatics, 10:85–98.
Lopes, F. M., Martins Jr, D. C., and Cesar Jr, R. M. (2008). Feature selection environment
for genomic applications. BMC Bioinformatics, 9:451–459.
Ma, T.-h. (2008). Identification of the gene signatures in microarray data by CID.
Master’s thesis, National Taiwan University.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate
observations. Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., 1:281–297.
Pudil, P., Novovičová, J., and Kittler, J. (1994). Floating search methods in feature
selection. Pattern Recognition Letters, 15:1119–1125.
Reunanen, J. (2003). Overfitting in making comparisons between variable selection
methods. Journal of Machine Learning Research, 3:1371–1382.
Rizzo, M. L. (2008). Statistical Computing with R. Chapman & Hall/CRC, London,
UK.
Theodoridis, S. and Koutroumbas, K. (2006). Pattern Recognition. Academic Press.
van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A. M., Mao,
M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber,
G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R., and Friend,
S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer.
Nature, 415:530–536.
Whitney, A. W. (1971). A direct method of nonparametric measurement selection.
IEEE Transactions on Computers, 9:1100–1103.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/22749
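dc.description.abstract: Pattern recognition is widely used in information retrieval; its purpose is to establish a classification criterion for identifying unknown data. Feature selection is commonly used as part of pattern recognition: features that adequately represent the data are selected from the full data set and used to build a classification criterion, so that new data can be predicted later. With the current development of bioinformatics, the output of biological data grows ever larger, and high-dimensional data make pattern recognition difficult to carry out. Because feature selection can effectively reduce the dimensionality of the data, it has become an important part of pattern recognition. In recent years, the coefficient of intrinsic dependence (CID) has been proposed for feature selection problems. The purpose of this thesis is to examine the behavior of CID under different degrees of association between variables, and to study how using hierarchical allocation or quantile allocation as the binning process affects hypothesis tests based on CID. We also investigate how the number of bins affects the CID estimate. The simulation results show that CID can be used to distinguish variables with different degrees or different forms of association, and that the null distributions of the permutation test differ depending on whether hierarchical allocation or quantile allocation is used for binning. In addition, regardless of the strength of association, a smaller number of bins gives better test power. [zh_TW, translated]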
dc.description.abstract: Pattern recognition is often used in information retrieval to establish a classification criterion for identifying unknown data. Typically, pattern recognition begins with feature selection, which aims to select the subset of features that performs best under a given evaluation criterion and to predict future cases.
Owing to the rapid development of bioinformatics, large volumes of high-throughput data have been released. The high dimensionality of these data makes analysis difficult and increases the need for feature selection to reduce the complexity of the data effectively. The coefficient of intrinsic dependence (CID) has recently been proposed to deal with feature selection problems. The goal of this study is to survey the behavior of CID under different strengths of association between variables. In addition, we examine how the binning process (hierarchical or quantile allocation) and the number of bins affect the hypothesis test of CID. The simulation results demonstrate that CID is capable of identifying different levels or types of association. Moreover, the null distributions under hierarchical and quantile allocation are similar. We also observe that a smaller number of subgroups gives higher prediction power regardless of the strength or type of association. [en]
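As a rough illustration of the procedure the abstract describes (bin one variable, estimate CID, then test it by permutation), the following is a minimal sketch and not the thesis's own implementation. It assumes the binned plug-in form of CID(Y|X): the between-bin variance of the indicator 1{Y ≤ y}, averaged over the observed y values and divided by the total variance of that indicator. It uses quantile allocation only. The names `quantile_bins`, `cid_estimate`, and `permutation_pvalue` and their parameters are hypothetical.

```python
import numpy as np


def quantile_bins(x, k):
    """Assign each observation to one of k roughly equal-count bins by rank of x."""
    x = np.asarray(x)
    ranks = np.argsort(np.argsort(x))      # 0 .. n-1, ties broken by position
    return (ranks * k) // len(x)           # integer bin labels 0 .. k-1


def cid_estimate(x, y, k):
    """Binned plug-in estimate of CID(Y|X) under the assumed form described above."""
    y = np.asarray(y, dtype=float)
    bins = quantile_bins(x, k)
    # ind[i, j] is 1.0 when y[i] <= y[j]; averaging over i gives the ECDF at y[j]
    ind = (y[:, None] <= y[None, :]).astype(float)
    f_all = ind.mean(axis=0)               # overall ECDF of Y at each observed y
    num = np.zeros(len(y))
    for b in range(k):
        mask = bins == b
        if mask.any():
            f_bin = ind[mask].mean(axis=0) # ECDF of Y within bin b
            num += mask.mean() * (f_bin - f_all) ** 2
    den = f_all * (1.0 - f_all)            # total variance of the indicator
    # den.mean() is zero only if all y values are identical (not handled here)
    return num.mean() / den.mean()


def permutation_pvalue(x, y, k, n_perm=200, seed=0):
    """One-sided permutation p-value: shuffle y to break any link with x."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = cid_estimate(x, y, k)
    null = np.array([cid_estimate(x, rng.permutation(y), k) for _ in range(n_perm)])
    # add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```

Under this assumed form the estimate is exactly 0 with a single bin and exactly 1 when every observation has its own bin, whatever the true relationship between the variables. This illustrates why the number of bins matters when the estimate is used in a hypothesis test.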
dc.description.provenance: Made available in DSpace on 2021-06-08T04:26:48Z (GMT). No. of bitstreams: 1
ntu-99-R96621211-1.pdf: 608713 bytes, checksum: 13033f8bd580e67da8c197974e301587 (MD5)
Previous issue date: 2010 [en]
dc.description.tableofcontents:
Oral Examination Committee Certification
Acknowledgements
Abstract (in Chinese)
Abstract
1 INTRODUCTION
2 METHOD
2.1 The Coefficient of Intrinsic Dependence
2.2 Quantile versus Hierarchical Allocation
2.3 The Permutation Test
2.4 Goodness of Fit Test
2.5 Receiver Operating Characteristics Analysis
3 Numerical Study
3.0.1 Data Generation
3.0.2 A simple function of ten variables
3.0.3 Combinations of simulation
3.1 Results
3.1.1 Comparison between quantile and hierarchical allocation
3.1.2 Choice of bin number
4 Conclusion
4.1 Summary
4.2 Discussion and Conclusion
4.3 Future Research
References
dc.language.iso: en
dc.subject: 特徵選取 [zh_TW]
dc.subject: 本質相關係數 [zh_TW]
dc.subject: 分群過程 [zh_TW]
dc.subject: 階層式分群法 [zh_TW]
dc.subject: 分位數分群法 [zh_TW]
dc.subject: 樣式辨認 [zh_TW]
dc.subject: binning process [en]
dc.subject: hierarchical allocation [en]
dc.subject: coefficient of intrinsic dependence [en]
dc.subject: feature selection [en]
dc.subject: pattern recognition [en]
dc.subject: quantile allocation [en]
dc.title: 探討變數型態以及分群方式對本質相關係數估值之影響 [zh_TW]
dc.title: Behavior of CID under Different Association Types and Binning Processes [en]
dc.type: Thesis
dc.date.schoolyear: 98-1
dc.description.degree: Master's (碩士)
dc.contributor.oralexamcommittee: 彭雲明 (Yun-Ming Pong), 陳倩瑜 (Chien-Yu Chen)
dc.subject.keyword: 樣式辨認, 特徵選取, 本質相關係數, 分群過程, 階層式分群法, 分位數分群法 [zh_TW]
dc.subject.keyword: pattern recognition, feature selection, coefficient of intrinsic dependence, binning process, hierarchical allocation, quantile allocation [en]
dc.relation.page: 34
dc.rights.note: Not authorized (未授權)
dc.date.accepted: 2010-02-10
dc.contributor.author-college: College of Bioresources and Agriculture (生物資源暨農學院) [zh_TW]
dc.contributor.author-dept: Graduate Institute of Agronomy (農藝學研究所) [zh_TW]
Appears in Collections: Department of Agronomy (農藝學系)

Files in This Item:
File: ntu-99-1.pdf (Restricted Access)
Size: 594.45 kB
Format: Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
