Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36842
Full metadata record
DC field: value (language)
dc.contributor.advisor: 陳正剛
dc.contributor.author: Chen-Sui Lin (en)
dc.contributor.author: 林辰穗 (zh_TW)
dc.date.accessioned: 2021-06-13T08:18:45Z
dc.date.available: 2005-07-31
dc.date.copyright: 2005-07-26
dc.date.issued: 2005
dc.date.submitted: 2005-07-19
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36842
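dc.description.abstract: Clustering analysis and factor analysis are both statistical methods for exploring the correlation structure among variables (attributes). Such structure is usually expressed by partitioning the variables into meaningful groups according to their similarity or their interrelationships. However, when the sample size is far smaller than the number of variables, the computation lacks full rank (insufficiency of full rank) and factor analysis cannot be used. Take microarray data analysis as an example: each microarray contains tens of thousands of gene expressions, and the number of genes usually far exceeds the number of microarrays. Clustering analysis, on the other hand, can handle data with many variables but few samples. It nevertheless has several drawbacks, including misuse of the Pearson correlation coefficient as a measure of dissimilarity between variables, the difficulty of judging the quality of a clustering result, and the determination of the number of groups.
The first objective of this research is to examine the interrelationship structure among variables and to develop new clustering methods that group interrelated variables. The "R2 with PCA" method emphasizes the linear relationships between groups, whereas the "Variance explanation" method emphasizes not only the interrelationships among variables but also their variation. The second objective is to propose several indices for judging the quality of a clustering result; these indices take into account the interrelationships among variables and the amount of variance explained by different clustering results. Finally, the new methods are applied to two cases: an analysis of nineteen human blood test indicators, and an analysis of Down syndrome microarray data.
(zh_TW)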
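The full-rank problem mentioned in the abstract can be stated in one line. The notation below (an n-by-p column-centered data matrix X_c with n samples and p attributes, and its sample covariance matrix S) is assumed here for illustration and is not taken from the thesis; the record also does not say which step of factor analysis fails, so this is only the standard linear-algebra reason, not necessarily the author's derivation.

```latex
% X_c: n-by-p column-centered data matrix (n samples, p attributes); notation assumed here.
S = \frac{1}{n-1} X_c^{\top} X_c, \qquad
\operatorname{rank}(S) = \operatorname{rank}(X_c) \le \min(p,\, n-1).
% Hence n \le p implies rank(S) < p: S is singular and cannot be inverted.
```

For microarray data with 10,000 to 20,000 genes and only a handful of arrays, p is far larger than n, so S is far from full rank. In the blood-test case (19 tests on 24 people) the bound does not force a singular S.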
dc.description.abstract: The unsupervised classification methods clustering analysis and factor analysis aim to find meaningful structures in the observed attributes. These structures are usually expressed by grouping attributes according to their similarities or interrelationships. However, factor analysis breaks down numerically when full rank is lacking. For example, in microarray data analysis, expressions of 10,000 to 20,000 genes are collected for each array, and the number of genes is usually far larger than the number of microarrays. Clustering analysis, on the other hand, can handle a large number of attributes with few samples. It has drawbacks of its own, including misapplication of the correlation coefficient, the difficulty of evaluating cluster quality, and the determination of the number of clusters.
In this research, we first discuss how to characterize interrelationships among attributes, and then develop clustering methods suited to grouping interrelated attributes. The "R2 with PCA" method stresses the linear relationships between two clusters, while the "Variance explanation" method focuses not only on interrelations among attributes but also on the variation of the attributes. This research also proposes statistics for evaluating cluster quality; these statistics take into consideration the interrelationships among clusters and the variance explained by the clusters. Finally, we apply these methods to two cases: one is 19 blood tests of 24 people, and the other is Down syndrome microarray data.
(en)
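One way to read the "misapplying the correlation coefficient" drawback is that a signed dissimilarity such as 1 - r pushes strongly negatively correlated attributes far apart, even though they are strongly interrelated, while a squared form such as 1 - r^2 treats them as close. The sketch below illustrates that contrast on toy data. The function names, the toy attributes, and both dissimilarity formulas are assumptions made for illustration; the record does not give the thesis's exact definitions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def dissimilarity_1_minus_r(x, y):
    """Signed variant (assumed): negative correlation counts as 'far apart'."""
    return 1.0 - pearson_r(x, y)


def dissimilarity_1_minus_r2(x, y):
    """Squared variant (assumed): any strong linear relation counts as 'close'."""
    return 1.0 - pearson_r(x, y) ** 2


# Hypothetical toy attributes measured on five samples.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 8.0, 6.0, 4.0, 2.0]  # b = 12 - 2a: perfectly negatively related to a
c = [2.0, 5.0, 1.0, 4.0, 3.0]   # only weakly related to a

d_r_ab = dissimilarity_1_minus_r(a, b)    # largest possible value under 1 - r
d_r2_ab = dissimilarity_1_minus_r2(a, b)  # smallest possible value under 1 - r^2
d_r2_ac = dissimilarity_1_minus_r2(a, c)  # close to 1: a and c barely related

print(d_r_ab, d_r2_ab, d_r2_ac)
```

Under 1 - r the pair (a, b) looks maximally dissimilar, whereas under 1 - r^2 it is the most similar pair possible, which is the behavior one would want when grouping interrelated attributes.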
dc.description.provenance: Made available in DSpace on 2021-06-13T08:18:45Z (GMT). No. of bitstreams: 1
ntu-94-R92546020-1.pdf: 2707710 bytes, checksum: eeb55eb7089cd265e2d65de50b21e3b4 (MD5)
Previous issue date: 2005
(en)
dc.description.tableofcontents:
Abstract
Contents
Contents of Figures
Contents of Tables
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objective
1.3 Thesis organization
Chapter 2 Clustering by Attributes Interrelations
2.1 Characterization of Attribute Interrelations
2.2 Clustering methods
2.2.1 Clustering by R2 with Single linkage, Complete linkage, and Average linkage
2.2.2 Clustering by R2 with PCA
2.2.3 Clustering by Variance explanation
2.2.4 Comparison of the Blood tests Clustering Results
Chapter 3 Clustering Evaluation and group number determination
3.1 Evaluation of clustering results
3.2 Selection of a proper clustering method and the number of groups
Chapter 4 Selection of Differentially expressed gene groups
Chapter 5 Conclusions and Future Researches
Reference
Appendix Factor Analysis
dc.language.iso: en
dc.subject: Cluster number determination (zh_TW)
dc.subject: Clustering analysis (zh_TW)
dc.subject: Dissimilarity using correlation (zh_TW)
dc.subject: Cluster quality (zh_TW)
dc.subject: Dissimilarity using Correlation (en)
dc.subject: Cluster number determination (en)
dc.subject: Cluster quality (en)
dc.subject: Clustering analysis (en)
dc.title: 考慮相關性之群集分析及其在基因分群上的應用 (zh_TW)
dc.title: Clustering Analysis by Attributes Interrelations and its Application to Clustering of Differentially Expressed Genes (en)
dc.type: Thesis
dc.date.schoolyear: 93-2
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 林怡杏, 范治民, 范書愷, 謝豐舟
dc.subject.keyword: Clustering analysis, Dissimilarity using correlation, Cluster quality, Cluster number determination (zh_TW)
dc.subject.keyword: Clustering analysis, Dissimilarity using Correlation, Cluster quality, Cluster number determination (en)
dc.relation.page: 95
dc.rights.note: Paid license
dc.date.accepted: 2005-07-19
dc.contributor.author-college: College of Engineering (zh_TW)
dc.contributor.author-dept: Graduate Institute of Industrial Engineering (zh_TW)
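Appears in collections: Graduate Institute of Industrial Engineering

Files in this item:
File: ntu-94-1.pdf (not authorized for public access)
Size: 2.64 MB
Format: Adobe PDF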
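Unless their copyright terms are specifically stated otherwise, all items in the system are protected by copyright, with all rights reserved.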