Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36842
Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳正剛 | |
| dc.contributor.author | Chen-Sui Lin | en |
| dc.contributor.author | 林辰穗 | zh_TW |
| dc.date.accessioned | 2021-06-13T08:18:45Z | - |
| dc.date.available | 2005-07-31 | |
| dc.date.copyright | 2005-07-26 | |
| dc.date.issued | 2005 | |
| dc.date.submitted | 2005-07-19 | |
| dc.identifier.citation | [1] Anderberg, M. (1973). Cluster Analysis for Applications. Academic Press.<br>[2] Lewis-Beck, M. S. (1994). Factor Analysis and Related Techniques. London: Sage Publications.<br>[3] Lin, W. T. (2004). Systematic data preprocess procedures and factor extraction of multiple phenotypes for one-color microarray. National Taiwan University.<br>[4] Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95, 14863–14868.<br>[5] Rindflesch, T. C., Libbus, B., Hristovski, D., et al. (2003). Semantic Relations Asserting the Etiology of Genetic Diseases. Proc. AMIA Symp., submitted.<br>[6] Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression data: identification and analysis of coexpressed genes. Genome Research, 9, 1106–1115.<br>[7] Milligan, G. W., Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.<br>[8] Tibshirani, R., Walther, G., Hastie, T. (2001). Estimating the number of clusters in a dataset via the Gap statistic.<br>[9] D’haeseleer, P. (2000). Reconstructing Gene Networks from Large Scale Gene Expression Data. University of New Mexico.<br>[10] Belsley, D. A., et al. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.<br>[11] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psych., 24, 417–441.<br>[12] Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. New York: Springer.<br>[13] Dudoit, S., Yang, Y., Callow, M. J., Speed, T. P. (2000). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, in press.<br>[14] Tibshirani, R., Tusher, V., Chu, C. (2001). Significance analysis of microarrays applied to ionizing radiation response. Proc. Natl. Acad. Sci. USA, first published April 17, 2001, doi:10.1073/pnas.091062498. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/36842 | - |
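| dc.description.abstract | Clustering analysis and factor analysis are both statistical methods for exploring the correlation structure among variables (attributes); these structures are usually expressed by grouping the variables into meaningful clusters according to their similarity or mutual relationships. However, when the number of samples is much smaller than the number of variables, the sample correlation matrix is not of full rank (insufficiency of full rank), and factor analysis cannot be carried out. Microarray data analysis is a typical example: each microarray contains tens of thousands of gene expressions, and the number of genes usually far exceeds the number of microarrays. Clustering analysis, on the other hand, can handle data with many variables but few samples. It nevertheless has several drawbacks, including the misuse of the Pearson correlation coefficient as a dissimilarity measure between variables, the difficulty of judging the quality of a clustering result, and the determination of the number of clusters.<br><br>The first objective of this study is to examine the interrelationship structure among variables and to develop new clustering methods that group interrelated variables. Whereas the "R2 with PCA" method emphasizes the linear relationships between clusters, the "Variance explanation" method considers not only the interrelationships among variables but also their variation. The second objective is to propose several indices for evaluating the quality of clustering results; these indices take into account the interrelationships among variables and the amount of variance explained by different clustering results. Finally, the new methods are applied to two cases: an analysis of 19 human blood test indicators, and an analysis of Down syndrome microarray data. | zh_TW |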
| dc.description.abstract | The unsupervised classification methods Clustering analysis and Factor analysis aim to find meaningful structures in the observed attributes. These structures are usually expressed by grouping attributes based on the similarities or relationships among them. However, Factor analysis suffers from rank deficiency in numerical computation: when there are fewer samples than attributes, the sample correlation matrix is not of full rank. For example, in microarray data analysis, expressions of 10,000–20,000 genes are collected on each array, and the number of genes is usually far larger than the number of microarrays. Clustering analysis, on the other hand, can handle a vast number of attributes with few samples. It nevertheless has some drawbacks, including misapplication of the correlation coefficient as a dissimilarity measure, the difficulty of evaluating cluster quality, and the determination of the number of clusters.<br><br>In this research, we first characterize the interrelationships among attributes and then develop clustering methods suitable for grouping interrelated attributes. The “R2 with PCA” method lays more stress on the linear relationships between two clusters, while the “Variance explanation” method focuses not only on the interrelations among attributes but also on attribute variation. This research also proposes statistics for evaluating cluster quality; these statistics take into consideration the interrelationships among clusters and the variance explained by the clusters. Finally, we apply these novel methods to two cases: one is 19 blood tests measured on 24 subjects, and the other is Down syndrome microarray data. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-13T08:18:45Z (GMT). No. of bitstreams: 1 ntu-94-R92546020-1.pdf: 2707710 bytes, checksum: eeb55eb7089cd265e2d65de50b21e3b4 (MD5) Previous issue date: 2005 | en |
| dc.description.tableofcontents | Abstract<br>Contents<br>Contents of Figures<br>Contents of Tables<br>Chapter 1 Introduction<br>1.1 Background and Motivation<br>1.2 Research Objective<br>1.3 Thesis organization<br>Chapter 2 Clustering by Attributes Interrelations<br>2.1 Characterization of Attribute Interrelations<br>2.2 Clustering methods<br>2.2.1 Clustering by R2 with Single linkage, Complete linkage, and Average linkage<br>2.2.2 Clustering by R2 with PCA<br>2.2.3 Clustering by Variance explanation<br>2.2.4 Comparison of the Blood tests Clustering Results<br>Chapter 3 Clustering Evaluation and group number determination<br>3.1 Evaluation of clustering results<br>3.2 Selection of a proper clustering method and the number of groups<br>Chapter 4 Selection of Differentially expressed gene groups<br>Chapter 5 Conclusions and Future Researches<br>Reference<br>Appendix Factor Analysis | |
| dc.language.iso | en | |
| dc.subject | Determination of the number of clusters | zh_TW |
| dc.subject | Clustering analysis | zh_TW |
| dc.subject | Correlation-based dissimilarity | zh_TW |
| dc.subject | Quality of clustering results | zh_TW |
| dc.subject | Dissimilarity using Correlation | en |
| dc.subject | Cluster number determination | en |
| dc.subject | Cluster quality | en |
| dc.subject | Clustering analysis | en |
| dc.title | Clustering Analysis Considering Correlation and Its Application to Gene Clustering | zh_TW |
| dc.title | Clustering Analysis by Attributes Interrelations and its Application to Clustering of Differentially Expressed Genes | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 93-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 林怡杏,范治民,范書愷,謝豐舟 | |
| dc.subject.keyword | Clustering analysis, Correlation-based dissimilarity, Quality of clustering results, Determination of the number of clusters | zh_TW |
| dc.subject.keyword | Clustering analysis, Dissimilarity using Correlation, Cluster quality, Cluster number determination | en |
| dc.relation.page | 95 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2005-07-19 | |
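| dc.contributor.author-college | College of Engineering | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Industrial Engineering | zh_TW |
| Appears in collections: | Graduate Institute of Industrial Engineering | |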
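
The abstract states that factor analysis breaks down when there are fewer samples than attributes. The minimal NumPy sketch below illustrates why on synthetic random data (the function `correlation_rank` is a hypothetical helper, not code from the thesis): after centering, n samples span at most n − 1 dimensions, so the p × p sample correlation matrix has rank at most min(n − 1, p) and is singular whenever n − 1 < p.

```python
import numpy as np


def correlation_rank(n_samples: int, n_attributes: int, seed: int = 0) -> int:
    """Rank of the sample correlation matrix of synthetic standard-normal data."""
    rng = np.random.default_rng(seed)
    # Rows are samples (e.g. arrays or subjects), columns are attributes (e.g. genes).
    x = rng.standard_normal((n_samples, n_attributes))
    r = np.corrcoef(x, rowvar=False)  # p x p correlation matrix of the columns
    return int(np.linalg.matrix_rank(r))


if __name__ == "__main__":
    p = 19  # attribute count echoing the 19 blood tests; the data itself is random
    for n in (100, 24, 10):
        rank = correlation_rank(n, p)
        # Centering leaves at most n - 1 independent directions: rank <= min(n - 1, p).
        print(f"n={n:3d}, p={p}: rank={rank}, full rank: {rank == p}")
```

With 10 samples and 19 attributes the rank drops to 9, so the correlation matrix cannot be inverted; the same bound applies to a microarray study with thousands of genes and only a handful of arrays.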
Files in this item:
| File | Size | Format |
|---|---|---|
| ntu-94-1.pdf (not authorized for public access) | 2.64 MB | Adobe PDF |
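
A common way to turn the Pearson correlation r into a dissimilarity is 1 − r, which places strongly negatively correlated attributes far apart even though they are tightly interrelated; this is one reading of the "misapplication of the correlation coefficient" noted in the abstract. The sketch below assumes, purely for illustration of the "R2" idea, a dissimilarity of 1 − r² passed to SciPy's hierarchical clustering; the thesis's exact definitions may differ, and the helper names and synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_dissimilarity(x: np.ndarray, signed: bool = False) -> np.ndarray:
    """Pairwise dissimilarity between the columns (attributes) of x.

    signed=False: 1 - r**2, so r = +0.9 and r = -0.9 are equally close (assumed "R2" form).
    signed=True:  1 - r, so strongly anti-correlated attributes are maximally far apart.
    """
    r = np.corrcoef(x, rowvar=False)  # np.corrcoef clips r to [-1, 1]
    d = 1.0 - r if signed else 1.0 - r ** 2
    np.fill_diagonal(d, 0.0)
    return d


def cluster_attributes(x: np.ndarray, n_clusters: int, method: str = "average",
                       signed: bool = False) -> np.ndarray:
    """Hierarchical clustering of attributes; method may be 'single', 'complete' or 'average'."""
    d = correlation_dissimilarity(x, signed=signed)
    # linkage expects a condensed distance vector; checks=False tolerates tiny asymmetry.
    z = linkage(squareform(d, checks=False), method=method)
    return fcluster(z, t=n_clusters, criterion="maxclust")


def make_demo_data(n_samples: int = 50, seed: int = 1) -> np.ndarray:
    """Hypothetical data: attributes 0-2 share factor f1 (attribute 2 with a minus sign),
    attributes 3-5 share factor f2."""
    rng = np.random.default_rng(seed)
    f1, f2 = rng.standard_normal(n_samples), rng.standard_normal(n_samples)

    def noisy(signal: np.ndarray) -> np.ndarray:
        return signal + 0.3 * rng.standard_normal(n_samples)

    return np.column_stack([noisy(f1), noisy(f1), noisy(-f1),
                            noisy(f2), noisy(f2), noisy(f2)])


if __name__ == "__main__":
    x = make_demo_data()
    print("1 - r^2:", cluster_attributes(x, 2))
    print("1 - r:  ", cluster_attributes(x, 2, signed=True))
```

With 1 − r², the negatively correlated third attribute joins its two partners; with the signed 1 − r it is split off from them. Passing `method="single"` or `method="complete"` switches to the other two linkage rules named in Section 2.2.1 of the table of contents.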
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
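
Both the "Variance explanation" method and the proposed quality statistics weigh how much variance a grouping captures. As a hedged sketch (an assumption, not the thesis's exact statistic), the share of a cluster's standardized variance captured by its first principal component is the largest eigenvalue of its correlation matrix divided by its number of attributes; summing the largest eigenvalues over clusters and dividing by the total number of attributes scores a whole clustering. All names and data below are hypothetical.

```python
import numpy as np


def first_pc_share(x: np.ndarray) -> float:
    """Share of the standardized variance of x's columns captured by their first principal component."""
    if x.shape[1] == 1:
        return 1.0  # a single standardized attribute is fully explained by itself
    r = np.corrcoef(x, rowvar=False)
    largest = np.linalg.eigvalsh(r)[-1]  # eigvalsh returns eigenvalues in ascending order
    return float(largest / r.shape[0])   # trace of a correlation matrix = number of attributes


def clustering_variance_explained(x: np.ndarray, labels) -> float:
    """Fraction of total standardized variance explained when each cluster is
    summarized by its own first principal component (hypothetical score)."""
    labels = np.asarray(labels)
    explained = 0.0
    for lab in np.unique(labels):
        cols = x[:, labels == lab]
        explained += cols.shape[1] * first_pc_share(cols)  # = the cluster's largest eigenvalue
    return explained / x.shape[1]


def make_demo_data(n_samples: int = 50, seed: int = 1) -> np.ndarray:
    """Hypothetical data: attributes 0-2 follow factor f1 (attribute 2 negated), 3-5 follow f2."""
    rng = np.random.default_rng(seed)
    f1, f2 = rng.standard_normal(n_samples), rng.standard_normal(n_samples)

    def noisy(signal: np.ndarray) -> np.ndarray:
        return signal + 0.3 * rng.standard_normal(n_samples)

    return np.column_stack([noisy(f1), noisy(f1), noisy(-f1),
                            noisy(f2), noisy(f2), noisy(f2)])


if __name__ == "__main__":
    x = make_demo_data()
    print("grouping by factor:", clustering_variance_explained(x, [1, 1, 1, 2, 2, 2]))
    print("mixed grouping:    ", clustering_variance_explained(x, [1, 1, 2, 2, 1, 2]))
```

This score reaches 1 when every attribute forms its own cluster, so on its own it only compares clusterings with the same number of groups; choosing the number of clusters needs an additional criterion, which is the question Chapter 3 of the thesis addresses.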
