Developing a Clustering Algorithm for Time Series Using a Modeling Approach

Yu-Ho Kuo; 郭郁禾

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40828

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	莊曜宇(Eric Y. Chuang)
dc.contributor.author	Yu-Ho Kuo	en
dc.contributor.author	郭郁禾	zh_TW
dc.date.accessioned	2021-06-14T17:02:17Z	-
dc.date.available	2008-08-08
dc.date.copyright	2008-08-08
dc.date.issued	2008
dc.date.submitted	2008-07-29
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/40828	-
dc.description.abstract	Microarrays are a method for measuring gene expression on a large scale. Microarray experiment designs fall into two main classes: one compares gene expression under two different conditions, and the other observes how gene expression changes over time. Several clustering methods have been developed to analyze the data obtained from microarrays. Clustering can find arrays or genes with similar expression, and thereby identify samples that may be of the same type or genes that influence one another. Traditional clustering methods were not designed for time series, so their results are often incomplete. In recent years many groups have designed clustering methods for time series, but these methods usually apply only in certain situations, for example only to data with many time points or only to data with few. This study proposes using the Gap statistic algorithm to build candidate temporal-trend models from the information in the data itself, and then clustering with these models. A binomial test is used to detect the more important groups among the clustering results. The method is tested on simulated data and on data from published microarray experiments, and is compared with a published time series clustering algorithm.	zh_TW
dc.description.abstract	Microarrays are a high-throughput technology for investigating gene expression. Microarray experiments follow two major designs: case-control studies and time series studies. Clustering methods have been developed to analyze microarray data; they help discover similar samples or correlated genes from the expression profiles of samples or genes. Traditional clustering methods are not designed for time series and therefore tend to miss information or misclassify genes. Although several clustering methods for time series exist, they are not suitable for all conditions. We propose a new time series clustering method, Gap statistic and Template based clustering (GT-clustering), for analyzing time series microarray data regardless of whether the series is long or short. GT-clustering uses the Gap statistic to design the templates used for clustering. In addition, a binomial test is applied to identify the significant clusters. The algorithm is tested on simulated data and published data, and its results are compared with those of a published algorithm.	en
dc.description.provenance	Made available in DSpace on 2021-06-14T17:02:17Z (GMT). No. of bitstreams: 1 ntu-97-R95921053-1.pdf: 3488655 bytes, checksum: 310b3d104eab6fddf46d0733e8464030 (MD5) Previous issue date: 2008	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Chinese Abstract; Abstract; Contents; List of Figures; List of Tables; CHAPTER 1. INTRODUCTION; 1.1 MICROARRAY USING IN GENE EXPRESSION; 1.2 CLUSTERING METHODS FOR CLASS DISCOVERY; 1.3 MOTIVATION OF DEVELOPING A CLUSTERING ALGORITHM; 1.3.1 Related Work; 1.4 OVERVIEW OF THE RESEARCH; CHAPTER 2. METHODS AND MATERIALS; 2.1 PRODUCING TEMPLATES; 2.1.1 Selection of significant change genes for making templates; 2.1.2 Using gap statistic to estimate number of clusters; 2.1.3 Generate templates for each cluster; 2.2 MAPPING GENES TO SIMILAR TEMPLATES; 2.2.1 Estimation of probability of random noise and FDR; 2.3 IDENTIFICATION OF SIGNIFICANT CLUSTERS; 2.4 VALIDATE THE ALGORITHM WITH SIMULATED AND REAL DATA; 2.4.1 Simulated data; 2.4.2 Real data; CHAPTER 3. RESULTS; 3.1 SIMULATED DATASETS; 3.1.1 Clustering results of simulated dataset 1; 3.1.2 Clustering results of simulated dataset 2; 3.1.3 Clustering results of simulated dataset 3; 3.1.4 Clustering results of simulated dataset 4; 3.1.5 Clustering results of simulated dataset 5; 3.1.6 Clustering results of simulated dataset 6; 3.1.7 Clustering results of simulated dataset 7; 3.1.8 Clustering results of simulated dataset 8; 3.1.9 Clustering results of simulated dataset 9; 3.1.10 Clustering results of simulated dataset 10; 3.1.11 Clustering results of simulated dataset 11; 3.1.12 Clustering results of simulated dataset 12; 3.1.13 Clustering results of simulated dataset 13; 3.1.14 Clustering results of simulated dataset 14; 3.1.15 Clustering results of simulated dataset 15; 3.1.16 Clustering results of simulated dataset 16; 3.1.17 Clustering results of simulated dataset 17; 3.1.18 Clustering results of simulated dataset 18; 3.1.19 Comparison with STEM; 3.2 REAL DATASETS; 3.2.1 TBH dataset; 3.2.2 HP dataset; 3.2.3 MEN dataset; CHAPTER 4. DISCUSSION; CHAPTER 5. CONCLUSION; REFERENCE
dc.language.iso	en
dc.title	Developing a Clustering Algorithm for Time Series Using a Modeling Approach	zh_TW
dc.title	Using Dynamic Template Based Clustering to Analyze Time Series Microarray Data	en
dc.type	Thesis
dc.date.schoolyear	96-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	歐陽彥正,賴亮全,蔡孟勳
dc.subject.keyword	Microarray, gene expression time series clustering, gap statistic, binomial test	zh_TW
dc.subject.keyword	Microarray, gene expression, time series, clustering, gap statistic, binomial test	en
dc.relation.page	80
dc.rights.note	Paid authorization
dc.date.accepted	2008-07-30
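dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Electrical Engineering	zh_TW
Appears in Collections:	Department of Electrical Engineering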
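
The abstract states that a binomial test is applied to identify the significant clusters, but this record does not give the details. The following is only a minimal sketch of how such a test could look, not the thesis's implementation. The function names `binomial_tail` and `significant_clusters`, the null hypothesis that each gene is equally likely to land in any cluster, and the Bonferroni-corrected threshold are all assumptions introduced here for illustration.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def significant_clusters(cluster_sizes, alpha=0.05):
    """Flag clusters that hold more genes than chance assignment would explain.

    Assumptions (not taken from the thesis):
    - under the null, a gene falls into each cluster with probability 1/m,
      where m is the number of clusters;
    - a Bonferroni correction (alpha / m) accounts for testing m clusters.
    """
    n = sum(cluster_sizes)   # total number of genes assigned
    m = len(cluster_sizes)   # number of clusters (templates)
    p = 1.0 / m              # assumed chance of landing in any one cluster
    threshold = alpha / m
    return [binomial_tail(k, n, p) < threshold for k in cluster_sizes]


# Hypothetical cluster sizes: 100 genes spread over 4 clusters.
flags = significant_clusters([60, 20, 10, 10])
```

With these hypothetical sizes, only the first cluster is flagged, because 60 of 100 genes is far above the 25 expected under the assumed uniform null.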

Files in This Item:

File	Size	Format
ntu-97-1.pdf (not currently authorized for public access)	3.41 MB	Adobe PDF

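Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.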