Screening Significant Genes in Repeated Measurement Microarray Data by Permutation Methods

Tzu-Chi Lee; 李子奇

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34398

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	彭雲明(Yun-ming Pong)
dc.contributor.author	Tzu-Chi Lee	en
dc.contributor.author	李子奇	zh_TW
dc.date.accessioned	2021-06-13T06:06:36Z	-
dc.date.available	2008-06-20
dc.date.copyright	2006-06-20
dc.date.issued	2006
dc.date.submitted	2006-06-12
dc.identifier.citation	Book: 1 Peter J. Diggle, Patrick J. Heagerty, Kung-yee Liang and Scott L. Zeger. (1994) Analysis of longitudinal data. Oxford. 2 Richard M. Simon, Edward L. Korn, Lisa M. McShane, et al. (2003) Design and analysis of DNA microarray investigations. Springer. 3 Annette J. Dobson. (2002) An introduction to generalized linear models. Chapman & Hall/CRC, 2nd edition. 4 Phillip Good. (2005) Permutation, parametric, and bootstrap tests of hypotheses. Springer, 3rd edition. Journal Article: 1 Redman J. C., Haas B. J., Tanimoto G. and Town C. D. (2004) Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. The Plant Journal, Vol.38, Issue 3, p.545. 2 Vahey M. T., Nau M. E., Jagodzinski L. L. et al. (2002) Impact of viral infection on the gene expression profiles of proliferating normal human peripheral blood mononuclear cells infected with HIV type 1 RF. AIDS Research and Human Retroviruses, Vol.18, No.3, p.179-192. 3 Thomas J. G., Olson J. M., Tapscott S. J. and Zhao L. P. (2001) An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res., Vol.11, p.1227-1236. 4 Zhao L. P., Prentice R. and Breeden L. (2001) Statistical modeling of large microarray data sets to identify stimulus-response profiles. Proc. Natl Acad. Sci. USA, Vol.98, p.5631-5636. 5 Tusher V. G., Tibshirani R. and Chu G. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA, Vol.98, p.5116-5121. 6 Efron B., Tibshirani R., Storey J. D. and Tusher V. (2001) Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc., Vol.96, p.1151-1160. 7 Pan W. and Wall M. M. (2002) Small-sample adjustments in using the sandwich variance estimator in generalized estimating equations. Statistics in Medicine, Vol.21, No.10, p.1429-1441. 8 Xu R. and Li X. (2003) A comparison of parametric versus permutation methods with applications to general and temporal microarray gene expression data. Bioinformatics, Vol.19, No.10, p.1284-1289. 9 Golub T. R., Slonim D. K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J. P., Coller H., Loh M. L., Downing J. R., Caligiuri M. A., Bloomfield C. D. and Lander E. S. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, Vol.286, p.531-537. 10 Hanley J. A., Negassa A., Edwardes M. and Forrester J. E. (2003) Statistical analysis of correlated data using generalized estimating equations (GEE): an orientation. American Journal of Epidemiology, Vol.157, No.4, p.364-375. 11 Nelder J. A. and Wedderburn R. W. M. (1972) Generalized linear models. Journal of the Royal Statistical Society, Series A, Vol.135, p.370-384. 12 Liang K. Y. and Zeger S. L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, Vol.73, p.13-22. 13 Mancl L. A. and DeRouen T. A. (2001) A covariance estimator for GEE with improved small-sample properties. Biometrics, Vol.57, p.126-134. 14 Benjamini Y. and Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, Vol.57, p.289-300. 15 Benjamini Y. and Yekutieli D. (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, Vol.29, No.4, p.1165-1188. 16 Xu X. L., Olson J. M. and Zhao L. P. (2002) A regression-based method to identify differentially expressed genes in microarray time course studies and its application in an inducible Huntington’s disease transgenic model. Human Molecular Genetics, Vol.11, No.17, p.1977-1985. 17 Dobbin K. K., Beer D. G., Meyerson M., Yeatman T. J. et al. (2005) Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clinical Cancer Research, Vol.11, p.565-572.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/34398	-
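dc.description.abstract	Repeated measurement designs offer many advantages for studying gene regulatory pathways. Repeatedly observing the expression of many genes at different time points reveals the temporal order in which the genes are expressed, and temporal order is a basic requirement for establishing causal relationships among genes. Over the past decade, microarray technology has also greatly helped the development of biology-related fields. However, microarray experiments remain expensive, so most repeated measurement microarray experiments have only a few biological replicates. Because many tools for analyzing repeated measurements were developed under large-sample theory, they usually perform poorly on small-sample data and are therefore unsuitable for repeated measurement microarray data; this includes the generalized estimating equations (GEE) method, which has been widely used in recent years to analyze dependent data. We propose combining GEE with permutation methods to address the poor small-sample performance of GEE. Computer simulations show that GEE combined with univariate permutation, using the model-based variance estimator, performs well in controlling the nominal type I error while maintaining relatively high statistical power. When the sample size is very small, for example fewer than five, we recommend GEE combined with multivariate permutation, using the model-based variance estimator, to screen for significant genes in repeated measurement microarray data; this framework maintains a relatively high ability to detect significant genes while controlling the number of false positives.	zh_TW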
dc.description.abstract	Repeated measurement designs have many advantages for investigating underlying genetic pathways. Over the past decade, microarray technology has also greatly aided progress in biology-related fields. Because the cost of microarrays is still high, most microarray experiments with a repeated measurement design have only a few biological replicates. Many tools for repeated measurement analysis are based on asymptotic theory, and their small-sample performance is often poor, which makes them unsuitable for repeated measurement microarray data; this includes the popular generalized estimating equations (GEE) method for analyzing correlated data. We propose combining GEE with permutation methods to address this problem. Simulation results show that univariate permutation GEE with the model-based variance estimator controls the nominal type I error well while maintaining relatively high power. If the sample size is extremely small, e.g., fewer than 5, we propose using multivariate permutation methods with the model-based variance estimator, which control the number of false positives while maintaining a relatively high ability to detect significant genes.	en
dc.description.provenance	Made available in DSpace on 2021-06-13T06:06:36Z (GMT). No. of bitstreams: 1 ntu-95-D90621201-1.pdf: 983404 bytes, checksum: 6d1f803821198e06e7c61b05ed898615 (MD5) Previous issue date: 2006	en
dc.description.tableofcontents	Chinese Abstract; Abstract; 1 Introduction; 2 Biology Background and DNA Microarray Technology; 2.1 Basic Biology of Gene Expression; 2.2 DNA Microarray Technology; 2.2.1 cDNA Microarrays; 2.2.2 Oligonucleotide Microarray; 2.3 Gene Expression Data; 3 Generalized Estimating Equations (GEE); 3.1 Why use GEE for filtering genes?; 3.2 Generalized Linear Model; 3.3 Quasi-likelihood; 3.4 Generalized Estimating Equations; 4 Small-sample Adjustments in the Sandwich Variance Estimator of GEE; 4.1 Wald Chi-square Test; 4.2 Small Sample Performance of the Robust Wald Chi-square Test; 4.2.1 Generating Simulated Data With a Known Covariance Matrix; 4.2.2 The Procedure for Assessment Criteria Estimation: Type I Error and Power; 4.2.3 Wald Chi-square Test with Small Sample Size; 4.3 Bias Correction for the Sandwich Variance Estimator; 4.4 Variation Correction for the Sandwich Variance Estimator; 5 Significance p-value and Multiple Comparisons Problem; 5.1 Multiple Hypothesis Testing; 5.2 Gene-specific p-value Adjustments in the Repeated Measurement Microarray Significance Analysis; 5.3 Permutation Methods; 5.4 Adjustments of P value; 6 Significance Analysis of Repeated Measurement Gene Expression Data; 6.1 Modeling; 6.1.1 One-group Problem; 6.1.2 Multiple-groups Problem; 6.2 Simulation Study: GEE and Permutation GEE; 6.2.1 Univariate Permutation GEE and GEE; 6.2.2 Multivariate Permutation GEE; 6.2.3 Adjustment of P value; 7 Real Data Analysis; 7.1 Dobbin’s Data; 7.1.1 Data Description; 7.1.2 Filtering Genes in Dobbin’s Dataset within 100 Housekeeping Genes; 7.1.3 Filtering Genes in Dobbin’s Dataset within All 22,283 Genes; 8 Conclusion; Figures; Tables; Reference; A Checking for Simulated Data Sets; B R Package for GEE Estimation; C Notations and Abbreviations; D PC Cluster System
dc.language.iso	en
dc.subject	Microarray	zh_TW
dc.subject	Repeated measurement	zh_TW
dc.subject	Generalized estimating equations	zh_TW
dc.subject	Permutation method	zh_TW
dc.subject	Multivariate permutation method	zh_TW
dc.subject	Permutation	en
dc.subject	Repeated measurement	en
dc.subject	Microarray	en
dc.subject	Multivariate permutation	en
dc.subject	Generalized estimating equations	en
dc.title	Screening Significant Genes in Repeated Measurement Microarray Data by Permutation Methods	zh_TW
dc.title	Permutation Methods for Filtering Genes on Microarray Repeated Measurement Data	en
dc.type	Thesis
dc.date.schoolyear	94-2
dc.description.degree	Doctoral
dc.contributor.oralexamcommittee	林俊隆,蘇秀媛,歐益昌,歐尚靈,劉力瑜
dc.subject.keyword	Microarray,Repeated measurement,Generalized estimating equations,Permutation method,Multivariate permutation method	zh_TW
dc.subject.keyword	Microarray,Repeated measurement,Generalized estimating equations,Permutation,Multivariate permutation	en
dc.relation.page	105
dc.rights.note	Paid authorization
dc.date.accepted	2006-06-12
dc.contributor.author-college	College of Bioresources and Agriculture	zh_TW
dc.contributor.author-dept	Graduate Institute of Agronomy	zh_TW
Appears in departments:	Department of Agronomy

Files in this document:

File	Size	Format
ntu-95-1.pdf (not authorized for public access)	960.36 kB	Adobe PDF

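Documents in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.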