Evaluating the Performances of Gene Signature-based Predictions

Ling-Yi Wang; 王齡誼

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18747

Full metadata record

DC field	Value	Language
dc.contributor.advisor	李文宗
dc.contributor.author	Ling-Yi Wang	en
dc.contributor.author	王齡誼	zh_TW
dc.date.accessioned	2021-06-08T01:23:23Z	-
dc.date.copyright	2014-10-20
dc.date.issued	2014
dc.date.submitted	2014-08-04
dc.identifier.citation	1. Cai YD, Huang T, Feng KY, Hu L, Xie L. A unified 35-gene signature for both subtype classification and survival prediction in diffuse large B-cell lymphomas. PLoS ONE 2010; 5: e12726.
2. Chibon F, Lagarde P, Salas S, Perot G, Brouste V, Tirode F, et al. Validated prediction of clinical outcome in sarcomas and multiple types of cancer on the basis of a gene expression signature related to genome complexity. Nat Med 2010; 16: 781-787.
3. Levan K, Partheen K, Osterberg L, Olsson B, Delle U, Eklind S, et al. Identification of a gene expression signature for survival prediction in type I endometrial carcinoma. Gene Expression 2010; 14: 361-370.
4. Roessler S, Jia HL, Budhu A, Forgues M, Ye QH, Lee JS, et al. A unique metastasis gene signature enables prediction of tumor relapse in early-stage hepatocellular carcinoma patients. Cancer Res 2010; 70: 10202-10212.
5. Wan YW, Sabbagh E, Raese R, Qian Y, Luo D, Denvir J, et al. Hybrid models identified a 12-gene signature for lung cancer prognosis and chemoresponse prediction. PLoS ONE 2010; 5: e12222.
6. Zhu CQ, Ding K, Strumpf D, Weir BA, Meyerson M, Pennell N, et al. Prognostic and predictive gene signature for adjuvant chemotherapy in resected non-small-cell lung cancer. J Clin Oncol 2010; 28: 4417-4424.
7. Chen DT, Hsu YL, Fulp WJ, Coppola D, Haura EB, Yeatman TJ, et al. Prognostic and predictive value of a malignancy-risk gene signature in early-stage non-small cell lung cancer. J Natl Cancer Inst 2011; 103: 1859-1870.
8. Herold T, Jurinovic V, Metzeler KH, Boulesteix AL, Bergmann M, Seiler T, et al. An eight-gene expression signature for the prediction of survival and time to treatment in chronic lymphocytic leukemia. Leukemia 2011; 25: 1639-1645.
9. Minguez B, Hoshida Y, Villanueva A, Toffanin S, Cabellos L, Thung S, et al. Gene-expression signature of vascular invasion in hepatocellular carcinoma. J Hepatol 2011; 55: 1325-1331.
10. Salazar R, Roepman P, Capella G, Moreno V, Simon I, Dreezen C, et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J Clin Oncol 2011; 29: 17-24.
11. Wang DY, Done SJ, McCready DR, Boerner S, Kulkarni S, Leong WL. A new gene expression signature, the ClinicoMolecular Triad Classification, may improve prediction and prognostication of breast cancer at the time of diagnosis. Breast Cancer Res 2011; 13: R92.
12. Xie Y, Xiao G, Coombes KR, Behrens C, Solis LM, Raso G, et al. Robust gene expression signature from formalin-fixed paraffin-embedded samples predicts prognosis of non-small-cell lung cancer patients. Clin Cancer Res 2011; 17: 5705-5714.
13. Riester M, Taylor JM, Feifer A, Koppie T, Rosenberg JE, Downey RJ, et al. Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer. Clin Cancer Res 2012; 18: 1323-1333.
14. Schramm SJ, Campain AE, Scolyer RA, Yang YH, Mann GJ. Review and cross-validation of gene expression signatures and melanoma prognosis. J Invest Dermatol 2012; 132: 274-283.
15. Simon R. Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br J Cancer 2003; 89: 1599-1604.
16. Anonymous. Challenges for the 21st century. Nat Genet 2001; 29: 353-354.
17. Moons KG, Kengne AP, Woodward M, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: I. Development, internal validation, and assessing the incremental value of a new (bio)marker. Heart 2012; 98: 683-690.
18. Refaeilzadeh P, Tang L, Liu H. Cross-validation. In: Encyclopedia of Database Systems. Springer, 2009.
19. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics 2005; 21: 3301-3307.
20. Efron B. Estimating the error rate of a prediction rule: improvement on cross-validation. J Amer Statist Assoc 1983; 78: 316-331.
21. Efron B, Tibshirani RJ. An introduction to the bootstrap. CRC Press, 1994.
22. Dougherty ER. Small sample issues for microarray-based classification. Comp Funct Genomics 2001; 2: 28-34.
23. Braga-Neto UM, Dougherty ER. Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004; 20: 374-380.
24. Moons KG, Kengne AP, Grobbee DE, Royston P, Vergouwe Y, Altman DG, et al. Risk prediction models: II. External validation, model updating, and impact assessment. Heart 2012; 98: 691-698.
25. Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol 2008; 61: 1085-1094.
26. Vergouwe Y, Moons KG, Steyerberg EW. External validity of risk models: use of benchmark values to disentangle a case-mix effect from incorrect coefficients. Am J Epidemiol 2010; 172: 971-980.
27. Wolfe D, Hogg R. On constructing statistics and reporting data. Amer Statist 1971; 25: 27-30.
28. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: 29-36.
29. Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw 1999; 10: 988-999.
30. Karatzoglou A, Meyer D, Hornik K. Support vector machines in R. 2005.
31. Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 1988.
32. Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, et al. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 1999; 96: 6745-6750.
33. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 2005; 365: 671-679.
34. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006; 98: 262-272.
35. Bolstad BM, Irizarry RA, Åstrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19: 185-193.
36. Tsai CA, Chen CH, Lee TC, Ho IC, Yang UC, Chen JJ. Gene selection for sample classifications in microarray experiments. DNA Cell Biol 2004; 23: 607-614.
37. Chen D, Liu Z, Ma X, Hua D. Selecting genes by test statistics. J Biomed Biotechnol 2005; 2005: 132-138.
38. Liu X, Krishnan A, Mondry A. An entropy-based gene selection method for cancer classification using microarray data. BMC Bioinformatics 2005; 6: 76-89.
39. Yang K, Cai Z, Li J, Lin G. A stable gene selection in microarray data analysis. BMC Bioinformatics 2006; 7: 228-243.
40. Chen JJ, Wang SJ, Tsai CA, Lin CJ. Selection of differentially expressed genes in microarray data analysis. Pharmacogenomics J 2007; 7: 212-220.
41. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447: 661-678.
42. Samani NJ, Erdmann J, Hall AS, Hengstenberg C, Mangino M, Mayer B, et al. Genomewide association analysis of coronary artery disease. N Engl J Med 2007; 357: 443-453.
43. Garcia-Closas M, Hall P, Nevanlinna H, Pooley K, Morrison J, Richesson DA, et al. Heterogeneity of breast cancer associations with five susceptibility loci by clinical and pathological characteristics. PLoS Genet 2008; 4: e1000054.
44. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 1995; 57: 289-300.
45. Liu R, Wang X, Chen GY, Dalerba P, Gurney A, Hoey T, et al. The prognostic role of a gene signature from tumorigenic breast-cancer cells. N Engl J Med 2007; 356: 217-226.
46. Chang HY, Sneddon JB, Alizadeh AA, Sood R, West RB, Montgomery K, et al. Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol 2004; 2: 206-214.
47. Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 2003; 43: 1947-1958.
48. Mukherjee S, Tamayo P, Rogers S, Rifkin R, Engle A, Campbell C, et al. Estimating dataset size requirements for classifying DNA microarray data. J Comput Biol 2003; 10: 119-142.
49. Jiang W, Simon R. A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Stat Med 2007; 26: 5320-5334.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/18747	-
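dc.description.abstract	Background: The rapid development of DNA microarray technology allows us to use a person's gene signature to diagnose disease type and to predict disease or treatment prognosis. When a gene signature is built for prediction, gene selection is routinely performed. However, gene selection is not necessarily beneficial for a complex polygenic disease. Second, although internal validation by cross-validation or bootstrapping can give a reasonable estimate of a gene signature's prediction performance, the reduced training sample size these methods entail is a problem. Third, external validation is the current paradigm for evaluating a gene signature's prediction performance; however, external validation may also suffer from heterogeneity. Methods: We derived a formula for the prediction performance of a gene signature. Based on this formula, we examined the effect of gene selection and proposed a one-step extrapolation from the fitted curve to estimate the prediction performance of a gene signature built with all samples. We also proposed a permutation test to detect heterogeneity. Simulation studies were used to evaluate the proposed methods, and three DNA microarray datasets were used for demonstration. Results: First, we found that the optimal gene selection strategy depends on the genes' signal-to-noise ratio and signal strength for a given sample size, and that there exists a no-selecting zone in which any gene selection procedure only lowers prediction performance. Second, our one-step extrapolation method strikes a good balance between bias and variance and has a small root mean squared error. Finally, the type I error of the permutation test is very close to the nominal significance level, and its power increases with sample size. Conclusion: The methods proposed in this study should prove useful for developing and evaluating gene signature-based diagnostic/prognostic prediction models.	zh_TW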
dc.description.abstract	BACKGROUND Microarray technology enables us to make diagnostic and prognostic predictions based on a subject's gene signature. When a gene signature is built for prediction, gene selection is routinely performed. First, however, gene selection may not be beneficial in complex polygenic diseases. Second, while internal validation by cross-validation or bootstrapping can give a fair assessment of the prediction performance of a gene signature, the reduced training sample size it entails is a problem. Third, external validation is the current paradigm for evaluating the performance of a gene signature; however, external validation may suffer from heterogeneity. METHODS We derive a learning curve for the prediction performance of a gene signature. Based on it, we study the effect of gene selection and propose a one-step extrapolation method to estimate the prediction performance of the gene signature trained on all the samples. We also propose a permutation test to detect heterogeneity. Simulation studies are conducted to evaluate the proposed methods, and three microarray datasets are used for demonstration. RESULTS First, we found that the optimal gene selection strategy depends on the signal-to-noise ratio and the signal strength for a fixed sample size, and that there exists a no-selecting zone within which any gene selection procedure will degrade prediction performance. Second, we found that the proposed one-step extrapolation method for internal validation strikes a good balance between bias and variance and has a small root mean squared error. Finally, we found that the type I error of the proposed permutation test for detecting heterogeneity in external validation is close to the nominal significance level, and that its power increases with sample size.
CONCLUSION The three methods proposed in this study should prove useful in developing and evaluating diagnostic/prognostic prediction models based on individuals' gene signatures.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T01:23:23Z (GMT). No. of bitstreams: 1 ntu-103-D93842001-1.pdf: 3820114 bytes, checksum: ba3ca85a549c6d3b10c9ac0fcb690bc7 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Contents
Oral Examination Committee Approval
Chinese Abstract
English Abstract
Chapter 1. INTRODUCTION
Chapter 2. METHODS
  2.1 Effect of Gene Selection on Prediction Performance
  2.2 One-Step Extrapolation for Internal Validation
  2.3 Permutation Test to Detect Heterogeneity in External Validation
Chapter 3. SIMULATION STUDIES
  3.1 One-Step Extrapolation
  3.2 Permutation Test Method
Chapter 4. RESULTS
  4.1 Effect of Gene Selection on Prediction Performance
  4.2 One-Step Extrapolation for Internal Validation
  4.3 Permutation Test to Detect Heterogeneity in External Validation
Chapter 5. MICROARRAY DATA APPLICATION
  5.1 Effect of Gene Selection on Prediction Performance
  5.2 One-Step Extrapolation for Internal Validation
  5.3 Permutation Test to Detect Heterogeneity in External Validation
Chapter 6. DISCUSSION
Chapter 7. CONCLUSION
REFERENCES
APPENDIX
List of Figures
Figure 1. Contour lines of the optimal levels with respect to the signal-to-noise ratio and the signal strength
Figure 2. Contour lines of gene signature sizes with respect to the signal-to-noise ratio and the signal strength to achieve an of 0.95
Figure 3. Contour lines of gene signature sizes with respect to the signal-to-noise ratio and the signal strength to achieve an of 0.90
Figure 4. Comparison of the various methods under different sample sizes when naive multiple regression is used to build the gene signature
Figure 5. Comparison of the various methods under different sample sizes when a support vector machine is used to build the gene signature
Figure 6. The roles of training sample size, signal strength, and signal prevalence in prediction performance and gene selection, using the microarray data
Figure 7. Demonstration of the proposed extrapolation method on the two microarray datasets
List of Tables
Table 1. Type I error of homogeneity testing for the model development dataset and external validation dataset by the permutation test
Table 2. Power of heterogeneity detection for the model development dataset and external validation dataset by the permutation test
dc.language.iso	en
dc.title	Performance Evaluation of Gene Signature-based Prediction Models	zh_TW
dc.title	Evaluating the Performances of Gene Signature-based Predictions	en
dc.type	Thesis
dc.date.schoolyear	102-2
dc.description.degree	Doctoral (Ph.D.)
dc.contributor.oralexamcommittee	鄭光甫,程毅豪,洪弘,林菀俞
dc.subject.keyword	receiver operating characteristic curve, gene signature, prediction, learning curve, permutation test, validation	zh_TW
dc.subject.keyword	receiver operating characteristic curve, gene signature, prediction, learning curve, permutation test, validation	en
dc.relation.page	51
dc.rights.note	Not authorized for public access
dc.date.accepted	2014-08-05
dc.contributor.author-college	College of Public Health	zh_TW
dc.contributor.author-dept	Institute of Epidemiology and Preventive Medicine	zh_TW
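The abstract names two procedures, a one-step extrapolation of a learning curve for internal validation and a permutation test for heterogeneity in external validation, but does not give their formulas. The Python sketch below only illustrates the general ideas and is not the thesis's method. Its assumptions, none of which come from this record: each subject's gene signature has already been reduced to one risk score; performance is measured by the empirical AUC in its Mann-Whitney form (the interpretation discussed by Hanley and McNeil); heterogeneity is tested with the absolute AUC difference between the development and validation datasets as a hypothetical statistic, shuffling dataset membership separately among cases and among controls; and the learning curve is given the hypothetical form AUC(m) = a - b/m, fitted through AUC estimates at two training sizes (for example, from repeated cross-validation) and evaluated at the full sample size n. All function names are hypothetical.

```python
import random
from typing import Sequence, Tuple

Sample = Tuple[float, int]  # (signature risk score, label: 1 = case, 0 = control)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC: probability that a random case scores higher than a
    random control (Mann-Whitney form), counting ties as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def auc_gap(dev: Sequence[Sample], val: Sequence[Sample]) -> float:
    """Hypothetical heterogeneity statistic: |AUC(dev) - AUC(val)| for a fixed signature."""
    return abs(auc([s for s, _ in dev], [y for _, y in dev])
               - auc([s for s, _ in val], [y for _, y in val]))


def heterogeneity_permutation_test(dev: Sequence[Sample], val: Sequence[Sample],
                                   n_perm: int = 999, seed: int = 0) -> float:
    """Monte Carlo permutation p-value for H0: the development and validation
    samples are exchangeable (no heterogeneity). Dataset membership is shuffled
    separately among cases and among controls, so each permuted split keeps the
    observed class counts and both AUCs stay defined. Labels must be 0/1."""
    rng = random.Random(seed)
    observed = auc_gap(dev, val)
    pooled = list(dev) + list(val)
    cases = [p for p in pooled if p[1] == 1]
    controls = [p for p in pooled if p[1] == 0]
    n_dev_cases = sum(1 for _, y in dev if y == 1)
    n_dev_controls = len(dev) - n_dev_cases
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(cases)
        rng.shuffle(controls)
        perm_dev = cases[:n_dev_cases] + controls[:n_dev_controls]
        perm_val = cases[n_dev_cases:] + controls[n_dev_controls:]
        # Small tolerance so permutations that tie the observed value are counted.
        if auc_gap(perm_dev, perm_val) >= observed - 1e-12:
            extreme += 1
    # Add-one correction: the observed split counts as one of the permutations.
    return (extreme + 1) / (n_perm + 1)


def one_step_extrapolate(m1: int, auc1: float, m2: int, auc2: float, n: int) -> float:
    """Fit the hypothetical learning curve AUC(m) = a - b / m exactly through
    (m1, auc1) and (m2, auc2), then evaluate it at the full sample size n."""
    if not 0 < m1 < m2 <= n:
        raise ValueError("need 0 < m1 < m2 <= n")
    b = (auc2 - auc1) / (1.0 / m1 - 1.0 / m2)
    a = auc2 + b / m2
    return a - b / n
```

For instance, estimated AUCs of 0.80 with 50 training samples and 0.85 with 100 give b = 5 and a = 0.90 under the assumed curve, so one_step_extrapolate(50, 0.80, 100, 0.85, 125) returns 0.90 - 5/125 = 0.86 (up to floating-point rounding). Stratifying the shuffle by class is a design choice of this sketch: it keeps the case and control counts of every permuted split equal to the observed ones, so both AUCs are always defined.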
Appears in collections:	Institute of Epidemiology and Preventive Medicine

Files in this item:

File	Size	Format
ntu-103-1.pdf (not authorized for public access)	3.73 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.