  1. NTU Theses and Dissertations Repository
  2. College of Public Health
  3. Institute of Epidemiology and Preventive Medicine
Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69578
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 蕭朱杏 (Chuhsing Kate Hsiao)
dc.contributor.author | Chi-Hsuan Ho | en
dc.contributor.author | 何奇軒 | zh_TW
dc.date.accessioned | 2021-06-17T03:19:55Z
dc.date.available | 2020-10-03
dc.date.copyright | 2018-10-03
dc.date.issued | 2018
dc.date.submitted | 2018-06-26
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/69578
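dc.description.abstract | With advances in technology, a growing number of statistical methods can help researchers identify disease-related biological pathways or genes, so evaluating these methods and choosing efficiently among them for follow-up research has become a key step. Previous studies have mostly used the multivariate (log-)normal distribution to evaluate these methods, but whether this approach is appropriate remains highly contested. The first part of this thesis therefore focuses on mRNA gene expression data: we collect datasets for various diseases and the related pathway information from public websites, and select four normality tests (Mardia’s test, Henze-Zirkler’s test, Royston’s test, one-sample energy test) to examine whether these normalized datasets satisfy the normality assumption. The results of this part show that normalized real gene data are very likely not multivariate normally distributed. Accordingly, the second part of the thesis selects five commonly used gene-set analysis methods (Hotelling’s T2, Global test, GlobalANCOVA, Energy test (N-statistic), GSEA (Category)) and examines their performance under non-normal scenarios. In doing so, we use the multivariate t distribution and mixtures of multivariate normal distributions to design a series of multivariate non-normal scenarios, and judge the methods by their robustness across these scenarios. The experimental results show that although most of the methods perform poorly under non-normal scenarios, Hotelling’s T2 still performs well and stably in certain specific settings of the different scenarios; these findings cannot be obtained from the traditional multivariate normal simulation approach. In summary, to obtain more reliable and accurate evaluation information, we suggest that future researchers include non-normal scenarios and other fine-grained settings at the simulation stage and evaluate the methods by their overall robustness across scenarios. Finally, this thesis uses radar plots to consolidate the above information, giving researchers a clearer visual way to understand how these methods perform. | zh_TW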
dc.description.abstract | As technology improves, more and more statistical methods for gene-set analysis (GSA) are being developed to find pathogenic pathways and genes, so choosing a suitable method for further analysis becomes a critical step. In recent years, many simulation studies have used the multivariate normal or multivariate lognormal distribution to evaluate the performance of these GSA methods. However, the normality assumption for gene expression data is questionable. Therefore, the first part of our study focuses on the normality of mRNA gene expression data. We first collect the corresponding pathway information and the gene expression data for each cancer subtype from public websites. Then we apply four normality tests (Mardia’s test, Henze-Zirkler’s test, Royston’s test, one-sample energy test) to these real data, and the results show that the normalized gene expression data are very likely not multivariate normally distributed. Thus, in the second part of our study, we consider five GSA methods (Hotelling’s T2, Global test, GlobalANCOVA, Energy test, GSEA (Category)) under several multivariate non-normal scenarios (including the multivariate t distribution and mixtures of multivariate normal distributions) to compare the performance and robustness of these methods. The results indicate that although the majority of these GSA methods perform very poorly under the non-normal scenarios, Hotelling’s T2 surprisingly still performs consistently and very well under different scenarios with some special settings. These results cannot be learned from traditional multivariate normal simulations. To sum up, to obtain more reliable and accurate information, we suggest that researchers add non-normal scenarios and other settings to the simulation study and then use robustness to evaluate these methods.
Finally, these results are presented as radar plots to visualize all the experimental outcomes more clearly. | en
dc.description.provenance | Made available in DSpace on 2021-06-17T03:19:55Z (GMT). No. of bitstreams: 1
ntu-107-R05849032-1.pdf: 5672892 bytes, checksum: a3a8e3accfb7787924093b32be867a65 (MD5)
Previous issue date: 2018 | en
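dc.description.tableofcontents | Chapter 1. Research Background
Chapter 2. Analysis of the Normality Assumption for Real Gene Data
  Section 1. Data Background
  Section 2. Data Preprocessing and Data Selection
    Normalization
    Selection of Biological Pathways
    Organization of Sample Clinical Data
  Section 3. Normality Tests
  Section 4. Normality Test Simulation and Results
    Simulation Settings
    Simulation Results
  Section 5. Results of the Real Gene Data Analysis
Chapter 3. Simulation Comparison of Gene-Set Analysis Methods
  Section 1. Simulation Settings
    Difference Pattern A: the marginal distribution of every gene has the same location shift between the two phenotypes
    Difference Pattern B: not every gene has the same degree of location shift between the two phenotypes
  Section 2. Gene-Set Analysis Methods
    Method 1: Hotelling’s T2
    Method 2: The Global Test
    Method 3: The GlobalANCOVA
    Method 4: Energy Test (N-statistic)
    Method 5: GSEA (Category)
  Section 3. Analysis Results
Chapter 4. Discussion
Chapter 5. References
Appendix 1. Normality Tests
Appendix 2. Other Supplementary Material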
dc.language.iso | zh-TW
dc.subject | normality of real gene data | zh_TW
dc.subject | gene-set analysis | zh_TW
dc.subject | multivariate non-normal distribution | zh_TW
dc.subject | mRNA gene expression | zh_TW
dc.subject | robustness | zh_TW
dc.subject | radar plot | zh_TW
dc.subject | simulation methods | zh_TW
dc.subject | simulation methods | en
dc.subject | gene-set analysis | en
dc.subject | multivariate non-normal distribution | en
dc.subject | mRNA gene expression | en
dc.subject | the normality of real data | en
dc.subject | robustness | en
dc.subject | radar plot | en
dc.title | A Study Evaluating the Performance of Gene-Set Analysis Methods on Real Gene Data under Non-normal Scenarios | zh_TW
dc.title | Statistical Evaluation for Methods of Gene-set Analysis with Multivariate Non-normal Scenarios | en
dc.type | Thesis
dc.date.schoolyear | 106-2
dc.description.degree | Master's
dc.contributor.oralexamcommittee | 洪弘 (Hung Hung), 蔡政安 (Chen-An Tsai)
dc.subject.keyword | gene-set analysis, multivariate non-normal distribution, mRNA gene expression, normality of real gene data, robustness, radar plot, simulation methods | zh_TW
dc.subject.keyword | gene-set analysis, multivariate non-normal distribution, mRNA gene expression, the normality of real data, robustness, radar plot, simulation methods | en
dc.relation.page | 125
dc.identifier.doi | 10.6342/NTU201801106
dc.rights.note | Paid license
dc.date.accepted | 2018-06-26
dc.contributor.author-college | College of Public Health | zh_TW
dc.contributor.author-dept | Institute of Epidemiology and Preventive Medicine | zh_TW
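
The abstract in this record describes evaluating gene-set analysis methods, Hotelling’s T2 among them, on data simulated from a multivariate t distribution. The following is a minimal Python sketch of that idea, not the thesis's actual procedure. It assumes an identity scale matrix, a pooled-covariance two-sample T2 statistic, and a permutation p-value; the record does not state any of these settings. All function names (`rmvt`, `hotelling_t2`, `permutation_pvalue`) are hypothetical and introduced only for illustration.

```python
import math
import random


def rmvt(n, mean, df, rng):
    """Draw n multivariate t vectors (assumed identity scale matrix)."""
    rows = []
    for _ in range(n):
        # One shared scale per row: sqrt(df / chi2(df)), chi2(df) = Gamma(df/2, scale=2).
        w = math.sqrt(df / rng.gammavariate(df / 2.0, 2.0))
        rows.append([m + w * rng.gauss(0.0, 1.0) for m in mean])
    return rows


def mean_vec(rows):
    """Column means of a list of equal-length rows."""
    p = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(p)]


def pooled_cov(x, y):
    """Pooled sample covariance matrix of two groups."""
    p = len(x[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (x, y):
        m = mean_vec(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(x) + len(y) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        z[i] = (m[i][n] - sum(m[i][k] * z[k] for k in range(i + 1, n))) / m[i][i]
    return z


def hotelling_t2(x, y):
    """Two-sample Hotelling's T2 statistic with pooled covariance."""
    n1, n2 = len(x), len(y)
    d = [a - b for a, b in zip(mean_vec(x), mean_vec(y))]
    z = solve(pooled_cov(x, y), d)
    return (n1 * n2) / (n1 + n2) * sum(di * zi for di, zi in zip(d, z))


def permutation_pvalue(x, y, n_perm, rng):
    """Permutation p-value: share of label shuffles with T2 >= observed."""
    observed = hotelling_t2(x, y)
    pooled = x + y  # new list, so x and y are left untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hotelling_t2(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical setting: 3 genes, 20 samples per phenotype, df = 5, shift 1.
    group_a = rmvt(20, [0.0, 0.0, 0.0], 5, rng)
    group_b = rmvt(20, [1.0, 1.0, 1.0], 5, rng)
    print("p-value:", permutation_pvalue(group_a, group_b, 199, rng))
```

The permutation p-value is used here because it does not rely on the normal-theory F reference distribution, which is the assumption the abstract calls into question; whether the thesis computed p-values this way is not stated in this record.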
Appears in Collections: Institute of Epidemiology and Preventive Medicine

Files in This Item:
File | Size | Format
ntu-107-1.pdf (not authorized for public access) | 5.54 MB | Adobe PDF