Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49336
Title: | Evaluations of gene expression normalization methods: a focus on Remove Unwanted Variation (RUV) series methods (評估基因表現資料的正規化方法:聚焦於RUV系列方法) |
Author: | Yu-Ting Chen (陳鈺婷) |
Advisors: | Hung Hung (洪弘), Po-Hsiu Kuo (郭柏秀) |
Keywords: | gene expression, normalization, RNA-Seq, differential expression test |
Year of publication: | 2016 |
Degree: | Master's |
Abstract: | Gene expression data often contain unwanted variation, such as batch effects in experiments or varying sequencing depth among subjects, which hinders the identification of differentially expressed (DE) genes for the trait of interest. Normalization before association analysis is therefore an essential preprocessing step for correcting technical variation and improving the power to identify DE genes. In recent years, several normalization methods have been applied to RNA sequencing (RNA-Seq) data. However, the utility of existing methods in correcting bias from unknown, latent unwanted variation has not been systematically evaluated. In this work, we conduct a comparison study to evaluate two types of normalization methods: global-scaling and remove unwanted variation (RUV). Global-scaling methods, which include Median, Upper Quartile (UQ) and Trimmed Mean of M-values (TMM), are reported in the literature to correct for differences in sequencing depth and are widely used when testing for DE genes in RNA-Seq studies. RUV methods, in contrast, are newly developed methods that use control genes or samples to adjust for nuisance technical effects.
We compare the global-scaling methods (Median, UQ, TMM) with a series of RUV methods, namely RUV2, RUV4 and RUVr, as well as the combinations RUV2+UQ, RUV4+UQ and RUVr+UQ. We consider seven simulation settings of unwanted variation, covering various combinations of batch effects, latent variation and varying sequencing depth. Within each setting, we also vary the sample size, the number of factors of unwanted variation and the number of control genes to assess how these parameters affect the correct identification of DE genes. Our simulations indicate that the RUV series methods (except RUVr) combined with UQ are the most effective in correcting unwanted variation, regardless of whether sequencing depth varies. We discuss the results under different data scenarios and, using publicly available gene expression data, recommend suitable normalization methods according to the characteristics of the RNA-Seq data. Our results can help researchers select an appropriate normalization method before association testing. |
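To illustrate the idea behind the global-scaling methods named in the abstract, the following is a minimal sketch of upper-quartile (UQ) scaling on a toy count matrix. It is an assumption-laden illustration, not the thesis's implementation: the function names `upper_quartile` and `uq_normalize`, the choice to drop genes with zero counts in every sample, and the rescaling of factors to a geometric mean of 1 are all choices made here for clarity, and established packages may differ in detail.

```python
import math

def upper_quartile(values):
    """75th percentile, interpolating linearly between order statistics."""
    ordered = sorted(values)
    pos = 0.75 * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def uq_normalize(counts):
    """Scale each sample (column) by its upper-quartile count.

    counts: list of genes (rows), each a list of per-sample counts.
    Assumes every sample has a positive upper quartile after filtering.
    """
    n_samples = len(counts[0])
    # Assumption: genes with zero counts in all samples are ignored
    # when computing the quartiles.
    expressed = [row for row in counts if sum(row) > 0]
    uq = [upper_quartile([row[j] for row in expressed])
          for j in range(n_samples)]
    # Rescale so the factors have geometric mean 1 (an illustrative choice).
    geo_mean = math.exp(sum(math.log(u) for u in uq) / n_samples)
    factors = [u / geo_mean for u in uq]
    normalized = [[row[j] / factors[j] for j in range(n_samples)]
                  for row in counts]
    return normalized, factors

# Toy data: sample 2 is sequenced at twice the depth of sample 1,
# with no true expression differences between the samples.
toy = [[10, 20], [20, 40], [30, 60], [40, 80], [0, 0]]
normalized, factors = uq_normalize(toy)
```

In this toy case the two columns of `normalized` coincide, because the only difference between the samples is depth. A single scaling factor per sample cannot remove gene-specific effects such as batch effects or latent factors, which is the gap the RUV methods are meant to address.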
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/49336 |
DOI: | 10.6342/NTU201602653 |
Full-text license: | Paid license |
Appears in collections: | Master's Program in Statistics (統計碩士學位學程) |
Files in this item:
File | Size | Format | |
---|---|---|---|
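ntu-105-1.pdf (not currently authorized for public access) | 2.8 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.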