A Robust Rank-Based Method for Detection of Differentially Co-Expressed Gene Sets

Yueh Wang; 王悅

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44931

Title:	A Robust Rank-Based Method for Detection of Differentially Co-Expressed Gene Sets (利用穩健之排序方法偵測基因群共表現差異)
Author:	Yueh Wang 王悅
Advisor:	洪弘 (Hung Hung)
Co-advisor:	蕭朱杏 (Chuhsing Kate Hsiao)
Keywords:	gene set, gene set analysis, co-expression, correlation, robustness
Year of publication:	2011
Degree:	Master's
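Abstract:	In genetic epidemiology, identifying susceptibility genetic material associated with complex diseases remains one of the main, continually developing research directions. Many analysis methods have been developed in recent years; among them, methods that take biologically defined gene sets as the unit of analysis, collectively called Gene Set Analysis (GSA), have become widely used. These methods address an important issue in bioinformatics, reducing the number of genes to be analyzed, and they also make results easier to interpret with biological knowledge. Although GSA is widely used, most GSA methods detect changes in the mean expression of a gene set, which is only part of the information in gene expression changes; strictly speaking, studying the association between gene sets and complex diseases in this way is incomplete. A gene set may differ in its co-expression across disease states, yet such differences are not necessarily identified by methods that detect mean changes. This thesis proposes the R statistic, which detects co-expression changes within a gene set and is robust; once gene sets potentially related to a complex disease are found, the hub genes contributing most to the co-expression difference can be further identified. The simulation study examines the robustness of the R statistic and the identification of hub genes. In the analysis of the p53 data, the R statistic detects many gene sets biologically related to p53, only one of which is also detected by previously published methods for detecting mean changes, which shows the importance of detecting co-expression differences. In the discussion, besides being robust, the R statistic has higher power for gene sets containing more genes. Two caveats apply: the method is not suited to cases where the rank order of co-expression is similar across disease states, and because it favors robustness, its efficiency is somewhat lower in some situations. Finally, the R statistic studied here detects differences in pairwise co-expression among genes in a gene set; higher-order co-expression differences are not yet considered, and the method can be extended directly to detect them, which is a direction for future work.
It is important in epidemiology to identify the susceptibility genetic material associated with different conditions. In recent years, many methods have been developed for identifying genes or SNPs that contribute to the occurrence of complex diseases. One kind of procedure, called Gene Set Analysis (GSA), treats a biologically defined gene set as the unit of analysis. GSA and related methods solve a crucial problem in bioinformatics by reducing a large number of variables, but they use only partial information, namely mean change. The co-expression of a gene set may also change between conditions, but this information may not be detected by previously developed GSA methods. In this thesis, an R statistic is developed for extracting the information on gene set co-expression change, and the important genes of a gene set, the hub genes, are further identified. In the simulation study, we examine the robustness of the R statistic and the identification of hub genes. The p53 data are analyzed and several p53-related pathways are identified. Among those identified pathways, many are not detected by the method focusing on the detection of mean change. We have chosen several pathways for further identification of the hub genes and give a biological explanation for each pathway.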
In the discussion, the proposed method is shown to be robust and powerful in detecting gene sets containing many genes. However, two limitations of the R statistic cannot be ignored: it is unable to detect the case where the case and control groups are similar in correlation rank order, and its efficiency may be poor in some conditions. Finally, we note that higher-order co-expression change is still ignored; a possible algorithm for considering higher-order interaction is proposed, and extending the R statistic to detect higher-order co-expression is an important direction for future work.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44931
Full-text license:	Paid license
Appears in collections:	Institute of Epidemiology and Preventive Medicine

Files in this item:

File	Size	Format
ntu-100-1.pdf (not currently authorized for public access)	854.52 kB	Adobe PDF

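Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.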