Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64850
Title: | 整合基因型資訊與生物路徑和基因表現資料提升阿茲海默症預測準確度 (Improving AD Prediction using Genotyping Data Combined with Biological Pathways and Gene Expression Profiles) |
Author: | Hur Wang (王賀) |
Advisor: | 陳倩瑜 |
Keywords: | Alzheimer’s disease, Epistasis, Genome-wide association studies, Biological pathway data, Gene expression profiles |
Publication year: | 2020 |
Degree: | Master's |
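Abstract: | With major advances in sequencing technology and computing resources, interest in the association between disease genotypes and phenotypes has grown. Over the past decade, many studies have used genome-wide association studies (GWAS) to examine how genetic variation affects disease, and have found associations between thousands of single nucleotide polymorphisms (SNPs) and diseases or traits. Geneticists argue, however, that this approach severely underestimates the underlying biological mechanisms of complex diseases because it ignores interactions between individual variants, known as epistasis. Genome-wide epistasis analysis remains a major challenge: the analysis must enumerate all pairwise combinations of single variants, which requires substantial computing resources. Many machine-learning methods for epistasis have been proposed in recent years. In earlier work, our laboratory proposed GenEpi (Gene-based Epistasis Discovery), a detection method that first groups genetic variants by gene to find feature subsets and then builds models to search for epistasis, which effectively reduces computational complexity. However, this method may miss potential cross-gene epistasis during gene selection. This thesis therefore proposes a method that integrates biological pathway data to improve the prediction of cross-gene epistasis. Because integrating pathway data produces too many combinations, the thesis further integrates gene expression data and uses differentially expressed genes to reduce computing time. First, the Limma package in R is used to identify differentially expressed genes by comparing the experimental group with the control group among Alzheimer's disease samples. Next, gene pairs containing at least one differentially expressed gene are extracted from the Pathway Commons database, and each gene pair is modeled with combinatorial encoding and feature selection by L1-regularized linear regression. Finally, the features selected for each gene pair are pooled to build the final model for predicting the phenotype. The genome-wide association data used in this study come from the Alzheimer's Disease Neuroimaging Initiative. Differential expression analysis yielded 192 differentially expressed genes; mapping these genes to the Pathway Commons Database gave 18,234 gene pairs; feature selection yielded 42,427 features distributed across 11,139 gene pairs. After all features were merged and modeled again, the final prediction model selected 32 variant-site features, including the well-known biomarker APOE. For the final prediction model, 10-fold cross-validation gave a prediction accuracy of 0.843 and an F1 score of 0.780, both higher than the results of the original GenEpi. By using biological pathway data and gene expression data, this study achieves better disease phenotype prediction, and the SNPs obtained with this method can provide important leads for future functional studies and help explain the pathogenic mechanisms of complex diseases.
Epistasis is the interaction between genetic variants associated with phenotypes, and it is a key to understanding complex diseases such as Alzheimer’s disease (AD). However, discovering epistasis is time-consuming because it requires testing the interactions among millions of variants. A previous study from our laboratory (GenEpi) used gene-based epistasis analysis, grouping the genetic variants within a gene to reduce the computational complexity. With that approach, potential cross-gene epistasis might be neglected during gene selection. This thesis therefore presents a new method that integrates biological pathways to improve the capability of predicting AD using cross-gene epistasis. Moreover, when expression profiles are available, differentially expressed genes can be used to further reduce the computing time.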
First, differentially expressed genes (DEGs) were identified by comparing AD samples with control subjects, using the Limma package in R. Next, gene pairs in the Pathway Commons Database that contain at least one DEG were obtained. Each gene pair was then modeled by two-element combinatorial encoding and L1-regularized regression with stability selection, as proposed in GenEpi. After that, the features selected for each gene pair were pooled to construct the final model for predicting the phenotype. The genome-wide association (GWA) data and expression profiles used in this thesis are from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. We found 192 DEGs after differential expression analysis and obtained 18,234 gene pairs after mapping the DEGs to the Pathway Commons Database. After feature selection, we obtained 42,427 features in 11,139 gene pairs. We collected these features and modeled them again. The final prediction model contains 32 significant SNP features, including the well-known AD biomarker APOE. The 10-fold cross-validation (CV) accuracy and F1 score of the final model are 0.843 and 0.780, respectively, both higher than those of the original version of GenEpi. The proposed method thus predicts the phenotype better by leveraging pathway data and gene expression data. The discovered SNPs can provide important leads for designing future functional studies to understand the mechanisms of the complex disease AD. |
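The two pre-modeling steps described in the abstract (keeping pathway gene pairs that contain at least one DEG, then building cross-gene interaction features) can be sketched as follows. This is a minimal illustration and not the thesis or GenEpi implementation: the gene names, SNP identifiers, genotype labels, and function names are hypothetical, and the encoding assumes that "two-element combinatorial encoding" means multiplying one-hot genotype indicators of two SNPs. The L1-regularized regression and stability selection steps are omitted.

```python
from itertools import product

GENOTYPES = ("AA", "AB", "BB")  # hypothetical genotype labels


def pairs_with_deg(gene_pairs, degs):
    """Keep pathway gene pairs in which at least one member is a DEG."""
    degs = set(degs)
    return [(a, b) for a, b in gene_pairs if a in degs or b in degs]


def one_hot(snp, genotype):
    """One 0/1 indicator per (SNP, genotype) combination."""
    return {f"{snp}_{g}": int(genotype == g) for g in GENOTYPES}


def encode_gene_pair(sample, snps_a, snps_b):
    """Single-SNP indicators plus cross-gene two-element interaction features.

    Assumption: an interaction feature is the product (logical AND) of one
    genotype indicator from a SNP in gene A and one from a SNP in gene B.
    """
    feats = {}
    for snp in list(snps_a) + list(snps_b):
        feats.update(one_hot(snp, sample[snp]))
    for sa, sb in product(snps_a, snps_b):
        for ga, gb in product(GENOTYPES, GENOTYPES):
            feats[f"{sa}_{ga}*{sb}_{gb}"] = feats[f"{sa}_{ga}"] * feats[f"{sb}_{gb}"]
    return feats


# Hypothetical pathway pairs and a single hypothetical DEG.
pathway_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_D", "GENE_A")]
kept = pairs_with_deg(pathway_pairs, {"GENE_A"})

# One sample with one SNP in each gene of a kept pair.
sample = {"snp_a1": "AB", "snp_b1": "BB"}
features = encode_gene_pair(sample, ["snp_a1"], ["snp_b1"])
```

In a full pipeline, a feature matrix built this way for each kept gene pair would be passed to a sparse (L1-penalized) model, and the surviving features from all pairs would be pooled for the final model.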
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64850 |
DOI: | 10.6342/NTU202000208 |
Full-text license: | Fee-based license |
Appears in collections: | Department of Biomechatronics Engineering |
Files in this item:
File | Size | Format | |
---|---|---|---|
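ntu-109-1.pdf (not currently authorized for public access) | 4.13 MB | Adobe PDF |
Unless their copyright terms are specifically stated, all items in the system are protected by copyright, with all rights reserved.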