An integrative analysis of DNA copy number and SNP markers to localize causal gene region

Qi-You Yu; 余奇祐

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71719

Title:	An integrative analysis of DNA copy number and SNP markers to localize causal gene region (利用多平台基因變異尋找疾病相關之基因區域)
Author:	Qi-You Yu (余奇祐)
Advisor:	Chuhsing Kate Hsiao (蕭朱杏)
Co-advisor:	Tzu-Pin Lu (盧子彬)
Keywords:	single nucleotide polymorphism (SNP), copy number variation (CNV), integrative analysis, Taiwan Biobank, low-density lipoprotein cholesterol (LDL-C), triglyceride (TG)
Year of publication:	2019
Degree:	Master's
Abstract:	Many past genome-wide association studies (GWAS) have focused on the single nucleotide polymorphism (SNP) platform. With the rapid progress in sequencing technologies, multiple levels of genomic data can now be obtained from a single set of samples; for instance, a SNP array can be used both to genotype SNPs and to estimate copy number variations (CNVs). Data size increases dramatically as more types of genetic variants are considered simultaneously, so an integrative analysis is required to deal with the high dimensionality and the complex relationships among markers within and across platforms. Previous integrative analyses usually run a preliminary analysis on each single platform and then take the union or intersection of the resulting gene sets, for example with a Venn diagram, without considering the dependence among markers within and between platforms. To address this issue, this study proposes a novel pipeline that integrates the SNP and CNV levels. First, a gene-level association test is used to identify significant genes. Subsequently, a moving window analysis over smaller regions of the selected genes is used to pinpoint the causal regions carrying genetic variation. The proposed pipeline was evaluated in several simulation scenarios, and the results showed good and robust performance, especially for complex diseases driven by joint or interaction effects across platforms.
In addition to the simulations, the pipeline was applied to data from Taiwan Biobank for two traits, low-density lipoprotein cholesterol (LDL-C) and triglyceride (TG). For each trait, integrating the information from the SNP and CNV platforms identified 40 associated genes together with their important regions. In conclusion, these results demonstrate that the proposed integrative method can detect genes that have already been reported as associated, can identify important genes that were not reported previously by the naïve method, and provides candidate gene regions for future research.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/71719
DOI:	10.6342/NTU201804358
Full-text license:	Paid license
Appears in collections:	Institute of Epidemiology and Preventive Medicine

Files in this item:

File	Size	Format
ntu-108-1.pdf (not currently authorized for public access)	9.91 MB	Adobe PDF
