Using Disease Risk Scores to Account for Population Stratification in Rare Variant Association Studies

C.H. Li; 李澤華

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54440

Title:	Using Disease Risk Scores to Account for Population Stratification in Rare Variant Association Studies
Author:	C.H. Li (李澤華)
Advisor:	林菀俞
Keywords:	next-generation sequencing, case-control study, population stratification, rare variants, sequence kernel association test, principal component analysis, disease risk score
Year of publication:	2015
Degree:	Master's
Abstract:	Background: In genetic studies, researchers often collect unrelated cases and controls and compare allele frequencies between the two groups. However, cases and controls may come from different ancestral populations, in which case the two groups are not directly comparable. Researchers usually construct principal components from markers unlinked to the gene of interest and include tens of the leading principal components as covariates in a logistic regression to adjust for the ancestral difference between cases and controls.
Method: The cost of next-generation sequencing is still high. Many studies cannot afford whole-genome sequencing and can sequence only a chromosomal region of interest. This study considers the situation in which only a 500 kb (kilo base pairs) region is sequenced, and uses disease risk scores to adjust for population stratification in the sequence kernel association test.
Result: In Monte Carlo simulations, using disease risk scores in the sequence kernel association test adjusted for population-stratification bias more effectively than the conventional approach of using principal component scores directly.
Suggestion: If researchers have sequencing data for a chromosomal region of 500 kb or longer, we suggest constructing disease risk scores from common single-nucleotide polymorphisms (minor allele frequency > 5%) located far from the gene of interest, and then including these scores in the sequence kernel association test to adjust for the difference in ancestry between cases and controls.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54440
Full-text license:	Paid license
Appears in department:	Institute of Epidemiology and Preventive Medicine

Files in this item:

File	Size	Format
ntu-104-1.pdf (not authorized for public access)	523.77 kB	Adobe PDF

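Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.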