Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/1226
Full metadata record
DC field | Value | Language |
---|---|---|
dc.contributor.advisor | 莊曜宇(Eric Y. Chuang) | |
dc.contributor.author | Li-Mei Chiang | en |
dc.contributor.author | 姜莉玫 | zh_TW |
dc.date.accessioned | 2021-05-12T09:34:32Z | - |
dc.date.available | 2020-10-18 | |
dc.date.available | 2021-05-12T09:34:32Z | - |
dc.date.copyright | 2018-10-18 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-10-11 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/handle/123456789/1226 | - |
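dc.description.abstract | In recent years, heart disease has consistently ranked near the top of the world's ten leading causes of death, and its costs have been rising year by year. In search of solutions, more and more researchers have taken up heart disease research. However, heart tissue is difficult to obtain from living subjects, and the gene expression profiles of other tissues may differ from those of the heart. As a result, a researcher may find a potentially pathogenic variant that lies in a gene not expressed in the heart. To address these cross-tissue differences in gene expression profiles, and to help researchers analyze the relationships among variants, populations, and diseases, this study aims to build a comprehensive heart-focused database. To meet these needs, it provides two services, Expression profiles and Variants Search: the former queries gene-related information and confirms whether a target gene is expressed in heart tissue, while the latter retrieves multifaceted information about variants.
In this study, we present a web-based database of variants and heart tissue gene expression profiles. It integrates heart gene expression profile data from human, mouse, and zebrafish, together with reference data on population genetic variants published by the 1000 Genomes Project, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), the Integrative Japanese Genome Variation Database (IJGVD), and Taiwan Biobank. We also collected REVEL, GERP++, and CADD scores for predicting variants that may cause disease, and built an Index system; unlike earlier variant analysis tools, the Index system incorporates tissue gene expression as a factor. In addition, to help researchers link variants to clinical findings, variant phenotype information published by ClinVar was integrated into VariED. In the results, we demonstrate VariED's applications with several examples: we successfully identified, among multiple genes, those not expressed in the heart, and used three Brugada syndrome-related variants to show VariED's ability to find pathogenic variants. The values produced by the Index system could also be used to identify pathogenic variants and were moderately correlated with CADD scores. In summary, by integrating data from major databases and tools, VariED provides comprehensive services that reduce the time researchers spend searching for data and advance heart disease research. | zh_TW |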
dc.description.abstract | Heart disease is among the top ten causes of death worldwide, and its cost is increasing year by year. To improve the understanding of heart diseases, a growing body of research has been devoted to them. However, it is difficult to obtain heart tissue directly from human patients, and the gene expression profiles obtained from other tissues may differ from those of the heart. As a result, a putative pathogenic variant may be found in a gene that is not expressed in heart tissue. To overcome this problem and to help researchers analyze the relationships among variants, populations, and heart diseases, we developed VariED, a comprehensive database for heart diseases. To meet these needs, VariED provides two major functions, Expression Profiles and Variants Search. The former is used to query gene information and to confirm whether a target gene is expressed in heart tissue; the latter is used to obtain more detailed information about variants of interest.
In this study, we developed a web-based database integrating variants with heart tissue expression profiles from three species: human, mouse, and zebrafish. In addition, population allele frequencies from the 1000 Genomes Project, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP), the Integrative Japanese Genome Variation Database (IJGVD), and Taiwan Biobank were included. We also collected REVEL, GERP++, and CADD scores, which can help elucidate the functional roles of variants of interest in disease. In addition, an index scoring system was implemented in VariED. What distinguishes this scoring system is that it treats tissue-based gene expression level as an important factor in the prediction. Lastly, to help researchers identify causative variants in diseases, ClinVar, a public database that collects associations between DNA variants and diseases, was integrated. In this thesis, we used several examples to show the potential applications of VariED. For example, we successfully identified a gene that is not expressed in heart tissue, and we analyzed three Brugada syndrome-related variants to demonstrate how VariED can be used to find pathogenic variants. We believe VariED not only saves researchers time when querying data but also helps users identify important DNA variants related to diseases. | en |
dc.description.provenance | Made available in DSpace on 2021-05-12T09:34:32Z (GMT). No. of bitstreams: 1 ntu-107-R04945021-1.pdf: 2716094 bytes, checksum: d0f667640a424f27df4c981dca8cddb2 (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | Oral Examination Committee Certification; Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 1.1. Motivation; 1.2. Specific aims; 1.3. Heart disease research; 1.4. Next-generation sequencing; 1.5. Variant; 1.6. Gene expression; 2 Materials and Methods; 2.1. Overview of VariED; 2.2. Dataset collection and processing; 2.3. Identify aliases and orthologs; 2.4. Index system; 2.5. Methods; 2.5.1. Function 1: Expression Profiles; 2.5.2. Function 2: Variants Search; 3 Results; 3.1. Example 1: Finding out pathogenic variants, which cause Brugada Syndrome; 3.2. Example 2: Using queried variants search for gene annotation information and find pathogenic variant; 3.3. Example 3: Using heart tissue gene expression profiles information to filter the candidate gene in heart diseases; 3.4. Performance of index system; 4 Discussion; 4.1. Accuracy; 4.2. Tissue-based gene expression profiles; 4.3. Data collection of gene expression profiles in different species; 4.4. Mapping RNA-Seq reads to the reference genome; 4.5. Characteristics; 4.6. Processing speed; 5 Conclusion; References | |
dc.language.iso | en | |
dc.title | VariED: an integrated database of variants and gene expression based on heart diseases | zh_TW |
dc.title | VariED: an integrated database of variants and gene expression profiles for heart diseases | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-1 | |
dc.description.degree | Master's | |
dc.contributor.coadvisor | 盧子彬 | |
dc.contributor.oralexamcommittee | 蔡孟勳,賴亮全,蕭自宏 | |
dc.subject.keyword | heart disease, genetic variant, population allele frequency, gene expression profiles, database, online system | zh_TW |
dc.subject.keyword | heart disease, genetic variant, population allele frequency, gene expression profiles, database, web-based system | en |
dc.relation.page | 56 | |
dc.identifier.doi | 10.6342/NTU201804200 | |
dc.rights.note | Authorized for release (worldwide open access) | |
dc.date.accepted | 2018-10-11 | |
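dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Biomedical Electronics and Bioinformatics | zh_TW |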
Appears in Collections: | Graduate Institute of Biomedical Electronics and Bioinformatics |
Files in This Item:
File | Size | Format |
---|---|---|
ntu-107-1.pdf | 2.65 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.