Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94397

Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳倩瑜 | zh_TW |
| dc.contributor.advisor | Chien-Yu Chen | en |
| dc.contributor.author | 邱顯鈞 | zh_TW |
| dc.contributor.author | Hsien-Chun Chiu | en |
| dc.date.accessioned | 2024-08-15T17:16:16Z | - |
| dc.date.available | 2024-08-16 | - |
| dc.date.copyright | 2024-08-15 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-08 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94397 | - |
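| dc.description.abstract | Mainstream personal sequencing technologies typically produce large numbers of short reads whose genomic positions are lost, so every read must be mapped back to a reference genome. The commonly used reference genomes, hg19 and hg38, are derived from only a few individuals and lack East Asian participants, so for Taiwanese individuals they may introduce bias arising from individual differences. This study uses Taiwan Biobank data; the variant set, obtained previously by our research group, includes many variants specific to Taiwanese people and is used here to build a Taiwanese reference pangenome (TW-graph) that aims to improve short-read mapping quality and the accuracy of downstream applications. Previous studies have mostly built reference pangenomes with graph-genome methods, and among the common tools HISAT2 is the most popular. This study therefore uses bcftools to filter variants and HISAT2, an algorithm based on the graph-genome concept, to build the Taiwanese reference pangenome, together with two control references: an hg38 graph reference with no variants added (hg38-graph) and a global reference pangenome incorporating variants from the 1000 Genomes Project (1000G-graph). Taiwanese short-read data were then mapped to each reference for downstream analysis, with the mapping rate used for initial interpretation and comparison. Across seven Taiwan Biobank and four non-Taiwan Biobank short-read samples, the Taiwanese reference pangenome showed a significantly higher mapping rate than the other two references: over all eleven samples, the mapping rate trended about 1% higher than hg38-graph and about 0.9% higher than 1000G-graph. The TW-graph was further applied to allele typing of the KIR gene family. Among the 44 HPRC samples used, uniquely mapped read counts show BWA-linear performing better than 1000G-graph, which in turn performs better than hg38-graph; in the same eleven Taiwanese samples, TW-graph yields clearly more uniquely mapped reads in the KIR region than the other three controls. Although no ground truth is currently available for evaluation, we hope that more experimental data will show whether the increase in mapped reads improves the accuracy of KIR allele typing. | zh_TW |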
| dc.description.abstract | Mainstream personal sequencing technologies typically produce large numbers of short reads that carry no positional information, so the reads must be aligned to a reference genome. The commonly used reference genomes, hg19 and hg38, are derived from a small number of individuals and lack East Asian representation; for Taiwanese individuals, they may therefore introduce bias due to population differences. This study uses Taiwan Biobank data, including variants unique to the Taiwanese population, to construct a Taiwanese reference pangenome (TW-graph) that aims to improve the quality of short-read alignment and of downstream applications. Past research has often used graph-genome methods to build reference pangenomes, with HISAT2 being the most popular tool. This study therefore uses bcftools to filter variants and HISAT2, a graph-genome-based aligner, to construct the Taiwanese reference pangenome. For comparison, two further references were built: an hg38 graph reference with no variants added (hg38-graph) and a global reference pangenome incorporating variants from the 1000 Genomes Project (1000G-graph). Taiwanese short-read data were then aligned to each reference and analyzed further, including KIR allele typing, with the mapping rate used to interpret and compare results. Using short-read data from 7 Taiwan Biobank and 4 non-Taiwan Biobank samples, the Taiwanese reference pangenome shows a significant improvement in mapping rate over the other two references: across all 11 samples, the mapping rate trends about 1% higher than hg38-graph and about 0.9% higher than 1000G-graph. For KIR typing, uniquely mapped read counts in 44 HPRC samples show BWA-linear performing better than 1000G-graph, which in turn performs better than hg38-graph. In the 11 Taiwanese samples, TW-graph yields clearly more uniquely mapped reads in the KIR region than the other three controls (BWA-linear, hg38-graph, and 1000G-graph). | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-08-15T17:16:16Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-08-15T17:16:16Z (GMT). No. of bitstreams: 0 | en |
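| dc.description.tableofcontents | Acknowledgements; Chinese Abstract; English Abstract; Chapter 1 Introduction; 1.1 Background; 1.2 Research Objectives; Chapter 2 Literature Review; 2.1 Pangenomes; 2.2 Linear Genomes; 2.3 Graph Genomes; 2.4 The Graph-Genome Read Mapper HISAT2; 2.5 The Linear-Genome Read Mapper BWA; 2.6 The Graph-Genome KIR Typing Tool Graph-KIR; Chapter 3 Materials and Methods; 3.1 Taiwan Biobank (TWB); 3.2 Human Pangenome Reference Consortium; 3.3 Constructing the Taiwanese Reference Pangenome Graph and Control References; 3.4 Comparison of Short-Read Mapping; 3.5 Integrating Taiwanese Variants into Graph-KIR; 3.6 Effect of Short-Read Mapping on KIR Typing Results; Chapter 4 Results and Discussion; 4.1 Basic Statistics of the Raw Data; 4.2 WGS Short-Read Mapping Rates; 4.3 Differences in KIR Typing Results of HPRC Short-Read Samples across Reference Genomes; 4.4 Differences in KIR Typing Results of TWB Short-Read Samples across Reference Genomes; 4.5 Comparison of KIR-graphs with and without Taiwanese KIR-Region Variants on the Same Reference Genome; 4.6 Discussion; Chapter 5 Conclusion; References; Appendix | - |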
| dc.language.iso | zh_TW | - |
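| dc.subject | 臺灣泛參考基因組 (Taiwanese reference pangenome) | zh_TW |
| dc.subject | 短序列回貼 (short read mapping) | zh_TW |
| dc.subject | 臺灣人體生物資料庫 (Taiwan Biobank) | zh_TW |
| dc.subject | KIR | zh_TW |
| dc.subject | 單核苷酸多態性 (single-nucleotide polymorphism, SNP) | zh_TW |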
| dc.subject | Short read mapping | en |
| dc.subject | SNP | en |
| dc.subject | KIR | en |
| dc.subject | Taiwan Biobank | en |
| dc.subject | Taiwanese reference pangenome | en |
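| dc.title | 建立臺灣人泛參考基因組提升短序列回貼及KIR分型正確性 (Constructing a Taiwanese Reference Pangenome to Improve Short-Read Mapping and KIR Typing Accuracy) | zh_TW |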
| dc.title | Constructing Taiwanese Reference Pangenome (TW-graph) to Improve Read Mapping Rates and KIR typing | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 (ROC academic year 112, second semester) | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳沛隆;許書睿;楊雅倩;許家郎 | zh_TW |
| dc.contributor.oralexamcommittee | Pei-Lung Chen;Shu-Jui Hsu;Ya-Chien Yang;Chia-Lang Hsu | en |
| dc.subject.keyword | 臺灣泛參考基因組 (Taiwanese reference pangenome), 短序列回貼 (short read mapping), 臺灣人體生物資料庫 (Taiwan Biobank), KIR, 單核苷酸多態性 (SNP) | zh_TW |
| dc.subject.keyword | Taiwanese reference pangenome, Short read mapping, Taiwan Biobank, KIR, SNP | en |
| dc.relation.page | 56 | - |
| dc.identifier.doi | 10.6342/NTU202401388 | - |
| dc.rights.note | Authorization granted (open access worldwide) | - |
| dc.date.accepted | 2024-08-12 | - |
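| dc.contributor.author-college | College of Bio-Resources and Agriculture | - |
| dc.contributor.author-dept | Department of Biomechatronics Engineering | - |
| Appears in Collections: | Department of Biomechatronics Engineering | |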
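
The abstract compares references by overall mapping rate and by the number of uniquely mapped reads in the KIR region. The Python sketch below shows one way such figures could be computed from aligner output; it is an illustration, not the thesis's own code. It assumes SAM text that still contains unmapped records (HISAT2 writes them unless `--no-unal` is given), treats a mapped primary record carrying `NH:i:1` as uniquely mapped, and uses hypothetical names (`summarize_sam`, `gain_in_points`, `MappingStats`) plus a hypothetical `region` window standing in for real KIR coordinates.

```python
"""Sketch: mapping rate and uniquely mapped read counts from SAM text.

Assumptions (not taken from the thesis): the aligner writes the standard
NH:i tag (HISAT2 does), each read is counted once through its primary
record, and the optional region is a hypothetical (chrom, start, end)
window in 1-based, inclusive coordinates.
"""
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

Region = Tuple[str, int, int]  # hypothetical (chrom, start, end), 1-based inclusive


@dataclass
class MappingStats:
    total: int = 0   # primary records (one per read, or per mate if paired-end)
    mapped: int = 0  # primary records with the unmapped bit cleared
    unique: int = 0  # mapped primary records with NH:i:1 (inside region, if given)

    @property
    def mapping_rate(self) -> float:
        return self.mapped / self.total if self.total else 0.0


def nh_value(fields: List[str]) -> Optional[int]:
    """Return the NH:i tag (number of reported alignments), if present."""
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def in_region(fields: List[str], region: Optional[Region]) -> bool:
    """True if no region is given, or the record's POS lies inside it."""
    if region is None:
        return True
    chrom, start, end = region
    return fields[2] == chrom and start <= int(fields[3]) <= end


def summarize_sam(lines: Iterable[str], region: Optional[Region] = None) -> MappingStats:
    stats = MappingStats()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only through its primary record
        stats.total += 1
        if flag & UNMAPPED:
            continue
        stats.mapped += 1
        if nh_value(fields) == 1 and in_region(fields, region):
            stats.unique += 1
    return stats


def gain_in_points(new: MappingStats, baseline: MappingStats) -> float:
    """Mapping-rate difference in percentage points (new minus baseline)."""
    return 100.0 * (new.mapping_rate - baseline.mapping_rate)


if __name__ == "__main__":
    toy = [
        "@HD\tVN:1.6",
        "r1\t0\tchr19\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNH:i:1",
        "r2\t0\tchr19\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\t*\tNH:i:2",
        "r2\t256\tchr19\t900\t1\t10M\t*\t0\t0\tACGTACGTAC\t*\tNH:i:2",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\t*\tYT:Z:UU",
    ]
    s = summarize_sam(toy)
    print(s.total, s.mapped, s.unique, round(s.mapping_rate, 3))  # 3 2 1 0.667
```

Comparing two references then amounts to summarizing each reference's SAM output and calling `gain_in_points` on the pair. Because this rate is computed per SAM record (each mate counts separately for paired-end data), and `NH` reflects only the alignments the aligner chose to report, the numbers may not match an aligner's own alignment summary exactly.
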
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-112-2.pdf | 1.74 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise specified.
