Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20128
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳少傑(Sao-Jie Chen) | |
dc.contributor.author | Zi-Yuan Lin | en |
dc.contributor.author | 林子源 | zh_TW |
dc.date.accessioned | 2021-06-08T02:40:35Z | - |
dc.date.copyright | 2018-04-18 | |
dc.date.issued | 2018 | |
dc.date.submitted | 2018-03-19 | |
dc.identifier.citation | [1] Broad Institute (January, 2018), Genome Analysis Toolkit. Retrieved from https://software.broadinstitute.org/gatk/
[2] Avnet, Inc. (September, 2017), ZedBoard introduction. Retrieved from http://zedboard.org/product/zedboard/
[3] The SAM/BAM Format Specification Working Group (August, 2017), Sequence Alignment/Map Format Specification. Retrieved from http://samtools.github.io/hts-specs/SAMv1.pdf
[4] P. Cock, C. Fields, N. Goto, M. Heuer, and P. Rice, "The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants," Nucleic Acids Research, vol. 38, no. 6, pp. 1767-1771, December 2009.
[5] H. Li and R. Durbin, "Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform," Bioinformatics, vol. 25, no. 14, pp. 1754-1760, May 2009.
[6] The SAM/BAM Format Specification Working Group (January, 2018), Samtools. Retrieved from http://www.htslib.org/doc/samtools.html
[7] Broad Institute (December, 2017), Picard. Retrieved from http://broadinstitute.github.io/picard/
[8] Broad Institute (September, 2014), HC overview: How the HaplotypeCaller works. Retrieved from https://www.broadinstitute.org/gatk/guide/article?id=4148
[9] Y. Heo, X.-L. Wu, D. Chen, J. Ma, and W.-M. Hwu, "BLESS: Bloom Filter-Based Error Correction Solution for High-Throughput Sequencing Reads," Bioinformatics, vol. 30, no. 10, pp. 1354-1362, January 2014.
[10] A. Ramachandran, Y. Heo, W.-M. Hwu, J. Ma, and D. Chen, "FPGA Accelerated DNA Error Correction," Design, Automation & Test in Europe Conference & Exhibition, pp. 1371-1376, March 2015.
[11] I. T. S. Li, W. Shum, and K. Truong, "160-Fold Acceleration of the Smith-Waterman Algorithm Using a Field Programmable Gate Array (FPGA)," BMC Bioinformatics, vol. 8, no. 185, June 2007.
[12] National Center for Biotechnology Information (January, 2018), 1000 Genomes. Retrieved from ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/
[13] The European Bioinformatics Institute (July, 2017), The International Genome Sample Resource - Providing ongoing support for the 1000 Genomes Project data. Retrieved from http://www.1000genomes.org/
[14] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, and M. A. DePristo, "The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data," Genome Research, vol. 20, no. 9, pp. 1297-1303, September 2010.
[15] Intel Corporation (December, 2017), Intel® VTune™ Amplifier. Retrieved from https://software.intel.com/en-us/intel-vtune-amplifier-xe
[16] Wikipedia (January, 2018), Java virtual machine. Retrieved from https://en.wikipedia.org/wiki/Java_virtual_machine
[17] Oracle Corporation (May, 2016), Java Garbage Collection Basics. Retrieved from http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
[18] Broad Institute (December, 2013), Accelerating variant calling. Retrieved from https://hpc.mssm.edu/files/Carneiro_workshop.pdf
[19] Broad Institute (January, 2018), Integrative Genomics Viewer. Retrieved from http://www.broadinstitute.org/software/igv/IGV
[20] Xilinx, Inc. (March, 2011), AXI Reference Guide. Retrieved from https://www.xilinx.com/support/documentation/ip_documentation/ug761_axi_reference_guide.pdf
[21] S. Ren, V. M. Sima, and Z. Al-Ars, "FPGA Acceleration of the Pair-HMMs Forward Algorithm for DNA Sequence Analysis," IEEE International Conference on Bioinformatics and Biomedicine, pp. 1465-1470, November 2015.
[22] S. Huang, G. J. Manikandan, A. Ramachandran, K. Rupnow, W.-M. W. Hwu, and D. Chen, "Hardware Acceleration of the Pair-HMM Algorithm for DNA Variant Calling," Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 275-284, February 2017. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20128 | - |
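dc.description.abstract | This thesis proposes a hybrid software/hardware design that accelerates the Variant Discovery flow of GATK (Genome Analysis Tool-Kit), a software application for DNA sequencing. In recent years, advances in biomedical research and the invention of Next Generation Sequencing technology have brought major breakthroughs in DNA sequencing. Among current sequencing-analysis software, the GATK toolkit developed by the Broad Institute of MIT and Harvard is well known and widely used in biological and medical research. However, such software still has significant shortcomings: its performance is limited by its software development environment, the algorithms behind some functions are inefficient, and its memory requirements are high. An alternative implementation of GATK is therefore needed to address these problems.
In this thesis, we redesign the Variant Discovery flow of GATK in a software language (C++) and a hardware description language (Verilog HDL). The redesign simplifies the algorithms in the flow to reduce computational complexity and uses a parallel hardware architecture to achieve acceleration; the hardware design is verified on a Field Programmable Gate Array (FPGA). Combined hardware and software simulation currently shows a speed-up of about 6.2 times over the GATK software, while using less than 10% of the original memory. | zh_TW |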
dc.description.abstract | This work presents a digital hardware design to accelerate HaplotypeCaller, a tool in the Variant Discovery phase of the Genome Analysis Tool-Kit (GATK) [1], a software package for analyzing genetic sequencing data.
Owing to progress in the biomedical field and the advent of the Next Generation Sequencing (NGS) [2] technique, DNA sequencing throughput has increased dramatically, and many software tools have been developed to process the resulting data. In this thesis, we focus on GATK, a well-known Java-based command-line tool kit used by many biomedical scientists. However, such tools suffer from low performance caused by their software development environment, and some of their algorithms may not work well in certain special cases. A redesign using a different language and platform is therefore needed for further clinical analysis and research. In this work, we redesign HaplotypeCaller, the most important tool in the Variant Discovery phase of GATK, as a software/hardware co-design in C++ and Verilog, with the hardware part implemented on an FPGA. The software/hardware co-design platform achieves an overall speed-up of 6.2 times over the original GATK. | en |
dc.description.provenance | Made available in DSpace on 2021-06-08T02:40:35Z (GMT). No. of bitstreams: 1 ntu-107-R04943144-1.pdf: 3449270 bytes, checksum: c3ed99c5add8667992ad3e593b48f92f (MD5) Previous issue date: 2018 | en |
dc.description.tableofcontents | ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Motivation
1.3 Thesis Organization
CHAPTER 2 BACKGROUND
2.1 Pre-Processing Process
2.2 Variant Discovery
2.2.1 Introduction to HaplotypeCaller
2.3 Callset Refinement
2.4 Memory Reduction Works in Genome Sequencing
2.5 Hardware Acceleration Works in Genome Sequencing
CHAPTER 3 ANALYSIS AND PROFILING
3.1 Analysis Environment and Datasets
3.2 Analysis on GATK
3.2.1 Analysis on GATK HaplotypeCaller
3.3 Analysis Results – About Java Virtual Machine
3.4 Source Code Tracing on GATK
3.5 Flow Tracing on HaplotypeCaller
3.6 Conclusion of Analysis and Profiling
CHAPTER 4 HYBRID PLATFORM
4.1 Software Hardware Co-design Platform
4.2 Software Architecture and Processing Flow
4.2.1 Software Pre-Processing Flow
4.3 Hardware Architecture and Processing Flow
4.3.1 Input and Output Specification
4.3.2 Hardware Decoder Flow
4.3.3 Hardware Assembler Flow
4.3.4 Pair-HMM Algorithm
4.3.5 Hardware Genotype Assigner
CHAPTER 5 EXPERIMENT RESULTS
5.1 Software Architecture Performance Results
5.2 Hardware Architecture Performance Results
5.3 Overall Comparison of Hybrid Architecture with Original GATK
CHAPTER 6 CONCLUSION
REFERENCE | |
dc.language.iso | en | |
dc.title | Hardware Acceleration of the GATK Variant Discovery Tool | zh_TW |
dc.title | Acceleration of Variant Discovery Tool in GATK | en |
dc.type | Thesis | |
dc.date.schoolyear | 106-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 趙坤茂(Kun-Mao Chao),盧奕璋(Yi-Chang Lu),陳倩瑜(Chien-Yu Chen) | |
dc.subject.keyword | Genome sequencing, GATK Genome Analysis Toolkit, DNA variant discovery | zh_TW |
dc.subject.keyword | Genome Sequencing, GATK, DNA Variant Calling | en |
dc.relation.page | 56 | |
dc.identifier.doi | 10.6342/NTU201800688 | |
dc.rights.note | Not authorized for public access | |
dc.date.accepted | 2018-03-20 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Electronics Engineering |
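The table of contents lists the Pair-HMM algorithm (Section 4.3.4) as part of the hardware architecture, and references [21] and [22] concern hardware acceleration of the Pair-HMM forward algorithm used in variant calling. The following C++ sketch illustrates what that forward computation looks like in software. It is not taken from the thesis; it assumes the commonly described three-state (Match/Insertion/Deletion) formulation with Phred-scaled base, insertion, and deletion qualities and a fixed gap-continuation penalty. The names `ReadWithQuals`, `phredToProb`, and `pairHmmLog10` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read record: bases plus per-base Phred-scaled qualities.
struct ReadWithQuals {
    std::string bases;
    std::vector<int> baseQ;  // base-call quality
    std::vector<int> insQ;   // insertion gap-open quality
    std::vector<int> delQ;   // deletion gap-open quality
};

// Phred quality Q -> error probability 10^(-Q/10).
inline double phredToProb(int q) { return std::pow(10.0, -q / 10.0); }

// Forward algorithm of a three-state (Match/Insertion/Deletion) pair-HMM.
// Returns log10 P(read | haplotype), summed over all alignments.
// gcp is an assumed Phred-scaled gap-continuation penalty.
double pairHmmLog10(const ReadWithQuals& r, const std::string& hap, int gcp) {
    const std::size_t m = r.bases.size();
    const std::size_t n = hap.size();
    assert(m > 0 && n > 0);
    assert(r.baseQ.size() == m && r.insQ.size() == m && r.delQ.size() == m);

    using Matrix = std::vector<std::vector<double>>;
    Matrix M(m + 1, std::vector<double>(n + 1, 0.0));
    Matrix I = M;
    Matrix D = M;

    // The read may begin at any haplotype position, so row 0 of the
    // deletion matrix holds a uniform 1/n starting mass.
    for (std::size_t j = 0; j <= n; ++j) D[0][j] = 1.0 / static_cast<double>(n);

    const double gapCont = phredToProb(gcp);  // I->I and D->D
    const double gapToMatch = 1.0 - gapCont;  // I->M and D->M

    for (std::size_t i = 1; i <= m; ++i) {
        const double pIns = phredToProb(r.insQ[i - 1]);   // M->I
        const double pDel = phredToProb(r.delQ[i - 1]);   // M->D
        const double matchToMatch = 1.0 - (pIns + pDel);  // M->M
        const double eps = phredToProb(r.baseQ[i - 1]);
        for (std::size_t j = 1; j <= n; ++j) {
            const char a = r.bases[i - 1];
            const char b = hap[j - 1];
            // Emission prior: agreement (or 'N') vs. one of three other bases.
            const double prior = (a == b || a == 'N' || b == 'N') ? 1.0 - eps : eps / 3.0;
            M[i][j] = prior * (M[i - 1][j - 1] * matchToMatch +
                               (I[i - 1][j - 1] + D[i - 1][j - 1]) * gapToMatch);
            I[i][j] = M[i - 1][j] * pIns + I[i - 1][j] * gapCont;
            D[i][j] = M[i][j - 1] * pDel + D[i][j - 1] * gapCont;
        }
    }

    // Every read base must be emitted; the alignment may end at any column.
    double total = 0.0;
    for (std::size_t j = 1; j <= n; ++j) total += M[m][j] + I[m][j];
    return std::log10(total);
}
```

For example, a read `ACGT` with Phred 30 base qualities scores a higher log-likelihood against haplotype `ACGT` than against `TTTT`. Each cell depends only on cells from earlier anti-diagonals (up, left, and up-left), so all cells on one anti-diagonal can be computed in parallel; this independence is what FPGA designs such as those in [11], [21], and [22] exploit with systolic or pipelined processing elements.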
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-107-1.pdf (currently not authorized for public access) | 3.37 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.