Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76646

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳少傑(Sao-Jie Chen) | |
| dc.contributor.author | Je-Luen Ju | en |
| dc.contributor.author | 朱哲論 | zh_TW |
| dc.date.accessioned | 2021-07-10T21:34:26Z | - |
| dc.date.available | 2021-07-10T21:34:26Z | - |
| dc.date.copyright | 2016-11-09 | |
| dc.date.issued | 2016 | |
| dc.date.submitted | 2016-09-02 | |
| dc.identifier.citation | REFERENCE [1] https://www.broadinstitute.org/gatk/ [2] http://zedboard.org/ [3] http://bio-bwa.sourceforge.net/ [4] http://samtools.sourceforge.net/ [5] H. Li and R. Durbin, "Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform," Bioinformatics, vol. 25, no. 14, pp. 1754-1760, May 2009. [6] http://broadinstitute.github.io/picard/ [7] https://www.broadinstitute.org/gatk/guide/article?id=4148 [8] Y. Heo, X.-L. Wu, D. Chen, J. Ma, and W.-M. Hwu, "BLESS: Bloom Filter-Based Error Correction Solution for High-Throughput Sequencing Reads," Bioinformatics, vol. 30, no. 10, pp. 1354-1362, January 2014. [9] A. Ramachandran, Y. Heo, W.-M. Hwu, J. Ma, and D. Chen, "FPGA Accelerated DNA Error Correction," 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, pp. 1371-1376, March 2015. [10] M. C. Schatz, C. Trapnell, A. L. Delcher, and A. Varshney, "High-Throughput Sequence Alignment using Graphics Processing Units," BMC Bioinformatics, vol. 8, no. 474, December 2007. [11] I. T. S. Li, W. Shum, and K. Truong, "160-Fold Acceleration of the Smith-Waterman Algorithm using a Field Programmable Gate Array (FPGA)," BMC Bioinformatics, vol. 8, no. 185, June 2007. [12] ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/ [13] http://www.1000genomes.org/ [14] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, and M. A. DePristo, "The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data," Genome Research, vol. 20, no. 9, pp. 1297-1303, September 2010. [15] https://software.intel.com/en-us/intel-vtune-amplifier-xe [16] https://en.wikipedia.org/wiki/Java_virtual_machine [17] http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html [18] https://hpc.mssm.edu/files/Carneiro_workshop.pdf [19] http://www.broadinstitute.org/software/igv/IGV [20] http://www.xillybus.com/ [21] S. Ren, V. M. Sima, and Z. Al-Ars, "FPGA Acceleration of the Pair-HMMs Forward Algorithm for DNA Sequence Analysis," 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, pp. 1465-1470, November 2015. [22] http://snver.sourceforge.net/ | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/76646 | - |
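| dc.description.abstract | This thesis presents a hardware accelerator circuit design that improves the Variant Discovery flow in GATK (Genome Analysis Tool-Kit), a software application for DNA sequencing. Driven by biomedical research and development and by the invention of Next Generation Sequencing technology, gene sequencing has advanced substantially in recent years. Among today's sequencing applications, the GATK genome analysis toolkit developed by the Broad Institute is one of the best known and is widely studied and used in the biomedical field. However, software of this kind still has many inherent shortcomings: for example, its performance is limited by its software development environment, and the algorithms behind some functions are inefficient. There is therefore a pressing need to implement GATK in another way to resolve these problems. In this thesis, we redesign the Variant Discovery flow of GATK in a software language (C++) and a hardware description language (Verilog HDL): we simplify the algorithms in the flow to reduce computational complexity and use a parallel hardware architecture for acceleration. On the hardware side, we verify the design on a Field Programmable Gate Array (FPGA). In hardware simulation, the design currently achieves an overall speed-up of about 25 times over the software. | zh_TW |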
| dc.description.abstract | This work presents a digital hardware design that accelerates the Variant Discovery phase of the Genome Analysis Tool-Kit (GATK) [1], a software package for analyzing high-throughput sequencing data. Progress in biomedical research and the invention of Next Generation Sequencing (NGS) techniques have led to a breakthrough in producing large volumes of DNA sequence data for analysis. Many software tools have been developed to assist DNA sequencing, such as GATK, a well-known Java-based command-line tool used by many biomedical scientists. However, tools of this kind suffer from low performance caused by their software development environment, and some of their algorithms may not work well in certain special cases. A new design using another language and platform is therefore needed for further clinical analysis and research. In this work, we redesign HaplotypeCaller, a core tool in the Variant Discovery phase of GATK, in Verilog HDL and C++, and realize the Verilog design on a ZedBoard [2] FPGA. The hardware design achieves an average speed-up of 20,000 times over the software version of GATK, and the overall software and hardware co-design platform still achieves a speed-up of 25 times. | en |
| dc.description.provenance | Made available in DSpace on 2021-07-10T21:34:26Z (GMT). No. of bitstreams: 1 ntu-105-R03943128-1.pdf: 6926688 bytes, checksum: 2ea4e98f80d511e13f5a5a592eb53087 (MD5) Previous issue date: 2016 | en |
| dc.description.tableofcontents | TABLE OF CONTENTS: ABSTRACT; TABLE OF CONTENTS; LIST OF FIGURES; LIST OF TABLES; CHAPTER 1 INTRODUCTION; 1.1 Overview on Genome Analysis Tool-Kit; 1.2 Motivation; 1.3 Thesis Organization; CHAPTER 2 BACKGROUND; 2.1 Best Practices Workflow: Pre-Processing Phase; 2.2 Best Practices Workflow: Variant Discovery Phase; 2.2.1 Introduction to HaplotypeCaller; 2.3 Best Practices Workflow: Callset Refinement Phase; 2.4 Memory Reduction Works in Genome Sequencing; 2.5 Hardware Acceleration Works in Genome Sequencing; 2.5.1 Speed-up on Graphic Processing Units; 2.5.2 Acceleration on FPGA; CHAPTER 3 ANALYSIS AND PROFILING; 3.1 Analysis Environment and Datasets; 3.2 Analysis on GATK; 3.2.1 Analysis Flow and Tools; 3.2.2 Analysis Results; 3.3 Advanced Profiling; 3.3.1 Introduction to Intel® VTune Amplifier; 3.3.2 Profiling Processes; 3.3.3 Profiling Results; 3.4 Features of Java Virtual Machine; 3.5 Source Code Tracing on GATK; 3.5.1 Stack Tracing on Simple Function Call; 3.5.2 Flow Tracing on HaplotypeCaller; 3.5.3 Problems Observed from HaplotypeCaller; 3.6 Conclusion on Analysis and Profiling; CHAPTER 4 HYBRID PLATFORM AND HARDWARE ACCELERATION; 4.1 Hybrid Co-Design Platform; 4.2 Software Architecture; 4.2.1 Software Processing Flow; 4.2.2 Flow of Pre-Processing Sequence Data; 4.2.3 Multi-Thread Pipelining Design; 4.2.4 Flow Control on Multi-Threads; 4.3 Hardware Architecture; 4.3.1 Input and Output Specification; 4.3.2 Data Decoder Hardware Architecture; 4.3.3 Design for Assembler; 4.3.4 Assembler Hardware Architecture; 4.3.5 Pair-HMM Forward Algorithm; 4.3.6 Pair-HMM Hardware Accelerator; 4.3.7 Genotype Assigner Hardware Architecture; CHAPTER 5 SIMULATION AND EXPERIMENT RESULTS; 5.1 Software Simulation Results; 5.2 Hardware Simulation Results; 5.3 Overall Simulation Results on Hybrid Architecture; CHAPTER 6 CONCLUSIONS; REFERENCE | |
| dc.language.iso | en | |
| dc.subject | DNA Variant Discovery | zh_TW |
| dc.subject | Gene Sequencing | zh_TW |
| dc.subject | GATK Genome Analysis Toolkit | zh_TW |
| dc.subject | Variant Discovery | en |
| dc.subject | GATK | en |
| dc.subject | DNA Sequencing | en |
| dc.title | Analysis and Acceleration of the Variant Discovery Tool in GATK | zh_TW |
| dc.title | Analysis and Acceleration of Variant Discovery Tool in GATK | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 105-1 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 陳中平,游竹,林伯星 | |
| dc.subject.keyword | Gene Sequencing, GATK Genome Analysis Toolkit, DNA Variant Discovery | zh_TW |
| dc.subject.keyword | DNA Sequencing, GATK, Variant Discovery | en |
| dc.relation.page | 74 | |
| dc.identifier.doi | 10.6342/NTU201603548 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2016-09-03 | |
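| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Electronics Engineering | zh_TW |
| Appears in Collections: | Graduate Institute of Electronics Engineering | |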
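Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-105-R03943128-1.pdf (not authorized for public access) | 6.76 MB | Adobe PDF | |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.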
