請用此 Handle URI 來引用此文件:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58937
完整後設資料紀錄
DC 欄位 | 值 | 語言 |
---|---|---|
dc.contributor.advisor | 盧奕璋(Yi-Chang Lu) | |
dc.contributor.author | Chun-Shen Liu | en |
dc.contributor.author | 劉浚升 | zh_TW |
dc.date.accessioned | 2021-06-16T08:39:49Z | - |
dc.date.available | 2018-10-23 | |
dc.date.copyright | 2013-10-23 | |
dc.date.issued | 2013 | |
dc.date.submitted | 2013-09-26 | |
dc.identifier.citation | [1] F. Sanger, S. Nicklen, and A. R. Coulson, 'DNA sequencing with chain-terminating inhibitors,' Proceedings of the National Academy of Sciences, vol. 74, pp. 5463-5467, December 1, 1977 1977.
[2] T. Finsterbusch and A. Mankertz, 'Porcine circoviruses—Small but powerful,' Virus Research, vol. 143, pp. 177-183, 2009. [3] A. Nakabachi, A. Yamashita, H. Toh, H. Ishikawa, H. E. Dunbar, N. A. Moran, et al., 'The 160-Kilobase Genome of the Bacterial Endosymbiont Carsonella,' Science, vol. 314, p. 267, October 13 2006. [4] Y.-L. Huang, 'Architecture design of the parallel processing element and interconnection network for de novo sequence assembly,' M.S. Thesis, Graudate Institute of Electronics Engineering, National Taiwan University Taiwan, 2012. [5] D. R. Zerbino and E. Birney, 'Velvet: Algorithms for de novo short read assembly using de Bruijn graphs,' Genome Research, vol. 18, pp. 821-829, May 1 2008. [6] Y.-H. Kuo, C.-S. Liu, Y.-C. Li, and Y.-C. Lu, 'Parallel architecutre and hardware implementation of pre-processor and post-processor for sequence assembly,' in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, 2013. [7] Y.-L. Huang, C.-S. Liu, Y.-C. Li, and Y.-C. Lu, 'Architecture design of parallel processing elements for de novo sequence assembly,' in SOC Conference (SOCC), 2013 IEEE International, 2013. [8] O. Morozova and M. A. Marra, 'Applications of next-generation sequencing technologies in functional genomics,' Genomics, vol. 92, pp. 255-264, 2008. [9] (2011). GS FLX+ System Sanger-like read lengths - the power of next-gen throughput. Available:http://454.com/downloads/GSFLXApplicationFlyer_FINALv2.pdf [10] M. Pop, S. L. Salzberg, and M. Shumway, 'Genome sequence assembly: algorithms and issues,' Computer, vol. 35, pp. 47-54, 2002. [11] R. L. Warren, G. G. Sutton, S. J. M. Jones, and R. A. Holt, 'Assembling millions of short DNA sequences using SSAKE,' Bioinformatics, vol. 23, pp. 500-501, February 15 2007. [12] J. R. Miller, S. Koren, and G. Sutton, 'Assembly algorithms for next-generation sequencing data,' Genomics, vol. 95, pp. 315-327, 2010. [13] D. J. Studholme, R. H. Glover, and N. Boonham, 'Application of High-Throughput DNA Sequencing in Phytopathology,' Annual Review of Phytopathology, vol. 49, pp. 87-105, 2011. [14] M. R. Garey and D. S. Johnson, Computers and Intractability; A Guide to the Theory of NP-Completeness: W. H. Freeman; Co., 1990. [15] E. W. Myers, G. G. Sutton, A. L. Delcher, I. M. Dew, D. P. Fasulo, M. J. Flanigan, et al., 'A Whole-Genome Assembly of Drosophila,' Science, vol. 287, pp. 2196-2204, March 24, 2000 2000. [16] X. Huang and S. P. Yang, 'Generating a genome assembly with PCAP,' Curr Protoc Bioinformatics, vol. Chapter 11, p. Unit11 3, Oct 2005. [17] P. A. Pevzner, H. Tang, and M. S. Waterman, 'An Eulerian path approach to DNA fragment assembly,' Proceedings of the National Academy of Sciences, vol. 98, pp. 9748-9753, August 14 2001. [18] J. Butler, I. MacCallum, M. Kleber, I. A. Shlyakhter, M. K. Belmonte, E. S. Lander, et al., 'ALLPATHS: De novo assembly of whole-genome shotgun microreads,' Genome Research, vol. 18, pp. 810-820, May 1, 2 2008. [19] R. Li, H. Zhu, J. Ruan, W. Qian, X. Fang, Z. Shi, et al., 'De novo assembly of human genomes with massively parallel short read sequencing,' Genome Research, December 17 2009. [20] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, 'Basic local alignment search tool,' Journal of Molecular Biology, vol. 215, pp. 403-410, 10/5/ 1990. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58937 | - |
dc.description.abstract | 全新基因序列組裝的結果在基因定序計畫中扮演著一個非常重要的角色,由於定序資料非常龐大,因此其過程必須耗費大量的時間。如何去減少運算所需的時間一直都是個很重要的議題。全新基因序列組裝的流程可再細分為前置過濾以及序列組裝兩部份。前置過濾主要負責將品質較差的序列根據不同的過濾條件濾除,來達到較好的組裝結果;序列組裝則是負責將過濾完的短序列組裝成長序列,使其更接近原有的基因序列。
在本篇論文中,我們針對兩部份各提出了一個平行化的架構。在前置過濾的部分,由於其流程中,各序列的過濾結果互不影響,因此透過硬體平行處理以及管線化架構可以使得過濾所需時間大幅縮短。在序列組裝演算法部分,基於尤拉路徑演算法的概念,我們將其建構de Bruijn圖的過程改進,使得原本比較適合單一執行緒的演算法更適合平行架構,並且保有與原架構相近的輸出解品質。此外,不論是前置過濾或是序列組裝演算法的部分,我們都可以調整其平行化的程度,來符合使用者所需的加速倍率以及硬體限制。 在前置過濾器部份,我們透過了Terasic DE4-230可程式化邏輯閘陣列(Field Programmable Gate Array, FPGA)來實作。相較於傳統透過單一執行緒的電腦處理,我們的架構可以縮短54.2 %的過濾時間。而序列組裝演算法部份則是透過訊息傳遞介面(Message Passing Interface, MPI)將其實做,運算所需時間相較於傳統演算法則縮短了22~35 %。 | zh_TW |
dc.description.abstract | De novo sequence assembly plays an important role in genome projects. How to reduce the long processing time caused by the large amount of genome data becomes an important issue. The process of de novo sequence assembly can be divided into two parts: the pre-processing and the assembly stages. In the pre-processing stage, the program filters out the low quality reads, while in the assembly stage, the program assembles short reads into long contigs.
In this thesis, we propose parallel architectures for both parts. In the pre-processing, the operations of each read is independent, so we can easily use pipeline and parallel techniques of hardware design to accelerate the process. In the assembly stage, based on Eulerian path algorithm, we modify the construction of de Bruijn graph to make it more suitable for parallel architecture. We can adjust the degree of parallelism for different hardware specifications. We use a Terasic DE4-230 FPGA board as the platform to implement our pre-processing hardware design. The FPGA-PC system saves 54.2% of execution time required by the PC-only design of for the pre-processing stage. The parallel algorithm is implemented by Message Passing Interface (MPI), the proposed algorithm can save 22~35% processing time if 65 CPU cores are used. | en |
dc.description.provenance | Made available in DSpace on 2021-06-16T08:39:49Z (GMT). No. of bitstreams: 1 ntu-102-R00943022-1.pdf: 5482234 bytes, checksum: b43979d816431c5806ee6d7a361ac328 (MD5) Previous issue date: 2013 | en |
dc.description.tableofcontents | 口試委員會審定書 i
致謝 ii 中文摘要 iii ABSTRACT iv 目錄 v 圖目錄 vii 表目錄 ix 第一章 緒論 1 1.1 研究背景 1 1.2 核酸定序 2 1.3 序列組裝 3 1.4 研究動機 4 1.5 論文架構 4 第二章 基因序列組裝演算法 5 2.1 基因序列來源與資料格式 5 2.2 序列組裝演算法 7 2.2.1 序列組裝遭遇的困難 7 2.2.2 Greedy Algorithm 8 2.2.3 Overlap-Layout-Consensus 9 2.2.4 Eulerian Path 10 2.3 Velvet 11 2.4 訊息傳遞介面 14 第三章 平行演算法設計與結果 16 3.1 演算法架構與流程 16 3.2 系統資料結構 17 3.3 匯入序列片段 19 3.4 重疊資訊標記 23 3.5 序列組裝 27 3.5.1 鎖定機制 28 3.5.2 尾對頭合併 29 3.5.3 尾對尾合併 30 3.5.4 頭對頭合併 32 3.6 組裝結果最佳化 33 3.7 模擬結果 35 第四章 前置處理器設計與結果 39 4.1 前置處理器架構 39 4.2 前置處理器各單元架構 42 4.2.1 主控與隨機存取記憶體控制模組 42 4.2.2 序列修剪計算模組 45 4.2.3 信心值中位數與低信心值鹼基數計算模組 46 4.3 模擬結果與分析 50 第五章 結論與展望 53 5.1 結論 53 5.2 展望 53 參考文獻 54 | |
dc.language.iso | zh-TW | |
dc.title | 全新基因序列組裝平行演算法與硬體架構設計 | zh_TW |
dc.title | A Parallel Algorithm and its Hardware Architecture Design for De Novo Sequence Assembly | en |
dc.type | Thesis | |
dc.date.schoolyear | 102-1 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 陳倩瑜(Chien-Yu Chen),簡韶逸(Shao-Yi Chien),陳和麟(Ho-Lin Chen) | |
dc.subject.keyword | 全新基因序列組裝,平行化,可程式化邏輯閘陣列,信息傳遞介面, | zh_TW |
dc.subject.keyword | de novo sequence assembly,parallel,FPGA,Message Passing Interface, | en |
dc.relation.page | 56 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2013-09-26 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 電子工程學研究所 | zh_TW |
顯示於系所單位: | 電子工程學研究所 |
文件中的檔案:
檔案 | 大小 | 格式 | |
---|---|---|---|
ntu-102-1.pdf 目前未授權公開取用 | 5.35 MB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。