Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61381
Title: | The Implementation and Performance Improvement of Genome Assembly on Hadoop MapReduce (Hadoop雲端運算平台上生物基因組裝演算法實作與效能改進) |
Author: | Chun-Yang Huang 黃峻揚 |
Advisor: | 黃乾綱 (Chien-Kang Huang) |
Keywords: | Genome Assembly, Hadoop, MapReduce, Contrail |
Publication year: | 2013 |
Degree: | Master's |
Abstract: | Genome assembly is the process of taking sequenced reads and putting them back together to reproduce the original sequence. The process consumes a large amount of hardware resources, which makes it hard to run to completion when assembling a large genome. Hadoop cloud computing has been a popular topic in recent years. By using multiple machines to build a distributed computing environment, Hadoop reduces the amount of local computation and avoids the waste caused by excessive data transfer between the local machine and the server.
This thesis uses Contrail, an assembly tool developed by M. Schatz et al. that combines genome assembly with Hadoop cloud computing. Contrail implements genome assembly algorithms on the Hadoop platform and uses its distributed computation to address the case where most assemblers cannot complete a large genome assembly because of insufficient hardware resources. This thesis studies the differences in system architecture and API before and after the Hadoop revision, and modifies the Contrail code so that it runs on the current version of the Hadoop platform. It then compares Contrail's assembly results with those of two commonly used assemblers, Velvet and SOAPdenovo. In addition, it improves performance in two areas: the utilization of computing resources in the Hadoop system, and the graph algorithms of the assembly tool. We find that Contrail's assembly results are similar to Velvet's on smaller genomes and more similar to SOAPdenovo's on larger genomes. On large genomes for which Velvet and SOAPdenovo both have difficulty completing the assembly process, Contrail completes it successfully. This indicates that Contrail can handle large genomes that most assemblers struggle with, and that its assembly results have some value as a reference. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/61381 |
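Full-text license: | Paid license |
Appears in collections: | Department of Engineering Science and Ocean Engineering |
Files in this item:
File | Size | Format | |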
---|---|---|---|
ntu-102-1.pdf (not currently authorized for public access) | 4.26 MB | Adobe PDF |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.