Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58689
Title: | Evaluations of De novo Transcriptome Assembly Methods on Next-Generation Sequencing Data (評估非模式物種次世代轉錄體組裝之方法) |
Author: | Shu-Min Kao (高樹民) |
Advisor: | Li-Yu Liu (劉力瑜) |
Co-advisor: | Shih-Shun Lin (林詩舜) |
Keywords: | next-generation sequencing (NGS), RNA-Seq, de novo transcriptome assembly |
Publication year: | 2014 |
Degree: | Master's |
Abstract: | With recent advances in sequencing technologies, whole-transcriptome sequencing by next-generation sequencing (NGS) has emerged as a powerful approach for sequencing and analyzing all the RNA in tissues of both model and non-model species, putting genome-wide transcription studies within reach of individual laboratories. A typical RNA-Seq pipeline for detecting differential expression maps reads to a reference, counts the reads mapped to each unit (read summarization), normalizes the counts, performs statistical testing, and finally integrates the test results with biological hypotheses for experimental validation. For model organisms, read mapping relies on a reference genome or transcriptome. For non-model species with no reference or only a partial one, reads must first be assembled de novo into putative transcripts, using only the relationships among the reads themselves; this is a crucial step before detecting differential expression or generating transcriptomic resources. However, there is no consensus on how to choose a method for assembling the short reads produced by NGS, and the accuracy and efficiency of current de novo assembly methods remain unclear. Here, we assess the performance and investigate the properties of selected assemblers using simulated RNA-Seq reads generated from Arabidopsis. According to the qualitative evaluations, the pooled-reads strategy assembled more and longer correct contigs than the pooled-contigs strategy (pooling preliminarily assembled contigs), at the cost of more types of mis-assembly.
Among the assemblers evaluated, Oases performed best in terms of error rate, assembling less noise than the others, while CLC assembled the largest number of uniquely correct contigs in most of the simulations. These results do not identify a single best assembler; rather, they suggest that the different properties of the assemblers could be combined when reconstructing transcriptome sequences without a reference genome, which is a worthwhile goal for future research. Finally, the optimized results were applied to assemble the whole transcriptome of Catharanthus roseus, with the aim of greater reliability in downstream statistical and biological analyses. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58689 |
Full-text license: | Paid license |
Appears in collections: | Department of Agronomy |
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-103-1.pdf (not currently authorized for public access) | 3.07 MB | Adobe PDF |
Unless a copyright license is specifically indicated, all items in this system are protected by copyright, with all rights reserved.