Evaluating Next-Generation Transcriptome Assembly Methods for Non-Model Species

Shu-Min Kao; 高樹民

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58689

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉力瑜(Li-Yu Liu)
dc.contributor.author	Shu-Min Kao	en
dc.contributor.author	高樹民	zh_TW
dc.date.accessioned	2021-06-16T08:25:51Z	-
dc.date.available	2015-08-20
dc.date.copyright	2014-01-27
dc.date.issued	2014
dc.date.submitted	2014-01-21
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/58689	-
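dc.description.abstract	In recent years, with rapid advances in sequencing technology, using next-generation sequencing (NGS) to sequence and analyze all the RNA in tissues of model and non-model organisms has become a powerful tool, and whole-transcriptome studies are no longer out of reach for a single laboratory. To detect gene expression levels in different biological samples, a typical next-generation transcriptome analysis pipeline maps the reads back to a reference sequence, counts the reads mapped to each unit, normalizes the resulting counts and applies statistical tests, and finally integrates the test results with biological hypotheses and validates them experimentally. As these steps indicate, for biological samples that have a reference sequence, reads can be mapped back to the reference with bioinformatics algorithms. For non-model organisms, however, the reads must first be joined using only the relationships among the reads themselves and assembled into candidate transcripts before normalization and downstream statistical analysis. There is still no consensus on how to choose a method for assembling the short reads produced by NGS, and the accuracy and efficiency of assembly methods have received little study. This study therefore simulates short reads from NGS sequencing of Arabidopsis to examine the performance and properties of each assembly method. In a qualitative comparison of the assembly results, the strategy of assembling pooled reads produced more, longer, and correct contigs than the strategy of pooling preliminarily assembled contigs, but it also produced more types of assembly errors. Taking the evaluations in this study together, Oases performed better in terms of assembly error rate, while CLC was able to assemble the greatest variety of correct results. The results do not identify a single best assembly method; effectively combining the properties and results of the different assemblers is a worthwhile goal for future research. Finally, we applied the optimized results to assembling the whole transcriptome of Catharanthus roseus, with the aim of achieving greater reliability in downstream statistical and biological analyses.	zh_TW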
dc.description.abstract	With recent advances in sequencing technologies, whole-transcriptome sequencing using next-generation sequencing (NGS) methods has emerged as a powerful approach to unraveling the complexity of both model and non-model species, making genome-wide transcription studies accessible even to individual laboratories. A typical RNA-Seq analysis pipeline for detecting differential expression begins with mapping reads to a reference, followed by read summarization, normalization, statistical testing, and integrative analysis. Read mapping relies on a reference genome or transcriptome, which is available for model organisms; for non-model species with no reference or only a partial one, de novo transcriptome assembly must be performed first, as a crucial step in detecting differential expression or generating transcriptomic resources. However, the accuracy and efficiency of current methods for de novo assembly of transcripts from ultra-short reads remain unclear. Here, we assess the performance and investigate the properties of selected assemblers using simulated RNA-Seq reads generated from Arabidopsis. According to the qualitative evaluations, the pooled-reads strategy assembled more and longer correct contigs than the pooled-contigs strategy, at the cost of more mis-assembly. The evaluations of the assemblers revealed that Oases outperformed the others by assembling less noise, while CLC assembled the largest number of uniquely correct contigs in most of the simulations. The results suggest that there is no single best assembler so far, but that the different properties of the assemblers could be used together when reconstructing transcriptome sequences without a reference genome. Finally, the optimized results were applied to construct the transcriptome sequence of Catharanthus roseus, for greater reliability in downstream analyses.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T08:25:51Z (GMT). No. of bitstreams: 1 ntu-103-R00621207-1.pdf: 3147133 bytes, checksum: bfd064bbad077855eace4467f034d037 (MD5) Previous issue date: 2014	en
dc.description.tableofcontents	Acknowledgements; Chinese Abstract; English Abstract; 1 Introduction; 2 Materials and Methods; 2.1 Analyses workflow in this study; 2.2 RNA-Seq simulations and real data; 2.3 Selected assemblers and assembling protocols; 2.4 Procedure of evaluations; 2.5 Comparisons of correlations of log2 fold-change of FPKM between simulations and de novo assemblies; 2.6 Reconciliation of assemblies from different assemblers; 3 Results; 3.1 Assembling results; 3.2 Evaluations of strategies for assembling; 3.3 Evaluations of algorithms for assembling using pooled-reads strategy; 3.4 Assembly metrics of completely and partially assembled contigs; 3.5 Comparison of uniquely complete contigs across simulations; 3.6 Evaluations in terms of gene expression; 3.7 Construction of transcriptome of Catharanthus roseus; 4 Discussions and conclusions; References; Tables; Figures; Supplementary tables and figures; Appendix
dc.language.iso	en
dc.subject	reference-free transcriptome assembly	zh_TW
dc.subject	next-generation sequencing technology	zh_TW
dc.subject	next-generation transcriptome sequencing	zh_TW
dc.subject	RNA-Seq	en
dc.subject	de novo transcriptome assembly	en
dc.subject	NGS	en
dc.title	Evaluating Next-Generation Transcriptome Assembly Methods for Non-Model Species	zh_TW
dc.title	Evaluations of De novo Transcriptome Assembly Methods on Next-Generation Sequencing Data	en
dc.type	Thesis
dc.date.schoolyear	102-1
dc.description.degree	Master's
dc.contributor.coadvisor	林詩舜(Shih-Shun Lin)
dc.contributor.oralexamcommittee	蔡政安(Cheng-An Tsai)
dc.subject.keyword	next-generation sequencing technology,next-generation transcriptome sequencing,reference-free transcriptome assembly	zh_TW
dc.subject.keyword	NGS,RNA-Seq,de novo transcriptome assembly	en
dc.relation.page	62
dc.rights.note	Paid authorization
dc.date.accepted	2014-01-21
dc.contributor.author-college	College of Bioresources and Agriculture	zh_TW
dc.contributor.author-dept	Graduate Institute of Agronomy	zh_TW
Appears in Collections:	Department of Agronomy

Files in This Item:

File	Size	Format
ntu-103-1.pdf (not authorized for public access)	3.07 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.