Development of a Comprehensive Online System for Bacterial De Novo Genome Assembly

Yi-Fang Lee; 李沂芳

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19968

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	莊曜宇(Eric Y. Chuang)
dc.contributor.author	Yi-Fang Lee	en
dc.contributor.author	李沂芳	zh_TW
dc.date.accessioned	2021-06-08T02:38:12Z	-
dc.date.copyright	2018-07-23
dc.date.issued	2018
dc.date.submitted	2018-07-20
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19968	-
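dc.description.abstract	As the cost of next-generation sequencing (NGS) has dropped sharply, it has become much easier for researchers to build complete genomes of unknown bacterial species or to extract large amounts of microbial information from environmental samples. With bacterial sequencing data continually accumulating, research on bacterial genomes has grown steadily in importance and has become a popular topic in bioinformatics. Assembling a complete genome helps reveal an organism's functions and signaling mechanisms, and many bioinformatics tools and algorithms have been developed to study prokaryotic sequences. However, combining these third-party programs into an effective analysis pipeline remains a major challenge; in particular, complex parameter settings and command-line-based operation are obstacles for many researchers. To address these problems, we aimed to build a user-friendly online analysis system that provides efficient, easy-to-use pipelines for bacterial de novo genome assembly and metagenomic data analysis. The system takes NGS data produced on Illumina platforms as input and directly provides sequence quality assessment, genome assembly, gene prediction, and gene functional analysis. Overall, the platform can greatly reduce the time and effort researchers spend on bacterial de novo genome assembly or on analyzing environmental samples, allowing users to focus on understanding the pathogenicity of target bacteria or the ecological composition of microbial communities.	zh_TW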
dc.description.abstract	The substantial reduction in the cost of next-generation sequencing (NGS) has made it feasible to assemble the genome of an unknown bacterial species de novo and to acquire abundant genetic information from environmental samples. With the explosive accumulation of bacterial sequencing data, research on bacterial genomes has become increasingly important and popular. Assembling a complete genome can help elucidate biological functions and signaling pathways. Many bioinformatics tools and algorithms have been developed to study prokaryotic genomes; however, combining these third-party tools into efficient pipelines poses a major challenge, and complex parameter settings and command-line-based packages create a high barrier to entry for researchers. To address these issues, this study develops an online analytical system with a user-friendly interface that supports both de novo assembly and metagenomic analysis pipelines for Illumina sequencing data. Multiple analytical steps, including read cleaning, genome assembly, gene prediction, and functional annotation, can be performed directly on the system. In conclusion, this system can greatly reduce the time and effort required to assemble a bacterial genome de novo and to analyze metagenomic samples, helping researchers better understand the pathogenicity of target bacterial species and the ecology of microbial communities.	en
dc.description.provenance	Made available in DSpace on 2021-06-08T02:38:12Z (GMT). No. of bitstreams: 1 ntu-107-R05945010-1.pdf: 2663313 bytes, checksum: 1ea0db887f3f98c456b3c0c8305ae60e (MD5) Previous issue date: 2018	en
dc.description.tableofcontents	Chapter 1. Introduction and Background; 1.1 Motivation; 1.2 Specific Aims; 1.3 Background; 1.3.1 Bacterial Genome; 1.3.2 Next-Generation Sequencing; 1.3.3 De Novo Assembly; 1.3.4 Metagenomics; 1.3.5 Bioinformatics Pipeline Framework; Chapter 2. Materials and Methods; 2.1 System Implementation; 2.2 Reference-Guided and Non-Reference Guided De Novo Assembly; 2.2.1 Quality Control and De Novo Assembly; 2.2.2 Gene Prediction Model Comparison and Assessment; 2.2.3 Functional Analysis and Phylogenetic Tree; 2.3 Metagenomic Analysis Pipeline; 2.3.1 Quality Control, Contamination Removal and Assembly; 2.3.2 Taxonomic Abundance Counting; 2.3.3 Gene Prediction, Clustering, and Abundance; 2.3.4 Functional Annotation, Abundance, and Domain Mapping; Chapter 3. Results; 3.1 Website Interface; 3.1.1 Parameter Settings and Task Submission; 3.1.2 Job Queueing and Status Monitoring; 3.1.3 Result Pages; 3.2 Resource Usage; 3.3 Example I: De Novo Assembly of E. coli EC4437; 3.3.1 Analysis Result; 3.4 Example II: Metagenomic Analysis of Hot Spring Samples; 3.4.1 Dataset; 3.4.2 Analysis Result; Chapter 4. Discussion; 4.1 Performance Evaluation; 4.2 System Feature Comparison; 4.3 Example I: De Novo Assembly of E. coli EC4437; 4.4 Example II: Metagenomic Analysis of Hot Spring Samples; 4.5 Limitation and Future Work; Chapter 5. Conclusions; Figures; Tables; References
dc.language.iso	en
dc.title	建構全面性細菌基因體全新組裝之線上系統	zh_TW
dc.title	Development of a Comprehensive Online System for Bacterial De Novo Genome Assembly	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.coadvisor	盧子彬(Tzu-Pin Lu)
dc.contributor.oralexamcommittee	蔡孟勳(Mong-Hsun Tsai),賴亮全(Liang-Chuan Lai),倪衍玄(Yen-Hsuan Ni)
dc.subject.keyword	細菌, 全新組裝, 宏基因體學, 使用者友善, 線上系統	zh_TW
dc.subject.keyword	bacteria, de novo assembly, metagenomics, user-friendly, online system	en
dc.relation.page	67
dc.identifier.doi	10.6342/NTU201801747
dc.rights.note	Not authorized
dc.date.accepted	2018-07-23
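dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Biomedical Electronics and Bioinformatics	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics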

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	2.6 MB	Adobe PDF


Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.