BioCloud: an online sequencing analysis platform

Liang-Bo Wang; 王亮博

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/3868

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	莊曜宇(Eric Y. Chuang)
dc.contributor.author	Liang-Bo Wang	en
dc.contributor.author	王亮博	zh_TW
dc.date.accessioned	2021-05-13T08:37:46Z	-
dc.date.available	2016-08-02
dc.date.available	2021-05-13T08:37:46Z	-
dc.date.copyright	2016-08-02
dc.date.issued	2016
dc.date.submitted	2016-07-25
dc.identifier.citation	1. DNAnexus <https://www.dnanexus.com/> (2016). 2. Partek Flow <http://www.partek.com/partekflow> (2016). 3. Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 11, R86 (2010). 4. Griffith, M. et al. Genome Modeling System: A Knowledge Management Platform for Genomics. PLoS Comput Biol 11, 1–21 (2015). 5. Metzker, M. L. Sequencing technologies — the next generation. Nature Reviews Genetics 11, 31–46 (2010). 6. Van Dijk, E. L., Auger, H., Jaszczyszyn, Y. & Thermes, C. Ten years of next-generation sequencing technology. Trends in Genetics 30, 418–426 (2014). 7. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10, 57–63 (2009). 8. Nielsen, R., Paul, J. S., Albrechtsen, A. & Song, Y. S. Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics 12, 443–451 (2011). 9. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001). 10. Gawad, C., Koh, W. & Quake, S. R. Single-cell genome sequencing: current state of the science. Nature Reviews Genetics 17, 175–188 (2016). 11. NCBI GRCh38.p7 Assembly <http://www.ncbi.nlm.nih.gov/assembly/GCA_000001405.22> (2016). 12. E pluribus unum. Nature Methods 7, 331 (2010). 13. Leipzig, J. A review of bioinformatic pipeline frameworks. Briefings in Bioinformatics, bbw020 (2016). 14. bcbio-nextgen <https://bcbio-nextgen.readthedocs.io/> (2016). 15. Guimera, R. V. bcbio-nextgen: Automated, distributed next-gen sequencing pipeline. EMBnet.journal 17(B), 30 (2012). 16. Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012). 17. IPython Parallel <http://ipyparallel.readthedocs.io/> (2016). 18. Amstutz, P. et al. Common Workflow Language, draft 3.
<https://figshare.com/articles/Common_Workflow_Language_draft_3/3115156> (2016). 19. GNU Make <https://www.gnu.org/software/make/> (2016). 20. Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, btw354 (2016). 21. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28, 511–515 (2010). 22. Trapnell, C. et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology 31, 46–53 (2013). 23. cummeRbund <http://bioconductor.org/packages/cummeRbund/> (2016). 24. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15, 550 (2014). 25. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics (Oxford, England) 30, 923–930 (2014). 26. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015). 27. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14, R36 (2013). 28. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12, 357–360 (2015). 29. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). 30. Picard <https://broadinstitute.github.io/picard/> (2016). 31. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, England) 25, 2078–2079 (2009). 32. Patro, R., Mount, S. M. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnology 32, 462–464 (2014). 33. Bray, N.
L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 34, 525–527 (2016). 34. Patro, R., Duggal, G. & Kingsford, C. Accurate, fast, and model-aware transcript expression quantification with Salmon. bioRxiv, 021592 (2015). 35. Pimentel, H. J., Bray, N., Puente, S., Melsted, P. & Pachter, L. Differential analysis of RNA-Seq incorporating quantification uncertainty. bioRxiv, 058164 (2016). 36. Alamancos, G. P., Pagès, A., Trincado, J. L., Bellora, N. & Eyras, E. Leveraging transcript quantification for fast computation of alternative splicing profiles. RNA (New York, N.Y.) 21, 1521–1531 (2015). 37. FastQC <http://www.bioinformatics.babraham.ac.uk/projects/fastqc/>. 38. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760 (2009). 39. Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research 22, 568–576 (2012). 40. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics 43, 11.10.1–33 (2013). 41. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303 (2010). 42. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research 38, e164–e164 (2010). 43. CommonMark <http://commonmark.org/>. 44. iGenomes <http://support.illumina.com/sequencing/sequencing_software/igenome.html>. 45. YAML Ain’t Markup Language (YAML) Version 1.1 <http://yaml.org/spec/1.1/>. 46. Eswaran, J. et al. Transcriptomic landscape of breast cancers through mRNA sequencing. Scientific Reports 2, 264 (2012). 47. Himes, B. E. et al. 
RNA-Seq transcriptome profiling identifies CRISPLD2 as a glucocorticoid responsive gene that modulates cytokine function in airway smooth muscle cells. PLoS ONE 9, e99625 (2014). 48. Engström, P. G. et al. Systematic evaluation of spliced alignment programs for RNA-seq data. Nature Methods 10, 1185–1191 (2013). 49. Rapaport, F. et al. Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biology 14, 3158 (2013).
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/3868	-
dc.description.abstract	隨著次世代定序技術的問世，它已經成為基因體研究中最重要的資料來源之一。與傳統的方法比較，次世代定序能在可行的時間與預算內，提供高通量的序列以及定量生物體活動的能力。然而，在獲取生物上有意義的結果之前，完成一個次世代定序的分析需要一系列命令列下運作工具的參與，以及龐大的計算資源，給予生物學家以及臨床人員很高的分析進入門檻，他們甚而無法解讀自己的定序資料。因此本研究提出了一個線上分析次世代定序平台，BioCloud，它得以自動化常見的定序分析流程，並且根據分析結果產生總覽報表。進一步，使用者能設計自訂的分析流程並擴充現有的流程實作來支援更多種類的定序與分析方法。藉由在 BioCloud 上分析次世代定序，研究者能以更互動式與方便的方式來了解他們的資料，並且讓整個分析更容易地重現。	zh_TW
dc.description.abstract	Since its advent, next-generation sequencing (NGS) has become one of the most important data sources in genome-wide studies. Compared with traditional methods, NGS provides high-throughput sequencing reads and the ability to quantify biological activity within a feasible time and budget. However, before biologically meaningful results can be obtained, an NGS data analysis involves a series of command-line tools and requires extensive computational resources. This imposes a high barrier to entry for biologists and clinicians, who may be unable even to interpret their own sequencing data. Therefore, this study proposes BioCloud, an online NGS analysis platform that automates common analysis pipelines and generates a summary report from the analysis results. Furthermore, users can design custom analysis pipelines and extend the existing pipeline implementations to support a wider set of sequencing types and analysis methods. By conducting NGS analyses on BioCloud, researchers can explore their data in a more interactive and convenient way, and the analyses become easier to reproduce.	en
dc.description.provenance	Made available in DSpace on 2021-05-13T08:37:46Z (GMT). No. of bitstreams: 1 ntu-105-R02945054-1.pdf: 9539738 bytes, checksum: eda9ffb088cc9fbed0d75c1019c7971a (MD5) Previous issue date: 2016	en
dc.description.tableofcontents	Oral Examination Committee Certification; Acknowledgements; Abstract (in Chinese); Abstract; Contents; List of Figures; List of Tables; 1 Introduction; 1.1 Motivation; 1.2 Specific aims; 1.3 Next-generation sequencing; 1.4 Genome reference; 1.5 General NGS analysis workflow; 2 Related work; 2.1 Commercial online analysis platforms; 2.2 Open source online analysis platform; 2.3 Pipeline execution tools; 2.4 Analysis report generation; 3 Methods; 3.1 RNA-Seq pipelines; 3.2 DNA-Seq pipelines; 3.3 BioCloud website; 3.3.1 Overview; 3.3.2 Data integrity check and authentication; 3.3.3 User account management; 3.3.4 Data source management; 3.3.5 Experiment design; 3.3.6 Genome reference; 3.3.7 Analysis submission; 3.3.8 Job queue management; 3.3.9 Report and result access control; 3.4 Report generation; 3.4.1 Analysis result structure and information; 3.4.2 BCReport: result processing framework; 3.5 Implementation; 3.5.1 Website; 3.5.2 Deployment; 3.5.3 Report; 4 Results; 4.1 Datasets; 4.2 Account registration and user dashboard; 4.3 Data source discovery; 4.4 Experiment design; 4.5 Analysis design; 4.6 Job queue monitoring; 4.7 Summary report; 4.7.1 Quality control; 4.7.2 Genome alignment – STAR; 4.7.3 Cuffdiff; 4.7.4 Integration with external genome browsers; 4.8 Admin interface; 5 Discussions; 5.1 Data source uploading; 5.2 Supported analyses; 5.3 Pipeline extension; 5.4 Integration with other frameworks; 6 Conclusions; Bibliography
dc.language.iso	en
dc.subject	線上分析平台	zh_TW
dc.subject	次世代定序	zh_TW
dc.subject	Next-generation sequencing	en
dc.subject	online analysis platform	en
dc.title	BioCloud：線上定序分析平台	zh_TW
dc.title	BioCloud: an online sequencing analysis platform	en
dc.type	Thesis
dc.date.schoolyear	104-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	盧子彬(Tzu-Pin Lu),蔡孟勳(Mong-Hsun Tsai),賴亮全(Liang-Chuan Lai),陳倩瑜(Chien-Yu Chen)
dc.subject.keyword	次世代定序,線上分析平台	zh_TW
dc.subject.keyword	Next-generation sequencing, online analysis platform	en
dc.relation.page	75
dc.identifier.doi	10.6342/NTU201601295
dc.rights.note	Authorization granted (open to the public worldwide)
dc.date.accepted	2016-07-25
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	生醫電子與資訊學研究所	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics (生醫電子與資訊學研究所)

Files in This Item:

File	Size	Format
ntu-105-1.pdf	9.32 MB	Adobe PDF	View/Open

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.