Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54860
Full metadata record
DC field: value [language]
dc.contributor.advisor: 陳倩瑜
dc.contributor.author: Ching Chang [en]
dc.contributor.author: 張勤 [zh_TW]
dc.date.accessioned: 2021-06-16T03:40:06Z
dc.date.available: 2020-03-16
dc.date.copyright: 2015-03-16
dc.date.issued: 2015
dc.date.submitted: 2015-02-15
dc.identifier.citation: Altschul, S. F., W. Gish, W. Miller, E. W. Myers and D. J. Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology 215(3): 403-410. DOI: 10.1016/S0022-2836(05)80360-2
Andrews, S. 2010. FastQC: a quality control tool for high throughput sequence data. Retrieved June, 2014. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
Bentley, D. R. 2006. Whole-genome re-sequencing. Current Opinion in Genetics & Development 16(6): 545-552. DOI: 10.1016/j.gde.2006.10.009
Bostock, M. D3: Data-Driven Documents. Available from: http://d3js.org/.
Bostock, M., V. Ogievetsky and J. Heer. 2011. D3: Data-Driven Documents. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis).
De Bruijn, N. G. 1946. A combinatorial problem. Koninklijke Nederlandse Akademie v. Wetenschappen 46: 758-764.
Dhillon, M., R. Singh, J. Naresh and H. Sharma. 2005. The melon fruit fly, Bactrocera cucurbitae: A review of its biology and management. Journal of Insect Science 5.
Gibbons, J. G., E. M. Janson, C. T. Hittinger, M. Johnston, P. Abbot and A. Rokas. 2009. Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics. Mol Biol Evol 26(12): 2731-2744. DOI: 10.1093/molbev/msp188
Gille, C., W. Birgit and A. Gille. 2014. Sequence alignment visualization in HTML5 without Java. Bioinformatics 30(1): 121-122. DOI: 10.1093/bioinformatics/btt614
Grabherr, M. G., B. J. Haas, M. Yassour, J. Z. Levin, D. A. Thompson, I. Amit, X. Adiconis, L. Fan, R. Raychowdhury, Q. Zeng, Z. Chen, E. Mauceli, N. Hacohen, A. Gnirke, N. Rhind, F. di Palma, B. W. Birren, C. Nusbaum, K. Lindblad-Toh, N. Friedman and A. Regev. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29(7): 644-652. DOI: 10.1038/nbt.1883
Haas, B. J., A. Papanicolaou, M. Yassour, M. Grabherr, P. D. Blood, J. Bowden, M. B. Couger, D. Eccles, B. Li, M. Lieber, M. D. Macmanes, M. Ott, J. Orvis, N. Pochet, F. Strozzi, N. Weeks, R. Westerman, T. William, C. N. Dewey, R. Henschel, R. D. Leduc, N. Friedman and A. Regev. 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8(8): 1494-1512. DOI: 10.1038/nprot.2013.084
Hsu, J. C., T. Y. Chien, C. C. Hu, M. J. Chen, W. J. Wu, H. T. Feng, D. S. Haymer and C. Y. Chen. 2012. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome. Plos One 7(8): e40950. DOI: 10.1371/journal.pone.0040950
Hu, C. C. 2012. Study of insecticide resistance in Bactrocera dorsalis by de novo transcriptome assembly. Bio-Industrial Mechatronics Engineering. Taipei, National Taiwan University. Master.
Illumina Inc. 2010. Technical note. De novo assembly using Illumina reads. Available from: http://www.illumina.com/Documents/products/technotes/technote_denovo_assembly_ecoli.pdf.
Kalderimis, A., R. Lyne, D. Butano, S. Contrino, M. Lyne, J. Heimbach, F. Hu, R. Smith, R. Stepan, J. Sullivan and G. Micklem. 2014. InterMine: extensive web services for modern biology. Nucleic Acids Res 42(Web Server issue): 468-472. DOI: 10.1093/nar/gku301
Katoh, K., K. Misawa, K.-i. Kuma and T. Miyata. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059-3066.
Kuo, T. C.-Y., C.-C. Hu, T.-Y. Chien, M.-J. M. Chen, H.-T. Feng, L.-F. O. Chen, C.-Y. Chen* and J.-C. Hsu*. 2014. Discovery of genes related to formothion resistance in oriental fruit fly (Bactrocera dorsalis) by a constrained functional genomics analysis. Insect Molecular Biology.
Li, R., Y. Li, K. Kristiansen and J. Wang. 2008. SOAP: short oligonucleotide alignment program. Bioinformatics 24(5): 713-714. DOI: 10.1093/bioinformatics/btn025
Luo, R., B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, J. Tang, G. Wu, H. Zhang, Y. Shi, Y. Liu, C. Yu, B. Wang, Y. Lu, C. Han, D. W. Cheung, S.-M. Yiu, S. Peng, Z. Xiaoqian, G. Liu, X. Liao, Y. Li, H. Yang, J. Wang, T.-W. Lam and J. Wang. 2012. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(1): 18. DOI: 10.1186/2047-217X-1-18
Lyne, R., R. Smith, K. Rutherford, M. Wakeling, A. Varley, F. Guillier, H. Janssens, W. Ji, P. McLaren, P. North, D. Rana, T. Riley, J. Sullivan, X. Watkins, M. Woodbridge, K. Lilley, S. Russell, M. Ashburner, K. Mizuguchi and G. Micklem. 2007. FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biol 8(7): R129. DOI: 10.1186/gb-2007-8-7-r129
Mardis, E. R. 2008. The impact of next-generation sequencing technology on genetics. Trends in Genetics 24(3): 133-141. DOI: 10.1016/j.tig.2007.12.007
Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y.-J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. I. Alenquer, T. P. Jarvie, K. B. Jirage, J.-B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley and J. M. Rothberg. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057): 376-380. DOI: 10.1038/nature03959
Martin, J. A. and Z. Wang. 2011. Next-generation transcriptome assembly. Nat Rev Genet 12(10): 671-682.
Medina, I., F. Salavert, R. Sanchez, A. de Maria, R. Alonso, P. Escobar, M. Bleda and J. Dopazo. 2013. Genome Maps, a new generation genome browser. Nucleic Acids Res 41(W1): W41-W46. DOI: 10.1093/nar/gkt530
Metzker, M. L. 2010. Sequencing technologies - the next generation. Nat Rev Genet 11(1): 31-46. DOI: 10.1038/nrg2626
Misra, S., M. Crosby, C. Mungall, B. Matthews, K. Campbell, P. Hradecky, Y. Huang, J. Kaminker, G. Millburn, S. Prochnik, C. Smith, J. Tupy, E. Whitfield, L. Bayraktaroglu, B. Berman, B. Bettencourt, S. Celniker, A. de Grey, R. Drysdale, N. Harris, J. Richter, S. Russo, A. Schroeder, S. Shu, M. Stapleton, C. Yamada, M. Ashburner, W. Gelbart, G. Rubin and S. Lewis. 2002. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biology 3(12): research0083.1-0083.22. DOI: 10.1186/gb-2002-3-12-research0083
Pertea, G., X. Huang, F. Liang, V. Antonescu, R. Sultana, S. Karamycheva, Y. Lee, J. White, F. Cheung, B. Parvizi, J. Tsai and J. Quackenbush. 2003. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19(5): 651-652. DOI: 10.1093/bioinformatics/btg034
Pruitt, K. D., T. Tatusova, G. R. Brown and D. R. Maglott. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 40(D1): D130-D135. DOI: 10.1093/nar/gkr1079
Robertson, G., J. Schein, R. Chiu, R. Corbett, M. Field, S. D. Jackman, K. Mungall, S. Lee, H. M. Okada, J. Q. Qian, M. Griffith, A. Raymond, N. Thiessen, T. Cezard, Y. S. Butterfield, R. Newsome, S. K. Chan, R. She, R. Varhol, B. Kamoh, A. L. Prabhu, A. Tam, Y. Zhao, R. A. Moore, M. Hirst, M. A. Marra, S. J. Jones, P. A. Hoodless and I. Birol. 2010. De novo assembly and analysis of RNA-seq data. Nat Methods 7(11): 909-912. DOI: 10.1038/nmeth.1517
Shen, G. M., W. Dou, J. Z. Niu, H. B. Jiang, W. J. Yang, F. X. Jia, F. Hu, L. Cong and J. J. Wang. 2011. Transcriptome analysis of the oriental fruit fly (Bactrocera dorsalis). Plos One 6(12). DOI: 10.1371/journal.pone.0029127
Shendure, J. and H. Ji. 2008. Next-generation DNA sequencing. Nat Biotechnol 26(10): 1135-1145. DOI: 10.1038/nbt1486
Smith, R. N., J. Aleksic, D. Butano, A. Carr, S. Contrino, F. Hu, M. Lyne, R. Lyne, A. Kalderimis, K. Rutherford, R. Stepan, J. Sullivan, M. Wakeling, X. Watkins and G. Micklem. 2012. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics 28(23): 3163-3165. DOI: 10.1093/bioinformatics/bts577
St. Pierre, S. E., L. Ponting, R. Stefancsik, P. McQuilton and the FlyBase Consortium. 2014. FlyBase 102 - advanced approaches to interrogating FlyBase. Nucleic Acids Res 42(D1): D780-D788. DOI: 10.1093/nar/gkt1092
Stephens, A. E. A., D. J. Kriticos and A. Leriche. 2007. The current and future potential geographical distribution of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae). Bulletin of Entomological Research 97(04): 369-378. DOI: 10.1017/S0007485307005044
Wang, Z., M. Gerstein and M. Snyder. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1): 57-63.
Wu, C.-H., M.-H. Tsai, C.-C. Ho, C.-Y. Chen and H.-S. Lee. 2013. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration. Bmc Genomics 14: 434. DOI: 10.1186/1471-2164-14-434
Yandell, M. and D. Ence. 2012. A beginner's guide to eukaryotic genome annotation. Nat Rev Genet 13(5): 329-342. DOI: 10.1038/nrg3174
Zerbino, D. R. and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18(5): 821-829. DOI: 10.1101/gr.074492.107
Zhang, J., R. Chiodini, A. Badr and G. Zhang. 2011. The impact of next-generation sequencing on genomics. J Genet Genomics 38(3): 95-109. DOI: 10.1016/j.jgg.2011.02.003
Zhang, Z., V. B. Bajic, J. Yu, K.-H. Cheung and J. P. Townsend. 2011. Data Integration in Bioinformatics: Current Efforts and Challenges. InTech.
Zhao, Q.-Y., Y. Wang, Y.-M. Kong, D. Luo, X. Li and P. Hao. 2011. Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. Bmc Bioinformatics 12(Suppl 14): S2. DOI: 10.1186/1471-2105-12-S14-S2
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/54860
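dc.description.abstract: Advances in next-generation sequencing technology not only provide high-throughput transcriptome sequence data but also advance research on species without a reference genome. With assembly tools designed for organisms that lack a reference genome, short reads can be assembled into transcript sequences, and to infer the function of each transcript these sequences must then be annotated. For such reference-free assemblies, the common approach is to align them against a protein database with a sequence alignment tool such as BLASTx; the similarity indicates the likely function of the protein each sequence encodes. This study analyzed how well the transcript sequences of the oriental fruit fly (Bactrocera dorsalis) and the melon fly (Bactrocera cucurbitae) can be annotated by alignment against a model-organism database. Only 49% of the B. dorsalis sequences (about 24,800 assembled sequences) could be annotated using Drosophila melanogaster, and for B. cucurbitae only 46% (about 25,400 assembled sequences). This shows that annotation based on a model organism alone has inherent limits, because the species under study remain evolutionarily distant from the model organism. However, if the similarity comparison uses not only the most similar model organism but also two more closely related species, more annotations can be obtained. B. dorsalis and B. cucurbitae belong to the same genus and therefore share highly similar characteristics and homologous genes. This study established an analysis procedure that uses the sequence similarity between these congeneric species as links to build an undirected graph and its connected components, improving the completeness of gene annotation. In addition, statistics comparing the two assemblies show that the average assembled sequence length of B. cucurbitae is about twice that of B. dorsalis, suggesting that the B. dorsalis assembly contains more incompletely assembled sequences than the B. cucurbitae assembly. Through the connected-component analysis, this study provides a list of assembled sequences that were not assembled into a single sequence but intrinsically should be joined, thereby improving the reference-free assembly.
To ensure that the sequence similarity used in the connected-component analysis is reliable, this thesis adopted the following alignment criteria: identity higher than 80%, E-value smaller than 10^-20, and aligned protein length greater than 70 amino acids. These criteria yielded 7,086 connected components. With the proposed strategy, a sequence that finds a related sequence from the other congeneric species within its connected component receives a potential gene annotation from the annotation carried by that related sequence. Among the sequences that could not be annotated when aligned against D. melanogaster alone, 925 B. dorsalis sequences and 272 B. cucurbitae sequences obtained additional gene annotations. To improve the reference-free assembly, the connected-component analysis suggests that 1,919 B. dorsalis sequences and 71 B. cucurbitae sequences should be joined into longer sequences, yielding 680 and 52 transcript sequences, respectively. Finally, this study built a database that serves as an online analysis platform for the connected-component method, giving biologists convenient access to the results; researchers can inspect the multiple sequence alignment within each connected component, which supports future experimental design and downstream applications.
[zh_TW]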
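The connected-component step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the cross-species similarity search (B. dorsalis sequences against B. cucurbitae sequences, at the protein level) was saved in BLAST tabular format (`-outfmt 6`, default 12 columns), treats every hit that passes the three quoted thresholds as an undirected edge, and groups linked sequences with a union-find structure. The `Bd_*`/`Bc_*` IDs and all hit values are made up.

```python
from collections import defaultdict

# Thresholds quoted in the abstract; all comparisons are strict.
MIN_IDENTITY = 80.0   # percent identity must be > 80
MAX_EVALUE = 1e-20    # E-value must be < 10^-20
MIN_ALN_LEN = 70      # alignment length (amino acids) must be > 70


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:          # first sighting: new singleton set
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:     # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def passes_criteria(fields):
    """fields: one BLAST tabular row in the default -outfmt 6 column order."""
    identity = float(fields[2])   # pident
    aln_len = int(fields[3])      # length
    evalue = float(fields[10])    # evalue
    return identity > MIN_IDENTITY and evalue < MAX_EVALUE and aln_len > MIN_ALN_LEN


def connected_components(blast_lines, min_size=2):
    """Use each passing hit as an undirected edge; return sorted CCs."""
    uf = UnionFind()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        if query != subject and passes_criteria(fields):
            uf.union(query, subject)
    groups = defaultdict(list)
    for node in list(uf.parent):
        groups[uf.find(node)].append(node)
    return sorted(sorted(m) for m in groups.values() if len(m) >= min_size)


def _row(*cols):
    return "\t".join(cols)


# Hypothetical hits: "Bd_*" = B. dorsalis, "Bc_*" = B. cucurbitae (made-up IDs).
EXAMPLE_HITS = [
    _row("Bd_1", "Bc_7", "92.5", "150", "11", "0", "1", "150", "3", "152", "1e-80", "290"),
    _row("Bd_2", "Bc_7", "88.0", "95", "11", "0", "1", "95", "160", "254", "1e-40", "180"),
    _row("Bd_3", "Bc_9", "85.0", "60", "9", "0", "1", "60", "1", "60", "1e-25", "110"),    # too short
    _row("Bd_4", "Bc_9", "75.0", "120", "30", "0", "1", "120", "1", "120", "1e-30", "150"),  # identity too low
    _row("Bd_5", "Bc_2", "81.0", "200", "38", "0", "1", "200", "5", "204", "1e-90", "330"),
]

if __name__ == "__main__":
    for cc in connected_components(EXAMPLE_HITS):
        print(cc)
```

On the sample data this prints `['Bc_2', 'Bd_5']` and `['Bc_7', 'Bd_1', 'Bd_2']`: the two B. dorsalis sequences that both hit `Bc_7` end up in one component. Union-find keeps the grouping close to linear in the number of hits; a breadth-first search over an adjacency list would give the same components.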
dc.description.abstract: The revolutionary advances of next-generation sequencing technology not only provide high-throughput sequencing data but also considerably facilitate transcriptome studies of organisms without a reference genome. Through de novo assembly, transcripts can be reconstructed from the sequencing reads. To infer the protein functions of the assembled sequences, one conventional approach is to search them against a protein database by sequence similarity using BLASTx. In this study, only 49% (approximately 24,800 sequences) of the assembled Bactrocera dorsalis (B. dorsalis) sequences could be annotated with Drosophila melanogaster (D. melanogaster) genes by BLASTx. For Bactrocera cucurbitae (B. cucurbitae), only 46% (approximately 25,400 sequences) of the assembled transcripts could be annotated with D. melanogaster genes. This reveals an inevitable limitation when the target organism is evolutionarily distant from the model organism.
Compared with the traditional approach, comparing sequences not only against the most closely related model organism but also against a much more closely related organism can further improve the completeness of the annotation list. B. cucurbitae and B. dorsalis belong to the same genus and share a high level of homology. By finding connected components (CCs), we can use the similarity links between these two species to further improve the annotation. In addition, the assembly statistics show that the average length of the B. cucurbitae assembled sequences is about twice that of B. dorsalis, suggesting that the B. dorsalis assembly may contain many more incompletely assembled transcripts than the B. cucurbitae assembly. The CC analysis can therefore also improve the de novo assembly by providing a list of transcripts that should intrinsically have been joined together.
A total of 7,086 CCs were obtained using strict similarity criteria (identity higher than 80%, E-value smaller than 10^-20, and alignment length longer than 70 amino acids). Through mutual comparison among sequences of the same genus, Bactrocera, the CCs suggest potential annotations for transcripts that cannot be annotated when compared with D. melanogaster sequences alone. As a result, 925 B. dorsalis sequences and 272 B. cucurbitae sequences can be additionally annotated with D. melanogaster genes, increasing the completeness of the annotation list. For further improvement of the de novo assembly, a total of 1,919 B. dorsalis sequences are recommended to be concatenated into 680 longer transcripts; similarly, 71 B. cucurbitae sequences are suggested to be joined into 52 longer transcripts. Finally, a database was constructed to provide a user-friendly platform for the CC analysis and to help biologists view the sequence alignments within each CC.
[en]
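Once the CCs are known, the two uses described in the abstract (suggesting annotations for unannotated transcripts and flagging fragments that may belong together) can be sketched as below. This is an illustrative assumption rather than the thesis's procedure: it assumes a hypothetical ID convention in which the prefix before `_` names the species, a dictionary mapping each sequence with a BLASTx hit to one D. melanogaster gene, and placeholder gene names. The join list only names candidates; the thesis's translation, multiple sequence alignment, and alignment-block steps are not reproduced here.

```python
from collections import defaultdict


def species_of(seq_id):
    # Hypothetical ID convention: the prefix before "_" names the species
    # ("Bd" = B. dorsalis, "Bc" = B. cucurbitae).
    return seq_id.split("_", 1)[0]


def transfer_annotations(components, fly_hits):
    """Suggest D. melanogaster genes for CC members that lack a direct hit.

    components: list of CCs, each a list of sequence IDs.
    fly_hits:   {sequence ID: D. melanogaster gene} from a BLASTx search.
    A member without a hit inherits the genes of annotated members of the
    other species in the same CC.
    """
    suggestions = {}
    for comp in components:
        for seq in comp:
            if seq in fly_hits:
                continue  # already annotated directly
            genes = {
                fly_hits[other]
                for other in comp
                if other in fly_hits and species_of(other) != species_of(seq)
            }
            if genes:
                suggestions[seq] = sorted(genes)
    return suggestions


def join_candidates(components):
    """Group same-species members of each CC; groups of two or more
    may be fragments of one transcript."""
    groups = []
    for comp in components:
        by_species = defaultdict(list)
        for seq in comp:
            by_species[species_of(seq)].append(seq)
        for sp in sorted(by_species):
            if len(by_species[sp]) >= 2:
                groups.append(sorted(by_species[sp]))
    return groups


# Made-up example data; gene names are placeholders, not FlyBase IDs.
EXAMPLE_COMPONENTS = [["Bc_2", "Bd_5"], ["Bc_7", "Bd_1", "Bd_2"]]
EXAMPLE_FLY_HITS = {"Bc_7": "fly_gene_A", "Bd_5": "fly_gene_B"}

if __name__ == "__main__":
    print(transfer_annotations(EXAMPLE_COMPONENTS, EXAMPLE_FLY_HITS))
    print(join_candidates(EXAMPLE_COMPONENTS))
```

With the sample data, `Bc_2` picks up `fly_gene_B` from its B. dorsalis partner, both `Bd_1` and `Bd_2` pick up `fly_gene_A` from `Bc_7`, and `['Bd_1', 'Bd_2']` is reported as a pair of B. dorsalis sequences that may be pieces of one longer transcript.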
dc.description.provenance: Made available in DSpace on 2021-06-16T03:40:06Z (GMT). No. of bitstreams: 1
ntu-104-R01631006-1.pdf: 4832342 bytes, checksum: e996fff289cdae9e3597e995e88d51ff (MD5)
Previous issue date: 2015
[en]
dc.description.tableofcontents: Oral Examination Committee Certification i
Chinese Abstract ii
ABSTRACT iv
TABLE OF CONTENTS vii
LIST OF FIGURES x
LIST OF TABLES xii
CHAPTER 1 INTRODUCTION 1
1.1 Motivation 2
1.2 Objective 2
CHAPTER 2 LITERATURE SURVEY 4
2.1 Next-generation sequencing 4
2.1.1 Next-generation sequencing platforms 4
2.1.2 De novo assembly 5
2.2 Gene annotation 9
2.3 Fruit flies of the genus Bactrocera 9
2.4 Biological data management and web service 10
CHAPTER 3 MATERIALS AND METHODS 13
3.1 Datasets 13
3.2 De novo assembly 13
3.3 Annotation of the transcripts 14
3.4 Connected component analysis 14
3.4.1 Definition of connected components 16
3.4.2 Annotation of the CCs 18
3.5 Improvement of the de novo assembly 20
3.5.1 Translation of the transcript sequences of CCs 22
3.5.2 Multiple sequence alignment (MSA) 22
3.5.3 Alignment blocks identification 22
3.6 Web service for insect database 24
CHAPTER 4 RESULTS AND DISCUSSION 29
4.1 Statistics of the assembly results 29
4.1.1 Comparison of assembly results of B. cucurbitae and B. dorsalis 33
4.1.2 Annotation of the assembled transcripts 34
4.1.3 Parameter tuning for CCs 39
4.1.4 Annotation of the identified CCs 41
4.2 Improvement of the annotation 42
4.3 Improvement of the de novo assembly 46
4.4 Database construction and web service 48
CHAPTER 5 CONCLUSIONS 51
REFERENCES 53
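dc.language.iso: en
dc.subject: de novo transcriptome assembly [zh_TW]
dc.subject: oriental fruit fly [zh_TW]
dc.subject: melon fly [zh_TW]
dc.subject: gene annotation [zh_TW]
dc.subject: cross-species analysis [zh_TW]
dc.subject: Drosophila melanogaster [zh_TW]
dc.subject: connected component [zh_TW]
dc.subject: gene annotation [en]
dc.subject: de novo transcriptome assembly [en]
dc.subject: connected component [en]
dc.subject: Bactrocera dorsalis [en]
dc.subject: Bactrocera cucurbitae [en]
dc.subject: cross-species study [en]
dc.subject: Drosophila melanogaster [en]
dc.title: Using information from species of the same genus to improve the completeness of reference-free transcriptome assembly and functional annotation [zh_TW]
dc.title: Improving completeness of de novo transcriptome assembly and gene annotation by comparison of species within the same genus [en]
dc.type: Thesis
dc.date.schoolyear: 103-1 (ROC academic year 103, first semester)
dc.description.degree: Master's
dc.contributor.oralexamcommittee: 蘇中才, 許如君, 吳君泰
dc.subject.keyword: oriental fruit fly, melon fly, gene annotation, cross-species analysis, Drosophila melanogaster, connected component, de novo transcriptome assembly [zh_TW]
dc.subject.keyword: Bactrocera dorsalis, Bactrocera cucurbitae, cross-species study, Drosophila melanogaster, gene annotation, connected component, de novo transcriptome assembly [en]
dc.relation.page: 56
dc.rights.note: Paid license
dc.date.accepted: 2015-02-15
dc.contributor.author-college: College of Bio-Resources and Agriculture [zh_TW]
dc.contributor.author-dept: Graduate Institute of Bio-Industrial Mechatronics Engineering [zh_TW]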
Appears in collection: Department of Biomechatronics Engineering

Files in this item:
File | Size | Format
ntu-104-1.pdf (not authorized for public access) | 4.72 MB | Adobe PDF


Unless otherwise indicated, all items in this system are protected by copyright, with all rights reserved.
