The Effect of Sequencing Coverage on Mining Simple Sequence Repeats by Simulation

Ying-Tsui Wang; 王瀅翠

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64660

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	劉力瑜(Li-Yu Liu)
dc.contributor.author	Ying-Tsui Wang	en
dc.contributor.author	王瀅翠	zh_TW
dc.date.accessioned	2021-06-16T22:57:02Z	-
dc.date.available	2017-08-15
dc.date.copyright	2012-08-15
dc.date.issued	2012
dc.date.submitted	2012-08-09
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/64660	-
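dc.description.abstract	Microsatellites, also known as simple sequence repeats (SSRs), are sequences in which a unit of 1 to 6 nucleotides is repeated in tandem, and they are found throughout the genomes of all species. Because they are widely distributed and highly polymorphic, SSRs are often developed into molecular markers for many kinds of research. In recent years, the emergence and development of next generation sequencing technology has changed the traditional ways of finding SSRs; its high throughput and relatively low cost also give scientists a new opportunity to find previously undiscovered SSRs. However, in a pilot experiment with a limited budget, funding usually suffices only for low-coverage sequencing, and this limitation becomes more pronounced as genome size grows. In this study, we use simulation to investigate, under low-coverage sequencing, the relationship between the number of SSRs found and the sequencing coverage for species whose genomes have not been sequenced. The simulation has two steps. First, we build three databases from the whole rice genome and then use rice subsequences from these three databases to assemble a simulated genome for the species of interest, such that the simulated genome has a DNA complexity similar to that of the original species' genome. Next, we use the simulator 454sim to simulate 454-platform sequencing results at different coverages and search them for SSRs. The results show that the number of SSRs increases as sequencing coverage increases. More importantly, this method lets us estimate by simulation the number of SSRs in an unsequenced species, which helps us allocate the budget in advance.	zh_TW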
dc.description.abstract	Microsatellites, or simple sequence repeats (SSRs), are tandem repeats of 1 to 6 nucleotide motifs distributed across genomes. Because of their genomic abundance and high level of polymorphism, SSRs are often developed as molecular markers for a variety of studies. In recent years, the rapidly developing next generation sequencing technology (NGST) has changed the way SSRs are mined. NGST not only offers higher speed and lower cost but also provides opportunities to discover novel SSRs. However, in a pilot study the budget may be limited, and one may only be able to afford a low-coverage sequencing project for the genome of interest. The situation may be more severe when the genome is large. In this study, we used simulation to investigate the relation between the number of mined SSRs and the sequencing depth, at low coverage, for a genome whose sequence is not yet available. The simulation was two-fold. First, we partitioned the whole rice genome to establish three databases. Second, we simulated a genome of similar complexity by recombining known rice genome subsequences. We then mimicked 454 sequencing results under different coverages using 454sim and mined SSRs from them. The results showed that the number of mined SSRs increased as the sequencing depth increased. More importantly, this procedure provides a means of estimating the number of mined SSRs without a whole genome sequence, and hence helps in setting a budget in advance.	en
dc.description.provenance	Made available in DSpace on 2021-06-16T22:57:02Z (GMT). No. of bitstreams: 1 ntu-101-R99621205-1.pdf: 8361046 bytes, checksum: 081301d934eb75327f686a986b75c4c4 (MD5) Previous issue date: 2012	en
dc.description.tableofcontents	Oral Examination Committee Approval i; Acknowledgements ii; Abstract (Chinese) iii; Abstract iv; Chapter 1 INTRODUCTION 1; Chapter 2 MATERIALS AND METHOD 4; 2.1 Database establishment 5; 2.1.1 The approach to establish three databases 7; 2.2 Sequencing data simulation 11; 2.2.1 Genome simulation 11; 2.2.2 Sequencing simulation 13; 2.2.3 SSR mining 14; Chapter 3 RESULT AND DISCUSSION 17; 3.1 Database establishment 17; 3.2 Sequencing data simulation 18; 3.2.1 Similarity between true and simulated genome 18; 3.2.2 Simulation with 454sim 19; 3.2.3 SSR mining - method 1 21; 3.2.4 SSR mining - method 2 23; 3.3 Technical issue 25; 3.3.1 Two additional programs 25; 3.3.2 Five main programs 25; Chapter 4 CONCLUSION 31; 4.1 Summary 31; 4.2 Future work 33; Reference 34; Appendix 37; A.1 37; A.2 39; A.3 40; A.4 42
dc.language.iso	en
dc.subject	next generation sequencing technology (二世代定序技術)	zh_TW
dc.subject	SSR mining (簡單重複序列探勘)	zh_TW
dc.subject	low coverage sequencing (低倍率定序)	zh_TW
dc.subject	simulation (模擬)	zh_TW
dc.subject	454sim	zh_TW
dc.subject	low coverage sequencing	en
dc.subject	SSR mining	en
dc.subject	simulation	en
dc.subject	454sim	en
dc.subject	next generation sequencing technology	en
dc.title	模擬研究定序覆蓋率對探勘簡單重複序列的影響	zh_TW
dc.title	The Effect of Sequencing Coverage on Mining Simple Sequence Repeats by Simulation	en
dc.type	Thesis
dc.date.schoolyear	100-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	胡凱康(Kae-Kang Hwu),林彥蓉(Yann-Rong Lin)
dc.subject.keyword	SSR mining (簡單重複序列探勘), low coverage sequencing (低倍率定序), simulation (模擬), 454sim, next generation sequencing technology (二世代定序技術)	zh_TW
dc.subject.keyword	SSR mining, low coverage sequencing, simulation, 454sim, next generation sequencing technology	en
dc.relation.page	42
dc.rights.note	Paid license
dc.date.accepted	2012-08-10
dc.contributor.author-college	College of Bioresources and Agriculture	zh_TW
dc.contributor.author-dept	Graduate Institute of Agronomy	zh_TW
Appears in Collections:	Department of Agronomy
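
The abstract describes a workflow of simulating reads at different coverages and then counting the SSRs that can be mined. The Python sketch below illustrates only that general idea and is not the thesis's code. It does not use 454sim or the rice-derived databases, it draws error-free reads of a fixed length, and it counts an SSR locus as mined when at least one read spans it completely. The names `toy_genome`, `find_ssrs`, and `mined_loci`, and the thresholds in `MIN_REPEATS`, are hypothetical choices made for illustration.

```python
import bisect
import random
import re

# Minimum tandem repeat count per motif length (1-6 nt).
# These thresholds are illustrative assumptions, not the thesis's settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, end, motif) tuples for perfect SSRs in seq."""
    hits = set()
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ACAC").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.add((m.start(), m.end(), motif))
    return sorted(hits)


def toy_genome(n_loci=50, spacer=400, seed=1):
    """Random sequence with an (AG)8 repeat planted after every spacer."""
    rng = random.Random(seed)
    parts = []
    for _ in range(n_loci):
        parts.append("".join(rng.choice("ACGT") for _ in range(spacer)))
        parts.append("AG" * 8)
    return "".join(parts)


def mined_loci(genome, coverage, read_len, seed=0):
    """Return the SSR loci fully contained in at least one simulated read.

    The number of reads follows coverage = n_reads * read_len / genome length.
    """
    rng = random.Random(seed)
    loci = find_ssrs(genome)
    starts = [start for start, _, _ in loci]
    n_reads = int(coverage * len(genome) / read_len)
    found = set()
    for _ in range(n_reads):
        r0 = rng.randrange(len(genome) - read_len + 1)  # uniform read start
        r1 = r0 + read_len
        i = bisect.bisect_left(starts, r0)  # first locus starting inside the read
        while i < len(loci) and loci[i][0] < r1:
            if loci[i][1] <= r1:  # locus also ends inside the read
                found.add(loci[i])
            i += 1
    return found


if __name__ == "__main__":
    genome = toy_genome()
    total = len(find_ssrs(genome))
    for cov in (0.05, 0.1, 0.5, 1.0, 2.0):
        n = len(mined_loci(genome, cov, read_len=100))
        print(f"coverage {cov:>4}x: {n} of {total} SSR loci mined")
```

Because every run with the same seed draws the same sequence of read positions, the loci found at a lower coverage are a subset of those found at a higher coverage, so the count in this toy model cannot decrease as coverage grows.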

Files in This Item:

File	Size	Format
ntu-101-1.pdf (not authorized for public access)	8.17 MB	Adobe PDF


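Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.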