Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9563
Full metadata record
| DC field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 高成炎(Cheng-Yen Kao) | |
| dc.contributor.author | Yu-Jung Chang | en |
| dc.contributor.author | 張育榮 | zh_TW |
| dc.date.accessioned | 2021-05-20T20:28:51Z | - |
| dc.date.available | 2009-08-04 | |
| dc.date.available | 2021-05-20T20:28:51Z | - |
| dc.date.copyright | 2008-08-04 | |
| dc.date.issued | 2008 | |
| dc.date.submitted | 2008-08-01 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9563 | - |
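| dc.description.abstract | Finding and tracing regions of different organisms' genomes that share a common evolutionary origin (known as synteny and orthology mapping) is a fundamental task in comparative genomics. As sequencing technology advances, more and more large genomes have been completely or almost completely sequenced. This makes synteny and orthology mapping by whole-genome comparison increasingly important, and it also brings new research challenges. A core research question is how to build comparison engines and methods with high sensitivity, high specificity, and high efficiency for numerous genome sequences that have diverged over time and often span billions of base pairs.
We first developed the UniMarker method for synteny and orthology mapping between closely related large genomes. Taking the human-mouse comparison as an example, the method uses short sequences of length 16 that occur exactly once in each of the two genomes to build occurrence spectra, from which orthologous and syntenic genomic segments are detected. Experimental results show that synteny and orthology mapping of human and mouse (each genome is about three billion base pairs long) can be completed in a few hours on a single personal computer, and that the resulting map is 99% concordant with the map of the Mouse Genome Sequencing Consortium (MGSC). Next, for synteny and orthology mapping between large genomes that are not closely related, we propose a new type of seed called maximal α-marker pairs (α-pairs for short), where α is the upper bound on the total number of occurrences of the seed in the two sequences being compared. This selection scheme differs from common schemes that constrain seed length without considering copy number, such as fixed-length k-mers and MEMs with a lower bound on length. Building on enhanced suffix arrays, we present a linear-time algorithm that generates all α-pairs. In experiments comparing human with mouse, chicken, and pufferfish, the α-marker method achieves both clearly better sensitivity and better efficiency than the length-constrained methods (k-mer, MEM) for orthology seeding with contiguous matching. In addition, we extend this copy number-based approach to orthology seeding with discontiguous matching. Comparisons of ROC curves show that discontiguous wobble α-pairs clearly outperform other discontiguous seeds that do not constrain copy number (spaced k-mer seeds). | zh_TW |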
| dc.description.abstract | Motivation: Orthology/synteny mapping—finding orthologous regions among genomes and organizing these evolutionary counterparts into a coherent global picture—is fundamental to studies of comparative genomics. With the increasing number of completely sequenced genomes and thus the increase in comparisons of massive nucleotide sequences, the need for orthology/synteny mapping methods of high sensitivity/specificity and high efficiency becomes even more compelling.
Results: First, we developed the UniMarker (UM) method for synteny mapping of large, closely related genomes such as human and mouse. In this method, the occurrence spectra of genome-wide unique 16-mer sequences present in both the human and mouse genomes are used to directly detect orthologous genomic segments. Because it is free of sequence alignment, the UM method is very fast: high-quality human-mouse synteny maps based on DNA comparisons can be completed in a few hours on a single desktop computer. Second, we propose a new type of DNA sequence seed for orthology mapping of genomes that are not closely related. We call our seeds α-pairs, where α is an integer equal to or greater than the number of times any qualifying seed can be found in the compared genomes. These copy number-based seeds are thus distinct from the well-known length-based seeds, such as fixed-length k-mer seeds or maximal exact match (MEM) seeds, which have a length no less than k. We present a linear-time algorithm, based on enhanced suffix arrays, that efficiently retrieves α-pairs in two given genomic sequences. We compared α-pairs with length-based seeds on their ability to detect the orthologues annotated by Ensembl and COG, for several vertebrate genomes/chromosomes and for prokaryote genomes separated by long evolutionary distances; the results suggest that orthology seeding using copy number can achieve higher sensitivity and better efficiency than orthology seeding using length. Moreover, we extend the α-pair method to generate discontiguous wobble seeds of maximal length with copy number constraints. ROC curves for human chr.15 vs. mouse chr.7, chicken chr.10, and the pufferfish genome showed that the discontiguous wobble α-pairs performed significantly better than the spaced k-mer seeding methods tested. | en |
| dc.description.provenance | Made available in DSpace on 2021-05-20T20:28:51Z (GMT). No. of bitstreams: 1 ntu-97-D90922014-1.pdf: 2051928 bytes, checksum: f8b71a7848fde2445ba5b20ae87200c0 (MD5) Previous issue date: 2008 | en |
| dc.description.tableofcontents | 1 Introduction; 1.1 Motivation; 1.2 Dissertation organization; 2 Background; 2.1 Homology and synteny; 2.1.1 Homology; 2.1.2 Synteny; 2.2 Index-based sequence comparison; 3 The UniMarker method for synteny mapping; 3.1 Introduction; 3.2 Methods; 3.2.1 pUMp vs. hUMp; 3.2.2 Occurrence spectra of UMps and anchoring islands; 3.2.3 Overlapped anchoring islands; 3.2.4 Bidirectional mapping; 3.2.5 Conserved segments and syntenic blocks; 3.2.6 Comparison with other maps; 3.2.7 BLASTZ evaluation; 3.2.8 Software; 3.3 Results; 3.3.1 Maps from various versions of the human genome; 3.3.2 Comparison with maps produced by MGSC and Ensembl; 3.3.3 Evaluation with sequence alignment; 3.3.4 Evaluation with LIS analysis of UMps; 4 Copy number-based orthology seeding using contiguous matches; 4.1 Introduction; 4.2 Methods; 4.2.1 α-markers and α-pairs; 4.2.2 A linear time α-pair retrieval algorithm; 4.2.3 Evaluation of orthology seeding; 4.2.4 Datasets and software; 4.3 Results; 4.3.1 α-pairs vs. MEM or k-mer in vertebrate sequences; 4.3.2 α-pairs vs. MEM or k-mer in prokaryote sequences; 4.3.3 α-pairs vs. MUM or MAM; 4.3.4 The number of α-pairs increases linearly with α; 5 Extending α-markers/α-pairs to discontiguous seeding models; 5.1 Introduction; 5.2 Methods; 5.2.1 Discontiguous α-markers and α-pairs; 5.2.2 Evaluation of orthology seeding; 5.3 Results; 5.3.1 Comparisons of ROC curves for wobble-aware α-pairs/MEMs, spaced k-mer seeds and exact α-pairs/MEMs; 5.3.2 Comparisons of colinear identities vs. total number of seeds for wobble-aware α-pairs/MEMs, spaced k-mer seeds and exact α-pairs/MEMs; 6 Discussion and conclusions; 6.1 Discussion; 6.2 Conclusions; Bibliography; A List of Publications | |
| dc.language.iso | en | |
| dc.title | 詞頻探索方法用於高效率之基因體同源與同線圖譜對映 | zh_TW |
| dc.title | Copy Number-Based Seeding Approaches to Efficient Orthology and Synteny Mapping in Genome Comparisons | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 96-2 | |
| dc.description.degree | Doctoral | |
| dc.contributor.coadvisor | 何建明(Jan-Ming Ho) | |
| dc.contributor.oralexamcommittee | 黃明經(Ming-Jing Hwang),趙坤茂(Kun-Mao Chao),呂學一(Hsueh-I Lu),施純傑(Arthur Chun-Chieh Shih) | |
| dc.subject.keyword | comparative genomics, synteny mapping, orthology mapping, sequence alignment, suffix array | zh_TW |
| dc.subject.keyword | comparative genomics, synteny mapping, orthology mapping, sequence alignment, seeding, suffix array | en |
| dc.relation.page | 89 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2008-08-01 | |
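| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Computer Science and Information Engineering | zh_TW |
| Appears in collections: | Department of Computer Science and Information Engineering | |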
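
The abstract describes the UniMarker idea as using 16-mers that occur exactly once in each of the two compared genomes. The Python sketch below illustrates only that selection step on toy strings. It is not the thesis's implementation: the function names `unique_kmer_positions` and `shared_unique_kmers` are hypothetical, the reverse-complement strand is ignored, and the in-memory dictionaries used here would not scale to whole genomes without a far more compact index.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    n_windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    return {
        seq[i:i + k]: i
        for i in range(n_windows)
        if counts[seq[i:i + k]] == 1
    }


def shared_unique_kmers(seq_a, seq_b, k=16):
    """Return sorted (pos_a, pos_b, kmer) for k-mers unique in both sequences.

    Assumption for illustration: only the forward strand is considered.
    """
    unique_a = unique_kmer_positions(seq_a, k)
    unique_b = unique_kmer_positions(seq_b, k)
    shared = unique_a.keys() & unique_b.keys()
    return sorted((unique_a[m], unique_b[m], m) for m in shared)


if __name__ == "__main__":
    # Toy example with k=4; "ACGT" is dropped because it occurs twice in seq_a.
    print(shared_unique_kmers("ACGTACGTTT", "TTACGTTGCA", k=4))
```

Each returned tuple pairs a position in the first sequence with a position in the second, which is the kind of anchor from which an occurrence spectrum could then be built.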
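Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-97-1.pdf | 2 MB | Adobe PDF | View/Open |
Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.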
