Design of Genome Database System with the Sequence Aligning and Data Compression Mechanism

Yu-Wen Lu; 盧裕文

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44010

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	賴飛羆
dc.contributor.author	Yu-Wen Lu	en
dc.contributor.author	盧裕文	zh_TW
dc.date.accessioned	2021-06-15T02:36:10Z	-
dc.date.available	2014-08-18
dc.date.copyright	2009-08-18
dc.date.issued	2009
dc.date.submitted	2009-08-13
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/44010	-
dc.description.abstract	隨著人類基因體計畫的初步完成，後基因體時代正逐漸來臨，人類的序列圖譜雖然已被解開，卻不代表已完全了解每個序列片段所扮演的角色，它們的功能仍然需要被進一步地分析與研究。另一方面，因序列資料的快速擴增，資料整理與儲存議題更顯得日趨重要。本篇論文在探討並實作出一套基因體資料庫系統，供使用者儲放與管理序列資料。此系統最大的特色在於它整合了序列比對與資料壓縮的功能。藉由嵌入NCBI提供的程式- blastall，它能將使用者上傳的序列作比對，尋找出該序列在基因體相對應的位置；此外，該系統針對序列之間的差異部份作編碼，有效地將資料壓縮，減少儲存的空間。同時，它提供多樣化的查詢功能。使用者可依檢體編號、GI、序列位置等查詢方式，快速地搜尋到他們所感興趣的序列資料。	zh_TW
dc.description.abstract	With the initial completion of the Human Genome Project, the post-genomic era is arriving. Although the human genome map has been decoded, the roles played by individual sequence segments are not yet fully understood, and their actual functions still need to be analyzed and studied. Meanwhile, with the rapid growth of sequence data, the problems of organizing and storing that data are increasingly important. In this thesis, a “Human Genome Database System” is designed and implemented at National Taiwan University Hospital (NTUH). With this system, users can store and manage experimental sequence data. Its most distinctive feature is that it integrates sequence alignment and data compression modules. By embedding the NCBI alignment program blastall, it automatically aligns uploaded sequences and finds their corresponding positions in the genome. In addition, the system encodes the differences between sequences, which compresses the data effectively and reduces the storage space required. It also offers a variety of query methods: users can quickly retrieve the data of interest by searching on specimen number, GI, sequence position, and so on.	en
dc.description.provenance	Made available in DSpace on 2021-06-15T02:36:10Z (GMT). No. of bitstreams: 1 ntu-98-R96945042-1.pdf: 559426 bytes, checksum: f9b5d09c002132d87f21f8688a09fec7 (MD5) Previous issue date: 2009	en
dc.description.tableofcontents	Chapter 1 Introduction; 1.1 The DNA Sequencing technology; 1.2 Human Genome Project; 1.3 Motivation and Objective; 1.4 Thesis Organization; Chapter 2 Background; 2.1 Genome Database; 2.2 NCBI Reference sequence; 2.3 NCBI BLAST Tool; 2.4 Mega BLAST Search; 2.5 DNA Compression Algorithm; Chapter 3 Methods; 3.1 System Architecture; 3.2 Sequence Conversion; 3.2.1 Sequence Pre-processing; 3.2.2 Sequence Aligning; 3.2.3 Sequence Post-Processing; 3.3 Sequence Retrieve; 3.3.1 The Utilization of fastacmd; 3.3.2 Sequence Assembly; 3.4 The Access to Genome Database; 3.5 Data Integration; Chapter 4 Results and Discussion; 4.1 System Implementation; 4.2 The Experimental Results of Compression; 4.3 Discussion; Chapter 5 Conclusion; References
dc.language.iso	en
dc.subject	資料壓縮	zh_TW
dc.subject	人類基因體計畫	zh_TW
dc.subject	DNA定序	zh_TW
dc.subject	基因體資料庫系統	zh_TW
dc.subject	序列比對	zh_TW
dc.subject	data compression	en
dc.subject	Human Genome Project	en
dc.subject	DNA sequencing	en
dc.subject	genome database system	en
dc.subject	sequence alignment	en
dc.title	具序列比對與資料壓縮功能之基因體資料庫系統設計	zh_TW
dc.title	Design of Genome Database System with the Sequence Aligning and Data Compression Mechanism	en
dc.type	Thesis
dc.date.schoolyear	97-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	簡穎秀,陳俊良,林正偉,李鴻章
dc.subject.keyword	人類基因體計畫,DNA定序,基因體資料庫系統,序列比對,資料壓縮	zh_TW
dc.subject.keyword	Human Genome Project,DNA sequencing,genome database system,sequence alignment,data compression	en
dc.relation.page	42
dc.rights.note	Paid license
dc.date.accepted	2009-08-13
dc.contributor.author-college	電機資訊學院	zh_TW
dc.contributor.author-dept	生醫電子與資訊學研究所	zh_TW
Appears in Collections:	Graduate Institute of Biomedical Electronics and Bioinformatics (生醫電子與資訊學研究所)

Files in This Item:

File	Size	Format
ntu-98-1.pdf (not authorized for public access)	546.31 kB	Adobe PDF

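Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.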